Install and Authenticate

pip install wandb
wandb login

The wandb login command opens a browser to grab your API key from wandb.ai/authorize. Paste it into the terminal prompt. Alternatively, set it as an environment variable so scripts on headless machines work without interactive login:

export WANDB_API_KEY="your-key-here"

If you hit the classic authentication error:

wandb: ERROR Invalid API key

Check that your key hasn’t been revoked. Go to Settings > API Keys in the W&B dashboard and generate a fresh one. A common cause is copying the key with trailing whitespace or newlines.
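If you set the key programmatically, stripping it first sidesteps the whitespace problem entirely. A minimal sketch (the key value is a placeholder, not a real key):

```python
import os

# A key copied from the browser often carries a trailing newline;
# strip it before wandb reads it from the environment.
raw_key = "0123456789abcdef0123456789abcdef01234567\n"  # placeholder
os.environ["WANDB_API_KEY"] = raw_key.strip()
```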

Log Your First Run

The core pattern is three lines: wandb.init() to start a run, wandb.log() to record metrics, and wandb.finish() to close it.

import wandb
import random

wandb.init(
    project="my-first-project",
    config={
        "learning_rate": 0.001,
        "epochs": 10,
        "batch_size": 32,
        "architecture": "ResNet-50",
    },
)

for epoch in range(wandb.config.epochs):
    train_loss = random.uniform(0.1, 1.0) * (1 - epoch / 10)
    val_loss = train_loss + random.uniform(0.05, 0.15)
    wandb.log({
        "epoch": epoch,
        "train/loss": train_loss,
        "val/loss": val_loss,
        "train/accuracy": 1 - train_loss,
    })

wandb.finish()

Open your W&B dashboard and you’ll see the run with live charts for every metric you logged. The config dictionary is searchable across runs, which matters once you have hundreds of experiments to compare.

A few things worth noting. Use slashes in metric names like train/loss and val/loss to group them into panels automatically. Call wandb.log() at whatever frequency makes sense – every step, every epoch, or on specific events. And always call wandb.finish() at the end, especially in notebooks where the process doesn’t exit cleanly.
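The slash convention is easy to apply mechanically. A small helper (hypothetical, not part of the wandb API) that prefixes a whole dict of metrics with a split name:

```python
def namespaced(split, metrics):
    """Prefix metric names with a split so W&B groups them into one panel."""
    return {f"{split}/{k}": v for k, v in metrics.items()}

# wandb.log(namespaced("train", {"loss": 0.42, "accuracy": 0.91}))
# logs {"train/loss": 0.42, "train/accuracy": 0.91}
```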

Integrate with PyTorch Training Loops

Here’s a real PyTorch training loop with wandb tracking:

import torch
import torch.nn as nn
import torch.optim as optim
import wandb

wandb.init(project="pytorch-demo", config={
    "lr": 1e-3,
    "epochs": 20,
    "optimizer": "Adam",
})

model = nn.Sequential(
    nn.Linear(784, 256),
    nn.ReLU(),
    nn.Linear(256, 10),
)

optimizer = optim.Adam(model.parameters(), lr=wandb.config.lr)
criterion = nn.CrossEntropyLoss()

# optional: watch gradients and model parameters
wandb.watch(model, criterion, log="all", log_freq=100)

for epoch in range(wandb.config.epochs):
    # train_loader is assumed defined above: a DataLoader yielding (batch_x, batch_y)
    running_loss = 0.0
    correct = 0
    total = 0

    for batch_x, batch_y in train_loader:
        optimizer.zero_grad()
        outputs = model(batch_x)
        loss = criterion(outputs, batch_y)
        loss.backward()
        optimizer.step()

        running_loss += loss.item()
        _, predicted = outputs.max(1)
        total += batch_y.size(0)
        correct += predicted.eq(batch_y).sum().item()

    wandb.log({
        "epoch": epoch,
        "train/loss": running_loss / len(train_loader),
        "train/accuracy": correct / total,
    })

wandb.finish()

wandb.watch() logs gradient histograms and parameter distributions. It’s great for debugging vanishing/exploding gradients but adds overhead, so skip it for fast iteration runs.
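A lighter alternative is logging a single global gradient norm per step instead of full histograms. A sketch of the computation, shown over plain lists of gradient values (in a real loop you would build them from p.grad.flatten().tolist() for each parameter after loss.backward()):

```python
import math

def global_grad_norm(grad_vectors):
    """L2 norm over all gradients flattened together -- one scalar per step."""
    return math.sqrt(sum(g * g for vec in grad_vectors for g in vec))

norm = global_grad_norm([[3.0], [4.0]])  # sqrt(9 + 16) = 5.0
```

Logging something like {"grad/global_norm": norm} once per step still catches exploding gradients at a fraction of the overhead.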

Use the Hugging Face Trainer Integration

If you’re fine-tuning with the Hugging Face Trainer, wandb works out of the box. Set report_to="wandb" and the WandbCallback handles everything:

import os
from transformers import TrainingArguments, Trainer

os.environ["WANDB_PROJECT"] = "hf-fine-tune"
os.environ["WANDB_LOG_MODEL"] = "end"  # upload final model as artifact

training_args = TrainingArguments(
    output_dir="./results",
    num_train_epochs=3,
    per_device_train_batch_size=16,
    eval_strategy="epoch",  # named evaluation_strategy in older transformers releases
    report_to="wandb",
    run_name="bert-sentiment-v1",
)

trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=train_dataset,
    eval_dataset=eval_dataset,
)

trainer.train()

Set WANDB_LOG_MODEL="end" to automatically save the trained model as a W&B Artifact when training finishes. Use "checkpoint" instead to save at every save_steps interval. You can also set WANDB_WATCH="all" to log gradients, though it slows things down for large models.

One gotcha: if wandb is installed but you don’t want to use it for a specific run, set report_to="none" in TrainingArguments. Otherwise the Trainer auto-detects wandb and starts logging, which creates orphaned runs if your API key isn’t configured.
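Another switch worth knowing is the WANDB_DISABLED environment variable, which (as far as current wandb and transformers versions behave) turns wandb.init() into a no-op process-wide. Set it before the Trainer is constructed; treat report_to="none" as the preferred, version-stable route:

```python
import os

# Must be set before wandb or the Trainer initializes anything.
os.environ["WANDB_DISABLED"] = "true"
```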

Run Hyperparameter Sweeps

Sweeps let you search hyperparameter space without writing custom loops. Define a sweep config, then let W&B coordinate the search:

import wandb

sweep_config = {
    "method": "bayes",  # or "grid", "random"
    "metric": {"name": "val/loss", "goal": "minimize"},
    "parameters": {
        "learning_rate": {
            "distribution": "log_uniform_values",
            "min": 1e-5,
            "max": 1e-2,
        },
        "batch_size": {"values": [16, 32, 64]},
        "dropout": {"min": 0.1, "max": 0.5},
        "optimizer": {"values": ["adam", "sgd", "adamw"]},
    },
}

sweep_id = wandb.sweep(sweep_config, project="my-sweep")


def train():
    wandb.init()
    config = wandb.config

    # build model with config.learning_rate, config.batch_size, etc.
    # ... training code ...

    for epoch in range(10):
        val_loss = train_one_epoch(config)
        wandb.log({"val/loss": val_loss, "epoch": epoch})

    wandb.finish()


# launch 20 runs
wandb.agent(sweep_id, function=train, count=20)

Bayesian search ("method": "bayes") is the best default – it uses previous results to pick the next set of hyperparameters. Grid search is only practical when you have a tiny search space. Random search is a solid baseline that parallelizes well across multiple machines.
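Bayesian and random sweeps also support early termination, which kills clearly losing runs instead of training them to completion. The sweep config grows one key; the Hyperband parameters below are illustrative values, not recommendations:

```python
sweep_config_with_stopping = {
    "method": "bayes",
    "metric": {"name": "val/loss", "goal": "minimize"},
    "early_terminate": {
        "type": "hyperband",  # compares runs at bracketed iteration counts
        "min_iter": 3,        # never kill a run before 3 logged iterations
    },
    "parameters": {
        "learning_rate": {"distribution": "log_uniform_values", "min": 1e-5, "max": 1e-2},
    },
}
```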

To run a sweep agent on a second machine, grab the sweep_id from the dashboard and run:

wandb agent your-entity/my-sweep/abc123def

Each agent pulls configurations from the central sweep controller and reports results back.

Version Models and Data with Artifacts

Artifacts give you versioned storage for models, datasets, and any other files tied to a run:

import wandb

run = wandb.init(project="artifact-demo")

# log a model checkpoint
artifact = wandb.Artifact("resnet-50", type="model")
artifact.add_file("checkpoints/best_model.pth")
run.log_artifact(artifact)

# in a later run, pull it back
run2 = wandb.init(project="artifact-demo")
artifact = run2.use_artifact("resnet-50:latest")
artifact_dir = artifact.download()
# artifact_dir points at the downloaded files, e.g. ./artifacts/resnet-50:v0/best_model.pth

W&B checksums artifact contents automatically. If you log the same file twice without changes, it won’t create a new version or re-upload data. Use the :latest alias to always grab the most recent version, or pin to a specific version with :v2.

For datasets, same pattern:

artifact = wandb.Artifact("training-data", type="dataset")
artifact.add_dir("data/processed/")
run.log_artifact(artifact)

This creates a lineage graph in the W&B dashboard showing which datasets fed into which models, across which runs.

Troubleshooting Common Issues

wandb: ERROR Run exceeded rate limit – You’re calling wandb.log() too frequently. Batch your metrics into a single dictionary per step instead of calling log() multiple times. For fast inner loops, log every N steps rather than every step.
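For tight inner loops, a simple gate keeps you under the limit. A sketch (the threshold is arbitrary; tune it to your step rate):

```python
LOG_EVERY = 50

def should_log(step):
    """Log step 0 and then every LOG_EVERY-th step."""
    return step % LOG_EVERY == 0

# inside the training loop:
# if should_log(step):
#     wandb.log({"train/loss": loss}, step=step)
steps_logged = sum(should_log(s) for s in range(1000))  # 20 of 1000 steps
```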

wandb: ERROR api_key not configured (no-tty) – Happens on remote servers and CI pipelines. Set WANDB_API_KEY as an environment variable. In GitHub Actions:

env:
  WANDB_API_KEY: ${{ secrets.WANDB_API_KEY }}

Runs showing as “crashed” – Your script exited before wandb.finish() was called. Use a context manager to guarantee cleanup:

with wandb.init(project="safe-run") as run:
    # training code here
    run.log({"loss": 0.5})
# finish() is called automatically

Large logs filling disk – W&B writes each run’s local files to a wandb/ directory inside your working directory and caches artifacts under ~/.cache/wandb/. Delete already-synced runs with wandb sync --clean.

Offline Mode

For air-gapped environments or when you just want to log locally:

export WANDB_MODE=offline

Runs are stored in a local wandb/ directory. Sync them later when you have network access:

wandb sync ./wandb/offline-run-20260214_103045-abc123

This is especially useful for training on clusters without internet access. All metrics, configs, and artifacts are preserved and uploaded as if the run happened online.
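If a cluster job produced many offline runs, you can collect and sync them in one pass. A sketch that gathers the directories wandb created (names follow the offline-run-* pattern shown above; the subprocess call is left commented since it needs network access):

```python
import subprocess
from pathlib import Path

# Each offline run lives in its own wandb/offline-run-<timestamp>-<id> directory.
offline_runs = sorted(str(p) for p in Path("wandb").glob("offline-run-*"))

# Once you're back online:
# subprocess.run(["wandb", "sync", *offline_runs], check=True)
```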