Install and Authenticate
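A minimal sketch of the setup, assuming pip as your package manager:

```shell
# Install the client, then authenticate interactively
pip install wandb
wandb login
```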
The wandb login command opens a browser to grab your API key from wandb.ai/authorize. Paste it into the terminal prompt. Alternatively, set it as an environment variable so scripts on headless machines work without interactive login:
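For headless machines, export the key before launching your script (the key value and script name here are placeholders):

```shell
# Non-interactive auth for remote servers and CI
export WANDB_API_KEY="<your-api-key>"
python train.py
```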
If you hit the classic authentication error:
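It usually reads like this (exact wording varies slightly between client versions):

```
wandb: ERROR api_key not configured (no-tty). call wandb.login(key=[your_api_key])
```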
Check that your key hasn’t been revoked. Go to Settings > API Keys in the W&B dashboard and generate a fresh one. A common cause is copying the key with trailing whitespace or newlines.
Log Your First Run
The core pattern is three lines: wandb.init() to start a run, wandb.log() to record metrics, and wandb.finish() to close it.
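A minimal sketch of that pattern, with a made-up project name and dummy metrics standing in for real training:

```python
import random
import wandb

# Start a run; config values are searchable across runs in the dashboard
run = wandb.init(
    project="my-first-project",  # assumed project name
    config={"learning_rate": 1e-3, "epochs": 5, "batch_size": 32},
)

for epoch in range(run.config.epochs):
    # Stand-in for a real training step; one dict per log call
    wandb.log({"train/loss": random.random(), "epoch": epoch})

wandb.finish()
```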
Open your W&B dashboard and you’ll see the run with live charts for every metric you logged. The config dictionary is searchable across runs, which matters once you have hundreds of experiments to compare.
A few things worth noting. Use slashes in metric names like train/loss and val/loss to group them into panels automatically. Call wandb.log() at whatever frequency makes sense – every step, every epoch, or on specific events. And always call wandb.finish() at the end, especially in notebooks where the process doesn’t exit cleanly.
Integrate with PyTorch Training Loops
Here’s a real PyTorch training loop with wandb tracking:
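A sketch of such a loop, using a toy linear model and random tensors so the wandb calls have something to track (the project name and hyperparameters are made up):

```python
import torch
import torch.nn as nn
import wandb

# Toy regression setup; swap in your own model and data loader
model = nn.Linear(10, 1)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()
x, y = torch.randn(256, 10), torch.randn(256, 1)

run = wandb.init(project="pytorch-demo", config={"lr": 1e-3, "epochs": 3})
wandb.watch(model, log="all", log_freq=100)  # gradient + parameter histograms

for epoch in range(run.config.epochs):
    optimizer.zero_grad()
    loss = loss_fn(model(x), y)
    loss.backward()
    optimizer.step()
    wandb.log({"train/loss": loss.item(), "epoch": epoch})

wandb.finish()
```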
wandb.watch() logs gradient histograms and parameter distributions. It’s great for debugging vanishing/exploding gradients but adds overhead, so skip it for fast iteration runs.
Use the Hugging Face Trainer Integration
If you’re fine-tuning with the Hugging Face Trainer, wandb works out of the box. Set report_to="wandb" and the WandbCallback handles everything:
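A sketch of the wiring, assuming you have already prepared a model and tokenized datasets (`model`, `train_ds`, and `eval_ds` below are your own objects; the project and run names are made up):

```python
import os
from transformers import Trainer, TrainingArguments

os.environ["WANDB_PROJECT"] = "hf-finetune-demo"  # assumed project name

args = TrainingArguments(
    output_dir="out",
    report_to="wandb",         # activates the WandbCallback
    run_name="bert-finetune",  # becomes the W&B run name
    logging_steps=50,
    num_train_epochs=1,
)

# The Trainer logs losses, eval metrics, and config to W&B automatically
trainer = Trainer(
    model=model,
    args=args,
    train_dataset=train_ds,
    eval_dataset=eval_ds,
)
trainer.train()
```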
Set WANDB_LOG_MODEL="end" to automatically save the trained model as a W&B Artifact when training finishes. Use "checkpoint" instead to save at every save_steps interval. You can also set WANDB_WATCH="all" to log gradients, though it slows things down for large models.
One gotcha: if wandb is installed but you don’t want to use it for a specific run, set report_to="none" in TrainingArguments. Otherwise the Trainer auto-detects wandb and starts logging, which creates orphaned runs if your API key isn’t configured.
Run Hyperparameter Sweeps
Sweeps let you search hyperparameter space without writing custom loops. Define a sweep config, then let W&B coordinate the search:
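A sketch of a Bayesian sweep over two hypothetical hyperparameters (the project name and metric values are placeholders; your `train` function would contain real training code):

```python
import wandb

# Sweep definition: Bayesian search minimizing validation loss
sweep_config = {
    "method": "bayes",
    "metric": {"name": "val/loss", "goal": "minimize"},
    "parameters": {
        "learning_rate": {"min": 1e-5, "max": 1e-2},
        "batch_size": {"values": [16, 32, 64]},
    },
}

def train():
    run = wandb.init()
    cfg = run.config
    # ... train using cfg.learning_rate and cfg.batch_size ...
    wandb.log({"val/loss": 0.1})  # report the metric the sweep optimizes
    wandb.finish()

sweep_id = wandb.sweep(sweep_config, project="sweep-demo")  # assumed project
wandb.agent(sweep_id, function=train, count=10, project="sweep-demo")
```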
Bayesian search ("method": "bayes") is the best default – it uses previous results to pick the next set of hyperparameters. Grid search is only practical when you have a tiny search space. Random search is a solid baseline that parallelizes well across multiple machines.
To run a sweep agent on a second machine, grab the sweep_id from the dashboard and run:
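The entity, project, and sweep ID below are placeholders for the values shown on your sweep page:

```shell
wandb agent my-team/my-project/abc123xy
```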
Each agent pulls configurations from the central sweep controller and reports results back.
Version Models and Data with Artifacts
Artifacts give you versioned storage for models, datasets, and any other files tied to a run:
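A minimal sketch for a model checkpoint, assuming you have already saved weights to `model.pt` (the project and artifact names are made up):

```python
import wandb

run = wandb.init(project="artifact-demo")  # assumed project name

# Assumes model.pt exists, e.g. torch.save(model.state_dict(), "model.pt")
artifact = wandb.Artifact("my-model", type="model")
artifact.add_file("model.pt")
run.log_artifact(artifact)
run.finish()
```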
W&B checksums artifact contents automatically. If you log the same file twice without changes, it won’t create a new version or re-upload data. Use the :latest alias to always grab the most recent version, or pin to a specific version with :v2.
For datasets, same pattern:
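A sketch assuming a local `data/` directory holds your training files; downstream runs declare the dependency with `use_artifact`, which is what builds the lineage graph:

```python
import wandb

run = wandb.init(project="artifact-demo")  # assumed project name

# Version an entire dataset directory
dataset = wandb.Artifact("training-data", type="dataset")
dataset.add_dir("data/")  # assumed local data directory
run.log_artifact(dataset)
run.finish()

# In a later training run, declare the dependency for lineage tracking:
# run = wandb.init(project="artifact-demo")
# path = run.use_artifact("training-data:latest").download()
```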
This creates a lineage graph in the W&B dashboard showing which datasets fed into which models, across which runs.
Troubleshooting Common Issues
wandb: ERROR Run exceeded rate limit – You’re calling wandb.log() too frequently. Batch your metrics into a single dictionary per step instead of calling log() multiple times. For fast inner loops, log every N steps rather than every step.
wandb: ERROR api_key not configured (no-tty) – Happens on remote servers and CI pipelines. Set WANDB_API_KEY as an environment variable. In GitHub Actions:
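A snippet for a workflow step, assuming you have stored the key as a repository secret named `WANDB_API_KEY`:

```yaml
env:
  WANDB_API_KEY: ${{ secrets.WANDB_API_KEY }}
```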
Runs showing as “crashed” – Your script exited before wandb.finish() was called. Use a context manager to guarantee cleanup:
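`wandb.init()` can be used as a context manager, which calls `finish()` even if your training code raises (project name and metrics below are placeholders):

```python
import wandb

# finish() runs on exit from the with-block, exception or not
with wandb.init(project="robust-demo") as run:
    for step in range(100):
        run.log({"train/loss": 1.0 / (step + 1)})
# The run is marked finished here, so it won't show as "crashed"
```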
Large logs filling disk – W&B writes run files to a local wandb/ directory in your working directory and keeps caches under ~/.cache/wandb (Linux) or ~/Library/Caches/wandb (macOS). Remove runs that have already been synced with wandb sync --clean.
Offline Mode
For air-gapped environments or when you just want to log locally:
Runs are stored in a local wandb/ directory. Sync them later when you have network access:
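Offline runs are saved as `offline-run-*` directories, which you sync in one shot:

```shell
# Upload every offline run recorded under ./wandb
wandb sync wandb/offline-run-*
```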
This is especially useful for training on clusters without internet access. All metrics, configs, and artifacts are preserved and uploaded as if the run happened online.
Related Guides
- How to Use the Weights and Biases Prompts API for LLM Tracing
- How to Build AI Apps with the Vercel AI SDK and Next.js
- How to Run Open-Source Models with the Replicate API
- How to Set Up a Local MCP Server to Give Your LLM Access to Custom Tools
- How to Use the Anthropic Multi-Turn Conversation API with Tool Use
- How to Use the Mistral API for Code Generation and Chat
- How to Build Interactive AI Demos with Gradio
- How to Use Claude’s Model Context Protocol (MCP)
- How to Run Fast LLM Inference with the Groq API
- How to Run Models with the Hugging Face Inference API