Wrap Your Training Loop with PrivacyEngine
Opacus lets you train any PyTorch model with differential privacy by wrapping your existing optimizer, model, and data loader. The core idea behind DP-SGD is simple: clip each sample’s gradient to a fixed norm, average them, and add calibrated Gaussian noise before the parameter update. That noise is what gives you a mathematical privacy guarantee.
Install Opacus and its dependencies:
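A typical setup, assuming a pip-based environment (package names are the standard ones on PyPI):

```shell
pip install opacus torch torchvision
```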
Here’s a minimal working example that trains a CNN on MNIST with differential privacy:
Three changes turn a standard PyTorch training loop into a differentially private one: instantiating PrivacyEngine, calling make_private, and querying get_epsilon to track your budget.
Understanding Epsilon, Delta, and the Privacy Budget
The privacy guarantee is expressed as (epsilon, delta)-differential privacy. Epsilon measures how much any single training example can influence the model’s output. Delta is the probability the guarantee fails entirely.
Practical guidelines for choosing these values:
- Delta should be less than 1/N where N is the training set size. For MNIST (60,000 samples), delta=1e-5 is standard.
- Epsilon under 1 means strong privacy but usually a significant accuracy loss. Epsilon between 1 and 10 is a reasonable range for production models. Epsilon above 50 provides only weak guarantees.
- Noise multiplier directly controls how much noise gets added per step. Higher values give better (lower) epsilon but hurt convergence.
You can also target a specific epsilon and let Opacus calculate the required noise multiplier:
This is often the better approach – you pick your privacy target and Opacus figures out the noise.
Fixing Incompatible Layers
Not every PyTorch module works with Opacus. BatchNorm is the most common offender because it computes statistics across the batch, meaning one sample’s normalized value depends on other samples in the same batch. That violates the per-sample isolation that DP requires.
You’ll hit a validation error the moment you try to make a model containing BatchNorm private.
Opacus provides ModuleValidator to detect and auto-fix these issues:
ModuleValidator.fix() replaces BatchNorm with GroupNorm automatically. The critical mistake people make is calling fix() after creating the optimizer. Since fix() swaps out layers, the optimizer’s parameter references become stale. Always fix first, then create your optimizer.
Tuning for Better Accuracy
DP training will always sacrifice some accuracy compared to non-private training. On CIFAR-10, expect roughly 60-65% accuracy with reasonable privacy guarantees versus 76%+ without DP. That said, bad hyperparameters make the gap much worse than it needs to be.
The order of tuning importance:
- max_grad_norm – Start with 1.0, then sweep [0.1, 0.5, 1.0, 1.5, 2.0]. Train without noise first (set noise_multiplier=0.0) and watch what fraction of gradients get clipped. You want roughly 50-80% of gradients clipped. Too low and you’re destroying information; too high and the clipping does nothing useful.
- noise_multiplier – Lower means less noise and better accuracy but weaker privacy. Start at 0.1 (barely private) to sanity-check that your pipeline works, then increase.
- Learning rate – DP training usually needs a lower learning rate than standard training because gradients are noisier. If you train non-privately at 1e-3, try 5e-4 or 1e-4 with DP.
- Batch size – Larger batches help because the noise is fixed per batch, so more samples mean a better signal-to-noise ratio. But larger batches also consume more privacy budget per step, so train for fewer epochs.
Handling Memory Pressure
Per-sample gradient computation is the most expensive part of DP training. Opacus needs to store an individual gradient for every sample in the batch rather than just the averaged gradient. For large models, this blows up memory fast.
Use BatchMemoryManager to decouple the logical batch size (for privacy accounting) from the physical batch size (what fits in GPU memory):
This accumulates gradients across multiple physical steps before applying the noisy update, giving you the privacy benefits of large batches without running out of VRAM.
Common Pitfalls
Reusing a spent privacy budget. Every call to optimizer.step() consumes privacy budget. If you save a checkpoint and resume training, you need to account for the epsilon already spent. Opacus tracks this via privacy_engine.accountant, but if you restart from scratch, that state is lost.
Forgetting to set delta. If you never call get_epsilon(delta=...), you’re not actually checking your privacy guarantee. Log epsilon every epoch and set a hard limit – stop training if epsilon exceeds your threshold.
Worrying that evaluation spends budget. DP protects the training data, not the test data. Your test accuracy is still a valid metric and doesn’t consume any privacy budget. Only gradient computations on the training set count.
Using dropout with DP. Dropout works fine with Opacus, but be aware it interacts with the noise – the effective noise per active parameter increases when some are dropped. This isn’t a bug, but it changes the accuracy/privacy tradeoff in ways that are hard to predict analytically. Test empirically.
Related Guides
- How to Build Differential Privacy Testing for LLM Training Data
- How to Implement Federated Learning for Privacy-Preserving ML
- How to Build Adversarial Robustness Testing for Vision Models
- How to Detect and Mitigate Bias in ML Models
- How to Build Automated Fairness Testing for LLM-Generated Content
- How to Build Adversarial Test Suites for ML Models
- How to Detect AI-Generated Text with Watermarking
- How to Build Model Cards and Document AI Systems Responsibly
- How to Implement AI Audit Logging and Compliance Tracking
- How to Build Automated Age and Content Gating for AI Applications