Why You Need to Test Privacy, Not Just Enable It

Slapping PrivacyEngine on your training loop and calling it a day is like adding a lock to your door without checking if it actually latches. Differential privacy (DP) gives you a mathematical guarantee – but only if the implementation is correct, the epsilon budget is reasonable, and the noise calibration actually prevents data leakage. You need tests that verify these properties.

Differential privacy is governed by two parameters: epsilon and delta. Epsilon bounds how much a single training example can influence the model’s output – lower epsilon means stronger privacy. A model trained with epsilon=1 leaks far less about individual records than one trained with epsilon=50. Delta is the probability that the epsilon guarantee fails entirely; you typically set it to at most 1/N, where N is the dataset size.
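If you want that delta rule wired into your setup code, a tiny helper can pick it automatically. Note that suggest_delta is an illustrative name, not an Opacus API – it just returns the largest power of ten that doesn’t exceed 1/N:

```python
def suggest_delta(num_samples: int) -> float:
    """Largest power of ten that does not exceed 1/num_samples."""
    exponent = 1
    while 10 ** exponent < num_samples:
        exponent += 1
    return 10.0 ** -exponent

print(suggest_delta(60_000))  # 1e-05
```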

The catch: you can technically “use DP” with an epsilon of 10,000, which gives you zero meaningful privacy. That’s why testing matters. You need to verify that your final epsilon is actually low enough, and that membership inference attacks fail against your model.

Train a Model with Opacus DP-SGD

First, install what you need:

pip install opacus==1.5.2 torch==2.2.0 torchvision scikit-learn numpy

Here’s a complete training script that applies differential privacy to a text classifier. This uses a simple model on synthetic data to keep it runnable, but the DP mechanics are identical to what you’d use on real LLM training data.

import torch
import torch.nn as nn
import torch.optim as optim
from torch.utils.data import DataLoader, TensorDataset, random_split
from opacus import PrivacyEngine
from opacus.validators import ModuleValidator
import numpy as np

# Simulate a text embedding dataset (1000 samples, 128-dim embeddings, binary labels)
torch.manual_seed(42)
np.random.seed(42)
num_samples = 1000
embed_dim = 128

X = torch.randn(num_samples, embed_dim)
y = (X[:, 0] + X[:, 1] > 0).long()  # label depends on first two features

dataset = TensorDataset(X, y)
train_size = 800
test_size = 200
train_dataset, test_dataset = random_split(dataset, [train_size, test_size])

train_loader = DataLoader(train_dataset, batch_size=64, shuffle=True)
test_loader = DataLoader(test_dataset, batch_size=64)

# Simple classifier -- use GroupNorm, not BatchNorm (BatchNorm breaks DP)
class TextClassifier(nn.Module):
    def __init__(self, input_dim=128, hidden_dim=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(input_dim, hidden_dim),
            nn.GroupNorm(4, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, hidden_dim),
            nn.GroupNorm(4, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, 2),
        )

    def forward(self, x):
        return self.net(x)

def train_with_dp(target_epsilon=3.0, target_delta=1e-5, epochs=10, max_grad_norm=1.0):
    """Train a model with differential privacy and return it along with the final epsilon."""
    model = TextClassifier()
    model = ModuleValidator.fix(model)  # auto-fix any DP-incompatible layers
    ModuleValidator.validate(model, strict=True)

    optimizer = optim.Adam(model.parameters(), lr=1e-3)
    criterion = nn.CrossEntropyLoss()

    privacy_engine = PrivacyEngine()
    model, optimizer, train_loader_dp = privacy_engine.make_private_with_epsilon(
        module=model,
        optimizer=optimizer,
        data_loader=train_loader,
        epochs=epochs,
        target_epsilon=target_epsilon,
        target_delta=target_delta,
        max_grad_norm=max_grad_norm,
    )

    for epoch in range(epochs):
        model.train()
        total_loss = 0.0
        for batch_x, batch_y in train_loader_dp:
            optimizer.zero_grad()
            output = model(batch_x)
            loss = criterion(output, batch_y)
            loss.backward()
            optimizer.step()
            total_loss += loss.item()

        epsilon = privacy_engine.get_epsilon(delta=target_delta)
        print(f"Epoch {epoch+1}/{epochs} | Loss: {total_loss:.4f} | Epsilon: {epsilon:.2f}")

    final_epsilon = privacy_engine.get_epsilon(delta=target_delta)
    return model, final_epsilon

model, final_epsilon = train_with_dp(target_epsilon=3.0, target_delta=1e-5, epochs=10)
print(f"\nFinal privacy budget spent: epsilon={final_epsilon:.2f}")

Key things happening here: make_private_with_epsilon calibrates the noise level so you hit your target epsilon after the specified number of epochs. Each gradient gets clipped to max_grad_norm before noise is added. The privacy accountant inside Opacus tracks the cumulative epsilon spend across all training steps.
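Under the hood, that calibration is essentially a search problem: for a fixed sampling rate and step count, the accounted epsilon decreases monotonically as the noise multiplier sigma grows, so Opacus can solve for the smallest sigma that fits the budget. Here’s a toy sketch of the idea – the lambda is a made-up stand-in for the accountant, not the real RDP math:

```python
def calibrate_noise(epsilon_for, target_epsilon, lo=0.1, hi=64.0, iters=50):
    """Binary-search the smallest noise multiplier whose accounted
    epsilon meets the target (epsilon shrinks as sigma grows)."""
    for _ in range(iters):
        mid = (lo + hi) / 2
        if epsilon_for(mid) > target_epsilon:
            lo = mid  # too little noise, epsilon over budget
        else:
            hi = mid  # enough noise, try smaller
    return hi

# Toy accountant: pretend epsilon falls like 8/sigma.
sigma = calibrate_noise(lambda s: 8.0 / s, target_epsilon=3.0)
print(round(sigma, 3))  # 2.667, i.e. the sigma where 8/sigma hits 3
```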

Build a Membership Inference Attack Test

The gold standard for testing privacy is a membership inference attack (MIA). The attacker’s goal: given a trained model and a data point, determine whether that point was in the training set. If your DP model resists this attack (attacker accuracy is close to 50% random chance), the privacy guarantee holds in practice.

from sklearn.metrics import roc_auc_score

def membership_inference_attack(model, train_dataset, test_dataset):
    """
    Run a loss-based membership inference attack.
    Returns attack accuracy and AUC. A well-privatized model should
    give attack AUC close to 0.5.
    """
    model.eval()
    criterion = nn.CrossEntropyLoss(reduction="none")

    def get_losses(dataset):
        loader = DataLoader(dataset, batch_size=64)
        losses = []
        with torch.no_grad():
            for batch_x, batch_y in loader:
                output = model(batch_x)
                batch_losses = criterion(output, batch_y)
                losses.extend(batch_losses.cpu().numpy())
        return np.array(losses)

    train_losses = get_losses(train_dataset)
    test_losses = get_losses(test_dataset)

    # Members have lower loss, non-members have higher loss
    # Labels: 1 = member (train), 0 = non-member (test)
    all_losses = np.concatenate([train_losses, test_losses])
    labels = np.concatenate([np.ones(len(train_losses)), np.zeros(len(test_losses))])

    # Attacker predicts "member" if loss is below median
    median_loss = np.median(all_losses)
    predictions = (all_losses < median_loss).astype(float)
    attack_accuracy = np.mean(predictions == labels)

    # AUC: use negative loss as score (lower loss = more likely member)
    attack_auc = roc_auc_score(labels, -all_losses)

    return attack_accuracy, attack_auc

attack_acc, attack_auc = membership_inference_attack(model, train_dataset, test_dataset)
print(f"MIA attack accuracy: {attack_acc:.3f}")
print(f"MIA attack AUC: {attack_auc:.3f}")
print(f"Privacy test {'PASSED' if attack_auc < 0.55 else 'FAILED'}: AUC should be < 0.55 for good DP")

A non-private model will often show an attack AUC around 0.6-0.8, meaning the attacker can distinguish members from non-members with significant confidence. With DP training at epsilon=3.0, you should see the AUC drop into the 0.50-0.55 range.
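To build intuition for those AUC numbers without training anything, here’s a dependency-free sketch. It computes the attack AUC as a Mann-Whitney statistic over synthetic loss distributions – one pair mimicking an overfit model (members have clearly lower loss), one mimicking a DP model (distributions nearly identical). The distributions are invented purely for illustration:

```python
import random
random.seed(0)

def attack_auc(member_scores, nonmember_scores):
    """AUC as P(member score > non-member score), Mann-Whitney style."""
    wins = ties = 0
    for m in member_scores:
        for n in nonmember_scores:
            if m > n:
                wins += 1
            elif m == n:
                ties += 1
    return (wins + 0.5 * ties) / (len(member_scores) * len(nonmember_scores))

# Score = negative loss, so lower loss means higher score.
overfit = attack_auc([-random.gauss(0.2, 0.1) for _ in range(200)],
                     [-random.gauss(0.8, 0.3) for _ in range(200)])
private = attack_auc([-random.gauss(0.5, 0.3) for _ in range(200)],
                     [-random.gauss(0.5, 0.3) for _ in range(200)])
print(f"overfit-like AUC: {overfit:.2f}, DP-like AUC: {private:.2f}")
```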

A Complete DP Test Suite

Wrap everything into proper tests you can run in CI. This validates three things: the epsilon budget stays within bounds, the membership inference attack fails, and the model still has reasonable accuracy (privacy shouldn’t completely destroy utility).

import unittest

class TestDifferentialPrivacy(unittest.TestCase):
    @classmethod
    def setUpClass(cls):
        """Train the DP model once for all tests."""
        cls.target_epsilon = 3.0
        cls.target_delta = 1e-5
        cls.model, cls.final_epsilon = train_with_dp(
            target_epsilon=cls.target_epsilon,
            target_delta=cls.target_delta,
            epochs=10,
            max_grad_norm=1.0,
        )

    def test_epsilon_within_budget(self):
        """Final epsilon must not exceed our target budget."""
        self.assertLessEqual(
            self.final_epsilon,
            self.target_epsilon * 1.1,  # allow 10% tolerance from accountant rounding
            f"Epsilon {self.final_epsilon:.2f} exceeded budget {self.target_epsilon}",
        )

    def test_epsilon_is_meaningful(self):
        """Epsilon must be low enough to provide real privacy."""
        self.assertLess(
            self.final_epsilon,
            10.0,
            f"Epsilon {self.final_epsilon:.2f} is too high for meaningful privacy",
        )

    def test_membership_inference_resistance(self):
        """Membership inference AUC should be close to random (0.5)."""
        _, attack_auc = membership_inference_attack(
            self.model, train_dataset, test_dataset
        )
        self.assertLess(
            attack_auc,
            0.60,
            f"MIA AUC {attack_auc:.3f} is too high -- model leaks membership info",
        )

    def test_model_utility_preserved(self):
        """Model should still achieve reasonable accuracy despite DP noise."""
        self.model.eval()
        correct = 0
        total = 0
        with torch.no_grad():
            for batch_x, batch_y in test_loader:
                output = self.model(batch_x)
                preds = output.argmax(dim=1)
                correct += (preds == batch_y).sum().item()
                total += batch_y.size(0)
        accuracy = correct / total
        self.assertGreater(
            accuracy,
            0.55,  # should beat random chance (0.5) even with DP
            f"Accuracy {accuracy:.3f} is too low -- DP noise may be too aggressive",
        )

if __name__ == "__main__":
    unittest.main()

Run it with python -m pytest test_dp.py -v or python test_dp.py. You get a clear pass/fail for each privacy property.

Measuring and Interpreting Epsilon

Epsilon values mean different things depending on your threat model:

  • epsilon < 1: Very strong privacy. Noticeable accuracy drop. Good for medical or financial data.
  • epsilon 1-5: Solid privacy for most applications. The sweet spot for production models.
  • epsilon 5-10: Moderate privacy. Acceptable if your data isn’t extremely sensitive.
  • epsilon > 10: Weak privacy. Probably not worth the accuracy cost.
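If you want those bands as a named check in CI, a trivial mapping works. The helper name privacy_tier and its thresholds just restate the list above:

```python
def privacy_tier(epsilon: float) -> str:
    """Map a final epsilon onto the rough bands described above."""
    if epsilon < 1:
        return "very strong"
    if epsilon <= 5:
        return "solid"
    if epsilon <= 10:
        return "moderate"
    return "weak"

print(privacy_tier(3.0))   # solid -- the target used in this article
print(privacy_tier(50.0))  # weak -- technically DP, practically not
```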

You can query the epsilon at any point during training to track how your budget accumulates:

# After each epoch, check running epsilon
for epoch in range(epochs):
    # ... training step ...
    current_epsilon = privacy_engine.get_epsilon(delta=1e-5)
    if current_epsilon > target_epsilon * 0.8:
        print(f"Warning: approaching budget limit at epsilon={current_epsilon:.2f}")

The privacy accountant in Opacus uses Rényi Differential Privacy (RDP) internally and converts to standard (epsilon, delta)-DP when you call get_epsilon. This is a tight bound – the actual privacy is at least as strong as the reported epsilon.

Common Errors and Fixes

ModuleValidator raises error about BatchNorm layers

BatchNorm computes statistics across the batch, which breaks per-sample gradient isolation. Replace all nn.BatchNorm layers with nn.GroupNorm or use ModuleValidator.fix(model) to auto-convert them.
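If you’re curious what that fix amounts to, here’s a manual sketch. replace_batchnorm is an illustrative helper, not the actual Opacus implementation – it walks the module tree and swaps each BatchNorm for a GroupNorm with a compatible group count:

```python
import math
import torch.nn as nn

def replace_batchnorm(module: nn.Module) -> None:
    """Recursively swap BatchNorm layers for GroupNorm. The group
    count must divide the channel count, hence the gcd."""
    for name, child in module.named_children():
        if isinstance(child, (nn.BatchNorm1d, nn.BatchNorm2d, nn.BatchNorm3d)):
            groups = math.gcd(32, child.num_features)
            setattr(module, name, nn.GroupNorm(groups, child.num_features))
        else:
            replace_batchnorm(child)

model = nn.Sequential(nn.Linear(16, 32), nn.BatchNorm1d(32), nn.ReLU())
replace_batchnorm(model)
print(model[1])  # now a GroupNorm layer
```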

Expected each sample to have the same number of elements error

This happens when your data loader returns variable-length sequences. Opacus needs uniform batch shapes for per-sample gradient computation. Pad your sequences to a fixed length before passing them to the data loader.
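A minimal fixed-length collate looks like this. The helper pad_batch, the lengths, and the pad id of 0 are all illustrative – a real tokenizer supplies its own pad token:

```python
def pad_batch(sequences, max_len, pad_id=0):
    """Truncate or right-pad token-id lists to a fixed length so every
    sample has the same shape for per-sample gradient computation."""
    return [seq[:max_len] + [pad_id] * max(0, max_len - len(seq))
            for seq in sequences]

batch = pad_batch([[5, 9, 2], [7], [1, 2, 3, 4, 5, 6]], max_len=4)
print(batch)  # [[5, 9, 2, 0], [7, 0, 0, 0], [1, 2, 3, 4]]
```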

Epsilon explodes to hundreds or thousands

Your max_grad_norm is probably too low relative to your actual gradient magnitudes, or your batch size is too small. Larger batches give better privacy-utility tradeoffs because the noise is amortized over more samples. Try increasing batch size to 256 or 512.

Training is extremely slow with Opacus

Per-sample gradient computation is expensive. Opacus uses gradient hooks that expand memory usage by roughly batch_size times. Reduce batch size if you run out of GPU memory, but keep it as large as you can for better epsilon accounting.

get_epsilon returns lower epsilon than expected

This is not a bug. Opacus uses tight RDP-based accounting, which gives a better bound than naive composition. The actual privacy is at least as strong as reported.

MIA test passes but you set epsilon very high

A passing MIA test with high epsilon doesn’t mean your model is private. Loss-based MIA is a weak attack. Stronger attacks (shadow model training, likelihood ratio attacks) might still succeed. The epsilon bound is your real guarantee – don’t rely solely on empirical MIA tests.