If you can’t reproduce your results, you can’t verify them. And if you can’t verify them, you can’t trust them. Reproducibility isn’t just a nice-to-have for ML experiments — it’s the foundation of trustworthy AI. When an auditor asks “how did you get these numbers?” and you shrug because your training run gives different results every time, that’s a credibility problem. Worse, it’s a safety problem. Non-reproducible experiments make it impossible to audit AI systems, trace bugs back to their source, or confirm that a model behaves the way you claim it does.

The fix is straightforward: seed every source of randomness, lock down non-deterministic operations, and log everything.

Seed Everything

Random numbers come from multiple sources in a typical ML pipeline. Python’s built-in random module, NumPy, PyTorch’s CPU and CUDA generators — each has its own internal state. Miss one and your results drift between runs.

Here’s a seed_everything() function that covers all of them:

import os
import random
import numpy as np
import torch


def seed_everything(seed: int = 42) -> None:
    """Set all random seeds for reproducibility."""
    # Python's built-in random
    random.seed(seed)

    # NumPy
    np.random.seed(seed)

    # PyTorch CPU
    torch.manual_seed(seed)

    # PyTorch CUDA (all GPUs)
    torch.cuda.manual_seed_all(seed)

    # Hash seed for Python string hashing (affects dict ordering in some cases)
    os.environ["PYTHONHASHSEED"] = str(seed)

    print(f"All seeds set to {seed}")

Call seed_everything(42) at the very top of your script, before you build models, create datasets, or initialize anything else that touches random state.

The PYTHONHASHSEED environment variable controls Python’s hash randomization. It needs to be set before the interpreter starts for full effect, so either export it in your shell or set it in your launch script:

PYTHONHASHSEED=42 python train.py

One important note: torch.cuda.manual_seed_all() seeds every GPU. If you only call torch.cuda.manual_seed(), you only seed the current device. Always use the _all variant unless you have a specific reason not to.
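A quick sanity check (a minimal sketch, not part of the pipeline) is to reseed and regenerate: if seed_everything() really covers every generator, the draws below match exactly.

seed_everything(42)
a = (random.random(), np.random.rand(3), torch.rand(3))

seed_everything(42)
b = (random.random(), np.random.rand(3), torch.rand(3))

# Identical seeds must give identical draws from every generator
assert a[0] == b[0]
assert np.array_equal(a[1], b[1])
assert torch.equal(a[2], b[2])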

Enable Deterministic Operations in PyTorch

Seeding random number generators isn’t enough. PyTorch uses optimized CUDA kernels that are non-deterministic by default: the fastest implementation of many operations lets GPU threads accumulate partial results in whatever order they finish, and because floating-point addition is not associative, that ordering changes the output. Two runs with the same seed can still produce different results.

Lock this down with three settings:

def make_deterministic() -> None:
    """Force PyTorch to use deterministic algorithms."""
    # Master switch: raises error if a non-deterministic op is used
    torch.use_deterministic_algorithms(True)

    # Force cuDNN to use deterministic convolution algorithms
    torch.backends.cudnn.deterministic = True

    # Disable cuDNN auto-tuner (it benchmarks multiple algorithms and picks the fastest,
    # which can vary between runs)
    torch.backends.cudnn.benchmark = False

The performance trade-off is real. cudnn.benchmark = False means PyTorch won’t search for the fastest convolution algorithm for your specific input sizes. On some models, this can slow training by 10-20%. torch.use_deterministic_algorithms(True) is stricter — it forces every operation to use a deterministic implementation, and throws a RuntimeError if one doesn’t exist.

For debugging and auditing, the slowdown is worth it. For large-scale production training where you’ve already validated your pipeline, you might relax benchmark back to True and accept minor floating-point variance.
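One way to manage that trade-off is a single switch. This is a sketch, not a PyTorch API — configure_determinism is a name introduced here: strict mode for debugging and audits, relaxed mode for validated production runs.

def configure_determinism(strict: bool = True) -> None:
    """Strict: bit-for-bit reproducibility. Relaxed: cuDNN autotuning speed."""
    torch.use_deterministic_algorithms(strict)
    torch.backends.cudnn.deterministic = strict
    torch.backends.cudnn.benchmark = not strict


configure_determinism(strict=True)    # debugging, auditing, reporting results
# configure_determinism(strict=False) # validated large-scale training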

Put it all together at the start of your training script:

seed_everything(42)
make_deterministic()

# Now build your model, dataset, etc.
model = torch.nn.TransformerEncoder(
    torch.nn.TransformerEncoderLayer(d_model=512, nhead=8, batch_first=True),
    num_layers=6,
)

Handle Non-Deterministic Operations

Some operations simply don’t have deterministic CUDA implementations. When you enable torch.use_deterministic_algorithms(True), these will throw errors instead of silently giving different results. That’s the point — you want to know.

Operations that are non-deterministic on CUDA (a probe sketch follows the list):

  • torch.Tensor.scatter_add_() — used in GNN message passing and embedding lookups
  • torch.Tensor.put_() with accumulate=True
  • torch.bincount() on CUDA
  • torch.nn.functional.interpolate() with certain modes
  • Any operation that uses atomicAdd internally (reductions, index operations)
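A hedged probe, assuming a CUDA device is available: whether a given op raises depends on your PyTorch version, so the call is wrapped in try/except rather than asserted.

torch.use_deterministic_algorithms(True)

if torch.cuda.is_available():
    src = torch.ones(2, 5, device="cuda")
    index = torch.zeros(2, 5, dtype=torch.long, device="cuda")
    out = torch.zeros(2, 5, device="cuda")
    try:
        # scatter_add_ historically lacks a deterministic CUDA kernel
        out.scatter_add_(1, index, src)
        print("scatter_add_ has a deterministic path on this version")
    except RuntimeError as err:
        print(f"Caught non-deterministic op: {err}")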

For DataLoader, multi-process data loading needs care: PyTorch derives a distinct torch seed for each worker (base seed plus worker ID), but it does not reseed NumPy or Python’s random module inside those workers. Fix this with a worker_init_fn:

def worker_init_fn(worker_id: int) -> None:
    """Ensure each DataLoader worker gets a deterministic seed."""
    worker_seed = torch.initial_seed() % 2**32
    np.random.seed(worker_seed)
    random.seed(worker_seed)


train_loader = torch.utils.data.DataLoader(
    dataset,
    batch_size=32,
    shuffle=True,
    num_workers=4,
    worker_init_fn=worker_init_fn,
    generator=torch.Generator().manual_seed(42),
)

The generator argument seeds the shuffling. The worker_init_fn seeds each worker’s NumPy and Python random state based on the per-worker PyTorch seed. Without the generator, shuffle order varies between runs; without the worker_init_fn, worker-side random augmentations vary, even with a global seed set.
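To verify, a minimal sketch (assuming the dataset yields tensor pairs that collate into a tuple of tensors): build the loader twice with identically seeded generators and compare first batches.

def first_batch(seed: int):
    loader = torch.utils.data.DataLoader(
        dataset,
        batch_size=32,
        shuffle=True,
        num_workers=4,
        worker_init_fn=worker_init_fn,
        generator=torch.Generator().manual_seed(seed),
    )
    return next(iter(loader))


# Same generator seed, same shuffle order: first batches match exactly
assert all(torch.equal(x, y) for x, y in zip(first_batch(42), first_batch(42)))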

If you hit a non-deterministic operation you can’t avoid, you have two options. First, try running on CPU for that specific operation — CPU implementations are usually deterministic. Second, set the environment variable CUBLAS_WORKSPACE_CONFIG to force deterministic cuBLAS behavior:

export CUBLAS_WORKSPACE_CONFIG=":4096:8"

This is required for deterministic behavior in cuBLAS-backed operations such as matrix multiplication on CUDA 10.2 and later (convolutions are covered by the cuDNN settings above).
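The CPU fallback can be a thin wrapper: move the inputs over, run the op, move the result back. A sketch for bincount, one of the ops flagged above (note that the device transfer costs time):

def deterministic_bincount(x: torch.Tensor) -> torch.Tensor:
    """Run bincount on CPU, where it is deterministic, then move back."""
    return torch.bincount(x.cpu()).to(x.device)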

Log and Track Seeds

Setting a seed is useless if you don’t record which seed you used. Build a simple tracker that logs the seed alongside your experiment metadata:

import json
import subprocess
import sys
from datetime import datetime, timezone
from pathlib import Path


def get_git_hash() -> str:
    """Get current git commit hash."""
    try:
        result = subprocess.run(
            ["git", "rev-parse", "HEAD"],
            capture_output=True,
            text=True,
            check=True,
        )
        return result.stdout.strip()
    except (subprocess.CalledProcessError, FileNotFoundError):
        return "unknown"


def log_experiment(
    seed: int,
    experiment_name: str,
    log_dir: str = "experiment_logs",
    extra_config: dict | None = None,
) -> Path:
    """Log experiment seed, git hash, and environment info."""
    log_path = Path(log_dir)
    log_path.mkdir(parents=True, exist_ok=True)

    timestamp = datetime.now(timezone.utc).strftime("%Y%m%d_%H%M%S")
    log_file = log_path / f"{experiment_name}_{timestamp}.json"

    record = {
        "experiment_name": experiment_name,
        "seed": seed,
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "git_hash": get_git_hash(),
        "python_version": __import__("sys").version,
        "pytorch_version": torch.__version__,
        "cuda_available": torch.cuda.is_available(),
        "cuda_version": torch.version.cuda if torch.cuda.is_available() else None,
        "gpu_name": (
            torch.cuda.get_device_name(0) if torch.cuda.is_available() else None
        ),
    }

    if extra_config:
        record["config"] = extra_config

    log_file.write_text(json.dumps(record, indent=2))
    print(f"Experiment logged to {log_file}")
    return log_file

Use it at the start of every run:

SEED = 42
seed_everything(SEED)
make_deterministic()

log_experiment(
    seed=SEED,
    experiment_name="transformer_baseline",
    extra_config={
        "batch_size": 32,
        "learning_rate": 3e-4,
        "epochs": 10,
        "model": "TransformerEncoder-6L-512d",
    },
)

This gives you a JSON file for every experiment with the exact seed, commit hash, PyTorch version, and GPU info. When results don’t match, diff the log files. Nine times out of ten, the problem is a library version change or a different GPU architecture — not the seed.
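A small helper makes that diff routine. A sketch (the file names are hypothetical placeholders) that loads two of the JSON logs written above and prints the fields that differ:

def diff_logs(path_a: str, path_b: str) -> None:
    """Print every top-level field that differs between two experiment logs."""
    a = json.loads(Path(path_a).read_text())
    b = json.loads(Path(path_b).read_text())
    for key in sorted(set(a) | set(b)):
        if a.get(key) != b.get(key):
            print(f"{key}: {a.get(key)!r} != {b.get(key)!r}")


diff_logs(
    "experiment_logs/run_a.json",  # hypothetical log files
    "experiment_logs/run_b.json",
)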

Common Errors and Fixes

RuntimeError: <op> does not have a deterministic implementation, but you set 'torch.use_deterministic_algorithms(True)'.

This happens when you call an operation that has no deterministic CUDA kernel. The full error message tells you which operation caused it. Common culprits are scatter_add_, index_add_, and ctc_loss.

Fix: Set CUBLAS_WORKSPACE_CONFIG and check if a CPU fallback is feasible for that specific operation:

os.environ["CUBLAS_WORKSPACE_CONFIG"] = ":4096:8"

If you absolutely need the non-deterministic op on GPU and can accept it, use torch.use_deterministic_algorithms(True, warn_only=True) to get warnings instead of errors.
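In code (the warn_only keyword is available in recent PyTorch releases):

# Downgrade hard errors to warnings; non-deterministic ops run but are reported
torch.use_deterministic_algorithms(True, warn_only=True)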

Results differ between GPU architectures (e.g., A100 vs V100).

Floating-point operations aren’t guaranteed to produce identical results across different GPU architectures. Hardware-level details such as fused multiply-add behavior and reduction order vary, and because floating-point addition is not associative, the results legitimately differ. This isn’t a bug; it’s how floating-point arithmetic works.

Fix: Pin your GPU architecture in experiment logs and compare results only across the same hardware. If you need cross-architecture reproducibility, run on CPU.
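If you must compare across architectures anyway, one pragmatic middle ground (not bit-for-bit reproducibility) is a tolerance check on saved outputs; the file names here are hypothetical:

# Compare saved outputs across architectures within floating-point tolerance
reference = torch.load("outputs_v100.pt")  # hypothetical saved output tensor
candidate = torch.load("outputs_a100.pt")  # hypothetical saved output tensor
if torch.allclose(reference, candidate, rtol=1e-4, atol=1e-6):
    print("Outputs agree within tolerance")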

DataLoader returns different batches despite setting a seed.

Usually caused by missing worker_init_fn or not passing a seeded generator to the DataLoader. Also check that you’re not calling seed_everything() after the DataLoader is constructed.

Fix: Use the worker_init_fn and generator pattern shown in the DataLoader section above. Always seed before creating any data loaders or models.

Results differ when changing num_workers in DataLoader.

Each worker processes a different subset of data and applies transforms independently. Changing the number of workers changes which worker handles which sample, altering the random augmentations applied.

Fix: Keep num_workers constant between comparison runs. Log it as part of your experiment config. If you’re using random augmentations, make sure each worker’s seed is derived deterministically from the global seed and worker ID, which the worker_init_fn pattern handles.
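Extending the log_experiment call from earlier keeps that constant auditable:

log_experiment(
    seed=SEED,
    experiment_name="transformer_baseline",
    extra_config={
        "batch_size": 32,
        "num_workers": 4,  # keep fixed between comparison runs
        "learning_rate": 3e-4,
        "epochs": 10,
    },
)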