The Core Workflow

ML CI/CD with GitHub Actions boils down to this: every push or pull request triggers a workflow that trains your model, evaluates it against a baseline, and posts results as a PR comment. If metrics pass your threshold, the model gets pushed to a registry.

Here’s the minimal workflow file that does all three. Create .github/workflows/ml-pipeline.yml:

````yaml
name: ML Pipeline

on:
  push:
    branches: [main]
  pull_request:
    branches: [main]

permissions:
  contents: write
  pull-requests: write

jobs:
  train-and-evaluate:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      - uses: actions/setup-python@v5
        with:
          python-version: "3.11"
          cache: pip

      - name: Install dependencies
        run: pip install -r requirements.txt

      - uses: iterative/setup-cml@v2

      - name: Train and evaluate
        env:
          REPO_TOKEN: ${{ secrets.GITHUB_TOKEN }}
        run: |
          python train.py
          python evaluate.py > metrics.txt

          # Post results as PR comment
          echo "## Model Evaluation Results" >> report.md
          echo '```' >> report.md
          cat metrics.txt >> report.md
          echo '```' >> report.md

          cml comment create report.md
````
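The workflow assumes a train.py and an evaluate.py at the repo root, and neither is prescribed by the pipeline itself. As a sketch only, evaluate.py might load the trained model, score a held-out split, and print metrics; the model.joblib path and the iris data here are placeholder assumptions, not requirements:

```python
# evaluate.py -- minimal evaluation sketch; adapt the model path and
# dataset to your project
import json

from sklearn.metrics import accuracy_score


def evaluate(model, X_test, y_test):
    """Return a metrics dict for the given model and test split."""
    preds = model.predict(X_test)
    return {"accuracy": accuracy_score(y_test, preds)}


if __name__ == "__main__":
    import joblib
    from sklearn.datasets import load_iris
    from sklearn.model_selection import train_test_split

    # Assumes train.py saved the fitted model as model.joblib
    model = joblib.load("model.joblib")
    X, y = load_iris(return_X_y=True)
    _, X_test, _, y_test = train_test_split(
        X, y, test_size=0.2, random_state=42
    )

    metrics = evaluate(model, X_test, y_test)
    with open("metrics.json", "w") as f:
        json.dump(metrics, f)
    print(f"Accuracy: {metrics['accuracy']:.4f}")
```

Whatever the script prints lands in metrics.txt via the redirect in the workflow, and the JSON file gives later steps something machine-readable.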

The permissions block is critical. Without pull-requests: write, you’ll hit this error immediately:

```
Error: Resource not accessible by integration
```

That error means the GITHUB_TOKEN doesn’t have permission to comment on PRs. The fix is always the same: add the permissions block at the top of your workflow, or go to Settings > Actions > General > Workflow permissions and switch to “Read and write.”

Setting Up Data Versioning with DVC

You don’t want to store training data or model artifacts in Git. DVC (Data Version Control) tracks them in remote storage while Git tracks the metadata. This way your CI pipeline can pull exactly the right dataset version for each commit.

Install DVC and initialize it in your repo:

```bash
pip install dvc dvc-s3
dvc init
dvc remote add -d myremote s3://my-ml-bucket/dvc-store
```

Track your dataset and model output:

```bash
dvc add data/training_set.csv
git add data/training_set.csv.dvc data/.gitignore
git commit -m "Track training data with DVC"
dvc push
```

Now update the workflow to pull DVC-tracked data before training:

```yaml
      - name: Pull data with DVC
        env:
          AWS_ACCESS_KEY_ID: ${{ secrets.AWS_ACCESS_KEY_ID }}
          AWS_SECRET_ACCESS_KEY: ${{ secrets.AWS_SECRET_ACCESS_KEY }}
        run: |
          pip install dvc dvc-s3
          dvc pull
```

Add your AWS credentials (or GCS/Azure equivalents) as repository secrets under Settings > Secrets and variables > Actions. DVC supports S3, GCS, Azure Blob, SSH, and HTTP remotes.

Automated Model Evaluation with CML

CML (Continuous Machine Learning) by Iterative is the tool that ties GitHub Actions to ML workflows. It posts training metrics, plots, and images directly to your PR comments so reviewers can see exactly what changed.

Here’s a training script that outputs metrics CML can pick up:

```python
import json

import joblib
import matplotlib

matplotlib.use("Agg")  # headless backend for CI; set before importing pyplot
import matplotlib.pyplot as plt
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import (
    ConfusionMatrixDisplay,
    accuracy_score,
    classification_report,
)
from sklearn.model_selection import train_test_split

# Load and split
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Train
clf = RandomForestClassifier(n_estimators=100, random_state=42)
clf.fit(X_train, y_train)

# Evaluate
preds = clf.predict(X_test)
acc = accuracy_score(y_test, preds)

# Write metrics for CI
with open("metrics.json", "w") as f:
    json.dump({"accuracy": acc}, f)

# Save the fitted model so later pipeline steps can load it
joblib.dump(clf, "model.joblib")

print(f"Accuracy: {acc:.4f}")
print(classification_report(y_test, preds))

# Save confusion matrix plot
ConfusionMatrixDisplay.from_predictions(y_test, preds)
plt.savefig("confusion_matrix.png")
```

Update your workflow to include the plot in the CML report:

````yaml
      - name: Generate CML report
        env:
          REPO_TOKEN: ${{ secrets.GITHUB_TOKEN }}
        run: |
          echo "## Training Report" >> report.md
          echo "### Metrics" >> report.md
          echo '```json' >> report.md
          cat metrics.json >> report.md
          echo '' >> report.md
          echo '```' >> report.md

          echo "### Confusion Matrix" >> report.md
          cml asset publish confusion_matrix.png --md >> report.md

          cml comment create report.md
````

The cml asset publish command uploads the image and returns a markdown image reference. Your PR comment now shows the confusion matrix directly in GitHub.

Model Registry Integration

Once your model passes evaluation, push it to a registry. MLflow’s model registry is the most common choice. Add a promotion step that only runs on the main branch:

```yaml
      - name: Register model
        if: github.ref == 'refs/heads/main'
        env:
          MLFLOW_TRACKING_URI: ${{ secrets.MLFLOW_TRACKING_URI }}
        run: |
          pip install mlflow
          python -c "
          import json
          import sys

          import joblib
          import mlflow

          with open('metrics.json') as f:
              metrics = json.load(f)

          # Only register if accuracy meets threshold
          if metrics['accuracy'] >= 0.95:
              mlflow.set_experiment('iris-classifier')
              with mlflow.start_run():
                  mlflow.log_metrics(metrics)
                  mlflow.sklearn.log_model(
                      sk_model=joblib.load('model.joblib'),  # saved by train.py
                      artifact_path='model',
                      registered_model_name='iris-classifier'
                  )
              print(f'Model registered with accuracy {metrics[\"accuracy\"]}')
          else:
              print(f'Accuracy {metrics[\"accuracy\"]} below threshold, skipping registration')
              sys.exit(1)
          "
```

The if: github.ref == 'refs/heads/main' guard prevents registration on PRs. You only want models promoted to the registry after they’ve been reviewed and merged.
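If you also want to rule out other event types that can run against main (such as scheduled or manually dispatched workflows), one option is to pin the guard to push events explicitly; this variant is equivalent for the push/pull_request triggers used above:

```yaml
      - name: Register model
        if: github.event_name == 'push' && github.ref == 'refs/heads/main'
```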

Handling GPU Training in CI

GitHub-hosted runners don’t have GPUs. For GPU-intensive training, you have three options:

Self-hosted runners are the simplest. Install the GitHub runner agent on a GPU machine and tag your workflow:

```yaml
jobs:
  train:
    runs-on: [self-hosted, gpu]
```

CML cloud provisioning spins up cloud instances on demand. Add this to your workflow:

```yaml
      - name: Deploy GPU runner
        env:
          REPO_TOKEN: ${{ secrets.PERSONAL_ACCESS_TOKEN }}
        run: |
          cml runner launch \
            --cloud=aws \
            --cloud-region=us-east-1 \
            --cloud-type=g4dn.xlarge \
            --labels=cml-gpu
```

GitHub’s larger runners include GPU options if you’re on a Team or Enterprise plan. You create a GPU runner with a custom label in your organization’s runner settings, then reference that label in runs-on.

Common Failures and Fixes

Out of disk space during training. GitHub-hosted runners have about 14 GB free. Large datasets or model checkpoints fill that up fast.

```
No space left on device
```

Fix it by cleaning up pre-installed tools before your training step:

```yaml
      - name: Free disk space
        run: |
          sudo rm -rf /usr/share/dotnet
          sudo rm -rf /opt/ghc
          sudo rm -rf /usr/local/share/boost
          df -h
```

DVC pull fails with authentication errors. This usually means your secrets aren’t set or have the wrong scope.

```
ERROR: failed to pull data from the cloud - 403 Forbidden
```

Double-check that AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY are set in repository secrets (not environment secrets, unless you’ve configured environments). For GCS, use a service account JSON stored as a secret.
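For GCS specifically, one common pattern is to write the service-account JSON secret to a file and point GOOGLE_APPLICATION_CREDENTIALS at it before dvc pull. A sketch, where the secret name GCP_SA_KEY and the key file path are placeholders:

```yaml
      - name: Pull data with DVC (GCS)
        env:
          GCP_SA_KEY: ${{ secrets.GCP_SA_KEY }}
        run: |
          pip install dvc dvc-gs
          echo "$GCP_SA_KEY" > gcp-key.json
          export GOOGLE_APPLICATION_CREDENTIALS="$PWD/gcp-key.json"
          dvc pull
```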

Workflow hangs during model training. GitHub Actions has a 6-hour job timeout by default. Set something shorter so you don’t burn minutes:

```yaml
jobs:
  train:
    runs-on: ubuntu-latest
    timeout-minutes: 30
```

Putting It All Together

A production-grade ML CI/CD pipeline combines all of these pieces: DVC for data versioning, CML for automated reporting, threshold-based evaluation gates, and model registry promotion on merge. The workflow runs on every PR, gives reviewers concrete metrics to evaluate, and automatically ships approved models to production.
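The evaluation gate can live in a small script that the workflow runs before the registry step. A minimal sketch, where the file names, the metric set, and the tolerance are all assumptions rather than part of the setup above; it compares the candidate metrics against a baseline file committed to the repo and exits non-zero on a regression, which fails the workflow step:

```python
# check_metrics.py -- hypothetical threshold gate: fail CI when the candidate
# model regresses past a tolerance relative to a committed baseline
import json
import sys


def gate(candidate: dict, baseline: dict, tolerance: float = 0.01) -> bool:
    """Return True when every baseline metric is matched within tolerance."""
    return all(
        candidate.get(name, float("-inf")) >= value - tolerance
        for name, value in baseline.items()
    )


if __name__ == "__main__":
    with open("metrics.json") as f:
        candidate = json.load(f)
    with open("baseline_metrics.json") as f:  # committed to the repo
        baseline = json.load(f)

    if not gate(candidate, baseline):
        print(f"Metrics regressed: {candidate} vs baseline {baseline}")
        sys.exit(1)  # non-zero exit fails the workflow step
    print("Metrics gate passed")
```

Committing the baseline file means every PR is judged against the metrics of the model currently on main, and updating the baseline becomes an explicit, reviewable change.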

Start with the basic train-and-evaluate workflow. Add DVC when your data gets too large for Git. Layer in CML reporting when you want metrics in PR comments. Add registry integration last, once you’ve established what “good enough” means for your model.