ML model serving environments are dependency nightmares. You pull in PyTorch, Transformers, NumPy, Pillow, and suddenly you have 200+ transitive dependencies – any one of which could have a known CVE sitting in production. Most teams never check. Here’s how to build a scanner that catches these before they become incidents.
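A minimal sketch of that scan, using pip-audit's `-f/--format` flag for machine-readable output:

```shell
pip install pip-audit
pip-audit -f json
```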
That’s the fastest path to scanning your environment. pip-audit checks every installed package against the PyPA Advisory Database and the OSV database. The JSON output gives you structured data to build automation around.
Scanning ML-Specific Packages
ML dependencies are particularly risky because they ship native code, bundle C++ libraries, and often lag behind security patches. A typical model serving requirements.txt looks like this:
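For illustration, a hypothetical serving stack (the packages match the scenario; the pinned versions are examples, not recommendations):

```
torch==2.1.0
transformers==4.35.2
numpy==1.24.3
pillow==10.0.0
fastapi==0.104.1
uvicorn==0.23.2
```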
Run the scan against it:
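Using pip-audit's `-r` flag to target the file directly:

```shell
pip-audit -r requirements.txt --desc on
```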
The --desc on flag includes vulnerability descriptions so you can triage without looking up every CVE manually. You’ll often find issues in Pillow (image parsing bugs are a recurring theme), NumPy (buffer overflows in older versions), and occasionally in torch’s bundled libraries.
For a broader check that includes the safety database as well, install both tools:
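One way to run both back to back (`safety check` is the older command form; newer safety releases also offer `safety scan`):

```shell
pip install pip-audit safety
pip-audit -r requirements.txt
safety check -r requirements.txt
```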
Using both gives you better coverage. pip-audit pulls from PyPA/OSV, while safety uses its own curated database. Some CVEs show up in one but not the other.
Building a Scan Report Script
A one-off scan is fine. But you want something that runs in CI, produces a clear pass/fail, and generates a report your team can act on. Here’s a Python script that wraps pip-audit, parses the JSON output, and generates a summary:
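A sketch of such a wrapper. The script name (`scan_report.py`) and report format are illustrative; the JSON shape (`dependencies` → `vulns` → `fix_versions`) is what pip-audit emits with `-f json`:

```python
#!/usr/bin/env python3
"""Wrap pip-audit, parse its JSON output, and print a pass/fail summary."""
import json
import subprocess
import sys


def summarize(audit_json: dict) -> tuple[str, int]:
    """Build a readable summary from pip-audit JSON; return (report, vuln count)."""
    lines = []
    vuln_count = 0
    for dep in audit_json.get("dependencies", []):
        vulns = dep.get("vulns", [])
        if not vulns:
            continue
        vuln_count += 1
        lines.append(f"{dep['name']} {dep['version']}:")
        for v in vulns:
            fixes = ", ".join(v.get("fix_versions", [])) or "no fix released"
            lines.append(f"  - {v['id']} (fixed in: {fixes})")
    header = f"{vuln_count} vulnerable package(s) found"
    return "\n".join([header] + lines), vuln_count


def main(requirements: str) -> int:
    # pip-audit exits non-zero when it finds vulnerabilities,
    # so don't pass check=True here.
    proc = subprocess.run(
        ["pip-audit", "-r", requirements, "-f", "json"],
        capture_output=True, text=True,
    )
    report, vuln_count = summarize(json.loads(proc.stdout))
    print(report)
    return 1 if vuln_count else 0


if __name__ == "__main__" and len(sys.argv) > 1:
    sys.exit(main(sys.argv[1]))
```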
Run it locally:
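Assuming the wrapper was saved as `scan_report.py` (the name is arbitrary) and takes the requirements file as its argument:

```shell
python scan_report.py requirements.txt
```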
The script exits with code 1 when vulnerabilities are found, which makes it plug directly into any CI system as a gate.
Integrating with GitHub Actions
The real value comes from running this on every PR and blocking merges when new vulnerabilities appear. Here’s a GitHub Actions workflow that does exactly that:
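A minimal workflow sketch; the Python version, cron time, and `scan_report.py` name are assumptions to adapt to your repo:

```yaml
name: dependency-audit
on:
  pull_request:
  schedule:
    - cron: "0 6 * * 1"  # weekly scan, Monday 06:00 UTC

jobs:
  audit:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: "3.11"
      - name: Install scanners
        run: pip install pip-audit safety
      - name: pip-audit (advisory only)
        run: pip-audit -r requirements.txt --desc on || true
      - name: safety (advisory only)
        run: safety check -r requirements.txt || true
      - name: Gate on custom scanner
        run: python scan_report.py requirements.txt
```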
The || true on the individual tool runs prevents them from short-circuiting the workflow. The custom scanner script handles the final pass/fail decision. The schedule trigger runs a weekly scan even when nobody is pushing changes – new CVEs get published against existing versions all the time.
Scanning Multiple Requirements Files
You probably have multiple requirements files: requirements.txt for serving, requirements-dev.txt for training, maybe requirements-test.txt for CI. Scan all of them:
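A simple shell loop covers every file matching the pattern:

```shell
for req in requirements*.txt; do
  echo "=== Scanning $req ==="
  pip-audit -r "$req" --desc on
done
```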
Pinning to Safe Versions
When pip-audit finds a vulnerability, it tells you which versions contain the fix. Run pip-audit with -r and --fix to automatically rewrite your requirements file to the nearest safe version:
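Preview the changes without touching the file:

```shell
pip-audit -r requirements.txt --fix --dry-run
```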
Always run with --dry-run first. The automatic fix picks the closest safe version, but in ML land, jumping a major version of PyTorch or NumPy can break your model loading code. Review the suggested changes, test your model inference, then apply:
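Once the dry run looks safe and inference still works:

```shell
pip-audit -r requirements.txt --fix
```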
For a more controlled approach, build a script that generates a pinned requirements file with comments explaining why each pin exists:
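A sketch of that generator, built on pip-audit's `-f json` output; the helper name `pinned_lines` and the script's argument convention are illustrative:

```python
#!/usr/bin/env python3
"""Emit pinned '<name>==<version>' lines annotated with the advisory IDs behind each pin."""
import json
import subprocess
import sys


def pinned_lines(audit_json: dict) -> list[str]:
    """Turn pip-audit JSON into pinned requirement lines with explanatory comments."""
    lines = []
    for dep in audit_json.get("dependencies", []):
        name, version = dep["name"], dep["version"]
        vulns = dep.get("vulns", [])
        fixes = [fv for v in vulns for fv in v.get("fix_versions", [])]
        if fixes:
            # Picks the first fixed version reported; review before committing,
            # since a fix can be a major-version jump that breaks model code.
            ids = ", ".join(v["id"] for v in vulns)
            lines.append(f"{name}=={fixes[0]}  # pinned: fixes {ids}")
        elif vulns:
            ids = ", ".join(v["id"] for v in vulns)
            lines.append(f"{name}=={version}  # WARNING: {ids} has no fixed release yet")
        else:
            lines.append(f"{name}=={version}")
    return lines


def main(requirements: str) -> int:
    # pip-audit exits non-zero on findings, so no check=True.
    proc = subprocess.run(
        ["pip-audit", "-r", requirements, "-f", "json"],
        capture_output=True, text=True,
    )
    for line in pinned_lines(json.loads(proc.stdout)):
        print(line)
    return 0


if __name__ == "__main__" and len(sys.argv) > 1:
    sys.exit(main(sys.argv[1]))
```

Redirect the output to a new requirements file and commit it alongside the scan report.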
This gives you a clear audit trail. When someone asks “why is Pillow pinned to 10.2.0?”, the comment says exactly which CVEs drove the decision.
Common Errors and Fixes
pip-audit fails with “No matching distribution found”
This happens when your requirements file includes packages that aren’t installed in the current environment. Either install them first or scan the live environment instead:
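Either way works; installing first keeps the scan aligned with what you actually run:

```shell
pip install -r requirements.txt   # resolve and install everything first
pip-audit                         # then scan the live environment
```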
Running pip-audit without -r scans the currently installed environment instead of a requirements file. This avoids the resolution problem entirely.
safety returns “Invalid API key”
The free tier of safety works without an API key for basic checks. If you see auth errors, make sure you’re not setting SAFETY_API_KEY to an expired token. Remove the env var and it falls back to the free database:
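Clearing the variable for the current shell session:

```shell
unset SAFETY_API_KEY
safety check -r requirements.txt
```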
GPU-only packages fail to resolve
PyTorch with CUDA dependencies can’t resolve on a CPU-only CI runner. Use the CPU index URL in your CI requirements:
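A requirements fragment pointing at PyTorch's CPU wheel index (the torch version shown is an example):

```
--extra-index-url https://download.pytorch.org/whl/cpu
torch==2.1.0+cpu
```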
Or maintain a separate requirements-ci.txt that uses CPU-only torch builds for scanning purposes.
Transitive dependency vulnerabilities
When scanning an installed environment, pip-audit covers everything that is installed, transitive dependencies included. Scanning a requirements file with -r is less reliable: resolution happens outside your real environment, so platform markers and extras can resolve differently than they do in production (and --no-deps skips transitive resolution entirely). For full transitive coverage, install first, then scan:
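Run pip-audit without -r so it audits the full installed set, transitive packages included:

```shell
pip install -r requirements.txt
pip-audit
```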
This catches vulnerabilities in packages you never explicitly listed but that got pulled in by torch or transformers.
False positives on yanked versions
Sometimes pip-audit flags a version because it was yanked from PyPI, not because of a CVE that affects your usage. Use the --ignore-vuln flag to suppress known false positives:
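The ID below is a placeholder; substitute the actual advisory ID, and repeat the flag for each finding you suppress:

```shell
pip-audit -r requirements.txt --ignore-vuln PYSEC-XXXX-YY
```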
Keep a .pip-audit-ignore file in your repo to track suppressed findings with justifications so your team knows why each was dismissed.
Related Guides
- How to Build a Model Batch Inference Pipeline with Ray and Parquet
- How to Build a Model Input Validation Pipeline with Pydantic and FastAPI
- How to Build a Model Configuration Management Pipeline with Hydra
- How to Build a Model Metadata Store with SQLite and FastAPI
- How to Build a Model CI Pipeline with GitHub Actions and DVC
- How to Build a Model Serving Pipeline with Ray Serve and FastAPI
- How to Build a Model Canary Analysis Pipeline with Statistical Tests
- How to Build a Model Deployment Pipeline with Terraform and AWS
- How to Build a Model Health Dashboard with FastAPI and SQLite
- How to Build a Model Feature Store Pipeline with Redis and FastAPI