Most teams don’t need MLflow or a managed experiment tracker on day one. A single SQLite file and a FastAPI server give you a model metadata store that handles hundreds of experiments before you ever hit a scaling wall. You get full control, zero infrastructure cost, and an API you can query from any training script or CI pipeline.
Here’s the minimal setup to get a working metadata store running:
```bash
pip install fastapi uvicorn aiosqlite
```
## SQLite Schema for Models and Experiments
The schema needs four tables: models, experiments, metrics, and deployments. Each experiment belongs to a model, each metric row belongs to an experiment, and deployments track which experiment is live.
```python
# schema.py
SCHEMA = """
CREATE TABLE IF NOT EXISTS models (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    name TEXT UNIQUE NOT NULL,
    description TEXT,
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);

CREATE TABLE IF NOT EXISTS experiments (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    model_id INTEGER NOT NULL,
    run_name TEXT NOT NULL,
    parameters TEXT NOT NULL,  -- JSON string
    status TEXT DEFAULT 'running',
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    finished_at TIMESTAMP,
    FOREIGN KEY (model_id) REFERENCES models(id)
);

CREATE TABLE IF NOT EXISTS metrics (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    experiment_id INTEGER NOT NULL,
    metric_name TEXT NOT NULL,
    metric_value REAL NOT NULL,
    step INTEGER,
    logged_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    FOREIGN KEY (experiment_id) REFERENCES experiments(id)
);

CREATE TABLE IF NOT EXISTS deployments (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    model_id INTEGER NOT NULL,
    experiment_id INTEGER NOT NULL,
    environment TEXT NOT NULL,  -- 'staging' or 'production'
    deployed_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    deployed_by TEXT,
    FOREIGN KEY (model_id) REFERENCES models(id),
    FOREIGN KEY (experiment_id) REFERENCES experiments(id)
);
"""
```
Storing parameters as a JSON string keeps the schema flat. You can always extract values with `json_extract()` (built into SQLite 3.38+) if you need to filter by hyperparameter directly.
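Here’s a quick sketch of that kind of filter using the stdlib `sqlite3` module directly (the two-row dataset and the `learning_rate` key are made up for illustration):

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE experiments (id INTEGER PRIMARY KEY, parameters TEXT)")
conn.executemany(
    "INSERT INTO experiments (parameters) VALUES (?)",
    [(json.dumps({"learning_rate": lr, "max_depth": 6}),) for lr in (0.01, 0.1)],
)

# Filter experiments by a hyperparameter stored inside the JSON blob
rows = conn.execute(
    "SELECT id, json_extract(parameters, '$.learning_rate') AS lr "
    "FROM experiments "
    "WHERE json_extract(parameters, '$.learning_rate') < 0.05"
).fetchall()
print(rows)  # [(1, 0.01)]
```

The same `json_extract()` call works inside the async endpoints below, since aiosqlite passes SQL straight through to SQLite.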
## FastAPI Server with Lifespan DB Initialization
FastAPI’s lifespan context manager is the correct way to handle startup and shutdown logic; the old `@app.on_event("startup")` decorator is deprecated. Here’s the full server with database initialization baked into the lifespan:
```python
# main.py
import json
from contextlib import asynccontextmanager
from datetime import datetime, timezone
from typing import Optional

import aiosqlite
from fastapi import FastAPI, HTTPException
from pydantic import BaseModel

DATABASE = "metadata.db"

SCHEMA = """
CREATE TABLE IF NOT EXISTS models (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    name TEXT UNIQUE NOT NULL,
    description TEXT,
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);

CREATE TABLE IF NOT EXISTS experiments (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    model_id INTEGER NOT NULL,
    run_name TEXT NOT NULL,
    parameters TEXT NOT NULL,
    status TEXT DEFAULT 'running',
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    finished_at TIMESTAMP,
    FOREIGN KEY (model_id) REFERENCES models(id)
);

CREATE TABLE IF NOT EXISTS metrics (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    experiment_id INTEGER NOT NULL,
    metric_name TEXT NOT NULL,
    metric_value REAL NOT NULL,
    step INTEGER,
    logged_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    FOREIGN KEY (experiment_id) REFERENCES experiments(id)
);

CREATE TABLE IF NOT EXISTS deployments (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    model_id INTEGER NOT NULL,
    experiment_id INTEGER NOT NULL,
    environment TEXT NOT NULL,
    deployed_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    deployed_by TEXT,
    FOREIGN KEY (model_id) REFERENCES models(id),
    FOREIGN KEY (experiment_id) REFERENCES experiments(id)
);
"""


@asynccontextmanager
async def lifespan(app: FastAPI):
    # Runs once on startup; CREATE TABLE IF NOT EXISTS makes this idempotent
    async with aiosqlite.connect(DATABASE) as db:
        await db.executescript(SCHEMA)
        await db.commit()
    yield


app = FastAPI(title="Model Metadata Store", lifespan=lifespan)


async def get_db():
    async with aiosqlite.connect(DATABASE) as db:
        db.row_factory = aiosqlite.Row
        yield db
```
The lifespan function runs the schema creation once on startup. Every request handler then opens its own short-lived connection (the `get_db()` dependency is there if you’d rather inject connections with `Depends` instead of connecting inline). For a single-server setup with SQLite, this is the right pattern – you don’t want a persistent connection pool fighting over SQLite’s single write lock.
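To see that write lock concretely, here’s a stdlib `sqlite3` sketch (no aiosqlite or FastAPI needed): with a busy timeout of zero, a second connection’s write attempt is refused immediately while the first one holds the lock.

```python
import os
import sqlite3
import tempfile

path = os.path.join(tempfile.mkdtemp(), "demo.db")
writer = sqlite3.connect(path, timeout=0)  # timeout=0: fail instead of waiting
other = sqlite3.connect(path, timeout=0)
writer.execute("CREATE TABLE t (x INTEGER)")
writer.commit()

writer.execute("BEGIN IMMEDIATE")  # first connection takes the write lock
err = ""
try:
    other.execute("BEGIN IMMEDIATE")  # second writer is refused
except sqlite3.OperationalError as exc:
    err = str(exc)
print(err)  # "database is locked"
```

Short-lived connections keep each handler’s hold on that lock brief, which is exactly why the connection-per-request pattern works here.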
## Registering Models and Logging Experiments
Now add the Pydantic models and endpoints for creating models, starting experiments, and logging metrics.
```python
# Add these to main.py after the lifespan setup

class ModelCreate(BaseModel):
    name: str
    description: Optional[str] = None


class ExperimentCreate(BaseModel):
    model_name: str
    run_name: str
    parameters: dict


class MetricLog(BaseModel):
    metric_name: str
    metric_value: float
    step: Optional[int] = None


@app.post("/models")
async def register_model(body: ModelCreate):
    async with aiosqlite.connect(DATABASE) as db:
        try:
            cursor = await db.execute(
                "INSERT INTO models (name, description) VALUES (?, ?)",
                (body.name, body.description),
            )
            await db.commit()
            return {"id": cursor.lastrowid, "name": body.name}
        except aiosqlite.IntegrityError:
            raise HTTPException(
                status_code=409, detail=f"Model '{body.name}' already exists"
            )


@app.post("/experiments")
async def create_experiment(body: ExperimentCreate):
    async with aiosqlite.connect(DATABASE) as db:
        db.row_factory = aiosqlite.Row
        cursor = await db.execute(
            "SELECT id FROM models WHERE name = ?", (body.model_name,)
        )
        model = await cursor.fetchone()
        if not model:
            raise HTTPException(
                status_code=404, detail=f"Model '{body.model_name}' not found"
            )
        cursor = await db.execute(
            "INSERT INTO experiments (model_id, run_name, parameters) VALUES (?, ?, ?)",
            (model["id"], body.run_name, json.dumps(body.parameters)),
        )
        await db.commit()
        return {"experiment_id": cursor.lastrowid, "run_name": body.run_name}


@app.post("/experiments/{experiment_id}/metrics")
async def log_metric(experiment_id: int, body: MetricLog):
    async with aiosqlite.connect(DATABASE) as db:
        cursor = await db.execute(
            "SELECT id FROM experiments WHERE id = ?", (experiment_id,)
        )
        if not await cursor.fetchone():
            raise HTTPException(status_code=404, detail="Experiment not found")
        await db.execute(
            "INSERT INTO metrics (experiment_id, metric_name, metric_value, step) "
            "VALUES (?, ?, ?, ?)",
            (experiment_id, body.metric_name, body.metric_value, body.step),
        )
        await db.commit()
        return {"status": "logged", "experiment_id": experiment_id}


@app.patch("/experiments/{experiment_id}/finish")
async def finish_experiment(experiment_id: int):
    async with aiosqlite.connect(DATABASE) as db:
        now = datetime.now(timezone.utc).isoformat()
        result = await db.execute(
            "UPDATE experiments SET status = 'completed', finished_at = ? WHERE id = ?",
            (now, experiment_id),
        )
        await db.commit()
        if result.rowcount == 0:
            raise HTTPException(status_code=404, detail="Experiment not found")
        return {"status": "completed", "experiment_id": experiment_id}
```
Start the server and test it:
```bash
uvicorn main:app --reload --port 8000
```
Register a model and kick off an experiment:
```bash
# Register a model
curl -X POST http://localhost:8000/models \
  -H "Content-Type: application/json" \
  -d '{"name": "fraud-detector-v2", "description": "XGBoost fraud classifier"}'

# Start an experiment
curl -X POST http://localhost:8000/experiments \
  -H "Content-Type: application/json" \
  -d '{"model_name": "fraud-detector-v2", "run_name": "run-001", "parameters": {"learning_rate": 0.01, "max_depth": 6, "n_estimators": 500}}'

# Log metrics
curl -X POST http://localhost:8000/experiments/1/metrics \
  -H "Content-Type: application/json" \
  -d '{"metric_name": "auc_roc", "metric_value": 0.943, "step": 100}'

# Mark experiment complete
curl -X PATCH http://localhost:8000/experiments/1/finish
```
## Querying and Comparing Experiments
The real value shows up when you can compare runs side by side. Add endpoints that return experiments with their best metrics and let you rank runs by a specific metric.
```python
# Add these to main.py

@app.get("/models/{model_name}/experiments")
async def list_experiments(model_name: str):
    async with aiosqlite.connect(DATABASE) as db:
        db.row_factory = aiosqlite.Row
        cursor = await db.execute(
            "SELECT id FROM models WHERE name = ?", (model_name,)
        )
        model = await cursor.fetchone()
        if not model:
            raise HTTPException(status_code=404, detail="Model not found")
        cursor = await db.execute(
            """
            SELECT e.id, e.run_name, e.parameters, e.status, e.created_at, e.finished_at
            FROM experiments e
            WHERE e.model_id = ?
            ORDER BY e.created_at DESC
            """,
            (model["id"],),
        )
        rows = await cursor.fetchall()
        experiments = []
        for row in rows:
            exp = dict(row)
            exp["parameters"] = json.loads(exp["parameters"])
            metric_cursor = await db.execute(
                """
                SELECT metric_name, metric_value, step
                FROM metrics
                WHERE experiment_id = ?
                ORDER BY step DESC
                """,
                (row["id"],),
            )
            exp["metrics"] = [dict(m) for m in await metric_cursor.fetchall()]
            experiments.append(exp)
        return {"model": model_name, "experiments": experiments}


@app.get("/models/{model_name}/leaderboard")
async def leaderboard(model_name: str, metric: str = "auc_roc", top_n: int = 10):
    async with aiosqlite.connect(DATABASE) as db:
        db.row_factory = aiosqlite.Row
        cursor = await db.execute(
            "SELECT id FROM models WHERE name = ?", (model_name,)
        )
        model = await cursor.fetchone()
        if not model:
            raise HTTPException(status_code=404, detail="Model not found")
        cursor = await db.execute(
            """
            SELECT e.id AS experiment_id, e.run_name, e.parameters,
                   MAX(m.metric_value) AS best_value
            FROM experiments e
            JOIN metrics m ON m.experiment_id = e.id
            WHERE e.model_id = ? AND m.metric_name = ?
            GROUP BY e.id
            ORDER BY best_value DESC
            LIMIT ?
            """,
            (model["id"], metric, top_n),
        )
        rows = await cursor.fetchall()
        results = []
        for row in rows:
            entry = dict(row)
            entry["parameters"] = json.loads(entry["parameters"])
            results.append(entry)
        return {"model": model_name, "metric": metric, "leaderboard": results}
```
Query the leaderboard after a few training runs:
```bash
# Get all experiments for a model
curl http://localhost:8000/models/fraud-detector-v2/experiments

# Rank experiments by AUC-ROC
curl "http://localhost:8000/models/fraud-detector-v2/leaderboard?metric=auc_roc&top_n=5"
```
The leaderboard endpoint uses `MAX(m.metric_value)` to pick the best metric value across all logged steps. If you’re tracking loss instead of accuracy, switch to `MIN(m.metric_value)` and change the `ORDER BY` to `ASC`.
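Here’s that MIN/ASC variant as a standalone stdlib `sqlite3` sketch (the two runs and the `val_loss` values are made up for illustration):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE experiments (id INTEGER PRIMARY KEY, run_name TEXT);
CREATE TABLE metrics (experiment_id INTEGER, metric_name TEXT, metric_value REAL, step INTEGER);
INSERT INTO experiments VALUES (1, 'run-001'), (2, 'run-002');
INSERT INTO metrics VALUES
    (1, 'val_loss', 0.90, 1), (1, 'val_loss', 0.40, 2),
    (2, 'val_loss', 0.80, 1), (2, 'val_loss', 0.55, 2);
""")

# Lower is better for loss: best value per run is MIN, ranking is ASC
rows = conn.execute("""
    SELECT e.run_name, MIN(m.metric_value) AS best_value
    FROM experiments e
    JOIN metrics m ON m.experiment_id = e.id
    WHERE m.metric_name = 'val_loss'
    GROUP BY e.id
    ORDER BY best_value ASC
""").fetchall()
print(rows)  # [('run-001', 0.4), ('run-002', 0.55)]
```

You could also expose the direction as a query parameter (e.g. a hypothetical `ascending: bool = False`) instead of hardcoding it per metric.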
## Common Errors and Fixes
**aiosqlite.OperationalError: database is locked**
SQLite only allows one writer at a time. If you’re logging metrics from multiple training processes simultaneously, you’ll hit this. Set a busy timeout so connections wait instead of failing immediately:
```python
async with aiosqlite.connect(DATABASE, timeout=30) as db:
    # Connection will wait up to 30 seconds for the write lock
    await db.execute(...)
```
For high-concurrency writes (more than ~50 concurrent trainers), switch to PostgreSQL with asyncpg.
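Before reaching for Postgres, enabling WAL (write-ahead logging) mode often buys significant headroom, since readers no longer block on the writer. The server above doesn’t enable it, so treat this as an optional tweak; the setting persists in the database file, so a one-time setup is enough:

```python
import os
import sqlite3
import tempfile

path = os.path.join(tempfile.mkdtemp(), "metadata.db")
conn = sqlite3.connect(path)

# WAL is stored in the database file itself, so this only needs to run once
mode = conn.execute("PRAGMA journal_mode=WAL").fetchone()[0]
print(mode)  # wal
```

In the aiosqlite server you would issue the same `PRAGMA journal_mode=WAL` once during the lifespan startup. Note that WAL does not work on `:memory:` databases or most network filesystems.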
**json.decoder.JSONDecodeError when reading parameters**
This happens when parameters were stored as a Python repr (e.g. via `str(params)`) instead of a JSON string – the single quotes in the repr aren’t valid JSON, so `json.loads()` chokes on read. Passing a raw dict to `execute()` fails even earlier with an `InterfaceError`, since dict isn’t a supported SQLite type. Always serialize with `json.dumps()` before insert:

```python
# Wrong - str() stores the Python repr (single quotes), which json.loads() rejects
await db.execute("INSERT INTO experiments ... VALUES (?, ?, ?)",
                 (model_id, run_name, str(parameters)))

# Correct - serialize to JSON first
await db.execute("INSERT INTO experiments ... VALUES (?, ?, ?)",
                 (model_id, run_name, json.dumps(parameters)))
```
**404 on experiment endpoints after server restart**
The SQLite file persists, but if you accidentally point `DATABASE` to a different path or a `:memory:` database, you lose everything on restart. Always use a file path:
```python
# Bad - data vanishes when the process dies
DATABASE = ":memory:"

# Good - data persists across restarts
DATABASE = "metadata.db"
```
**IntegrityError: UNIQUE constraint failed: models.name**
This means you’re trying to register a model that already exists. The `register_model` endpoint catches this and returns a 409. If you want upsert behavior instead, use `INSERT OR IGNORE`:
```python
await db.execute(
    "INSERT OR IGNORE INTO models (name, description) VALUES (?, ?)",
    (name, description),
)
```
**Missing row_factory gives tuple results instead of dicts**
If your queries return tuples like `(1, "run-001", ...)` instead of dict-like rows, you forgot to set `row_factory`:
```python
async with aiosqlite.connect(DATABASE) as db:
    db.row_factory = aiosqlite.Row  # This line is required
    cursor = await db.execute("SELECT * FROM experiments")
    row = await cursor.fetchone()
    print(row["run_name"])  # Now works as a dict-like object
```
This metadata store covers the 80% case. You get experiment tracking, metric logging, and a leaderboard API in under 200 lines of Python. When you outgrow SQLite – typically once many trainers start contending for its single write lock – swap the storage layer for PostgreSQL while keeping the same FastAPI endpoints.