If your AI application serves minors – or even might – you need content gating. Not a disclaimer page that asks “Are you 18?” and trusts the answer. Real gating that classifies every AI-generated response and blocks content that doesn’t match the user’s age tier.
Here’s a three-layer approach: regex keyword filtering (fast, catches the obvious stuff), a Hugging Face classifier (catches subtler content), and an LLM-based secondary check (handles nuance and context). All wired together with FastAPI middleware so every response gets screened automatically.
The fast version – a keyword filter you can drop in right now:
```python
import re
from dataclasses import dataclass, field
from enum import Enum


class AgeRating(Enum):
    ALL_AGES = 0  # Safe for everyone
    TEEN = 1      # 13+
    MATURE = 2    # 17+
    ADULT = 3     # 18+
    BLOCKED = 4   # Never serve this


AGE_RATING_LABELS = {
    AgeRating.ALL_AGES: "all_ages",
    AgeRating.TEEN: "teen",
    AgeRating.MATURE: "mature",
    AgeRating.ADULT: "adult",
    AgeRating.BLOCKED: "blocked",
}


@dataclass
class GatingResult:
    rating: AgeRating
    flagged_terms: list[str] = field(default_factory=list)
    confidence: float = 1.0
    reason: str = ""

    @property
    def label(self) -> str:
        return AGE_RATING_LABELS[self.rating]


# Patterns mapped to minimum age ratings
KEYWORD_RULES: list[tuple[str, AgeRating]] = [
    (r"\b(kill|murder|suicide|self[- ]?harm)\b", AgeRating.MATURE),
    (r"\b(porn|xxx|nsfw|hentai)\b", AgeRating.ADULT),
    (r"\b(gore|torture|mutilation)\b", AgeRating.ADULT),
    (r"\b(drug use|cocaine|heroin|meth)\b", AgeRating.MATURE),
    (r"\b(damn|hell|crap)\b", AgeRating.TEEN),
    (r"\b(fuck|shit|ass)\b", AgeRating.MATURE),
]


def keyword_gate(text: str) -> GatingResult:
    """Fast first-pass content rating using regex patterns."""
    text_lower = text.lower()
    highest_rating = AgeRating.ALL_AGES
    flagged = []
    for pattern, rating in KEYWORD_RULES:
        matches = re.findall(pattern, text_lower)
        if matches:
            flagged.extend(matches)
            if rating.value > highest_rating.value:
                highest_rating = rating
    return GatingResult(
        rating=highest_rating,
        flagged_terms=flagged,
        confidence=0.7 if flagged else 0.5,
        reason="keyword match" if flagged else "no keywords detected",
    )


# Quick test
samples = [
    "Here's how to make a paper airplane.",
    "The character says 'damn, that was close' before escaping.",
    "This NSFW tutorial covers explicit adult content.",
]
for sample in samples:
    result = keyword_gate(sample)
    print(f"[{result.label:>10}] {sample[:60]}")
# [  all_ages] Here's how to make a paper airplane.
# [      teen] The character says 'damn, that was close' before escaping.
# [     adult] This NSFW tutorial covers explicit adult content.
```
This catches low-hanging fruit but misses a lot. “The protagonist contemplates ending it all” has no flagged keywords but clearly isn’t suitable for young children. That’s where the ML layer comes in.
ML-Based Content Classification
Use a Hugging Face text classification model to score content on safety dimensions. The michellejieli/NSFW_text_classifier model works well for a binary safe/unsafe split. For more granular age-based ratings, you can combine it with a sentiment or topic classifier.
```python
from transformers import pipeline

# Binary NSFW classifier -- fast and runs locally
nsfw_classifier = pipeline(
    "text-classification",
    model="michellejieli/NSFW_text_classifier",
)

# Content severity classifier for finer-grained ratings
# Using a toxicity model as a proxy for content intensity
toxicity_classifier = pipeline(
    "text-classification",
    model="martin-ha/toxic-comment-model",
)


def ml_content_rate(text: str) -> GatingResult:
    """Use ML classifiers to rate content age-appropriateness."""
    nsfw_result = nsfw_classifier(text)[0]
    toxicity_result = toxicity_classifier(text)[0]
    # Normalize both scores so higher always means "less safe"
    nsfw_score = (
        nsfw_result["score"]
        if nsfw_result["label"] == "NSFW"
        else 1 - nsfw_result["score"]
    )
    toxic_score = (
        toxicity_result["score"]
        if toxicity_result["label"] == "toxic"
        else 1 - toxicity_result["score"]
    )
    # Map combined scores to age ratings
    if nsfw_score > 0.8:
        rating = AgeRating.ADULT
    elif nsfw_score > 0.5 or toxic_score > 0.7:
        rating = AgeRating.MATURE
    elif toxic_score > 0.4 or nsfw_score > 0.3:
        rating = AgeRating.TEEN
    else:
        rating = AgeRating.ALL_AGES
    return GatingResult(
        rating=rating,
        confidence=max(nsfw_score, toxic_score),
        reason=f"nsfw={nsfw_score:.2f}, toxic={toxic_score:.2f}",
    )


# Test with ambiguous content
ambiguous_texts = [
    "The knight drew his sword and struck down the dragon.",
    "She undressed slowly, the moonlight catching her skin.",
    "Mix two parts hydrogen peroxide with baking soda for cleaning.",
]
for text in ambiguous_texts:
    result = ml_content_rate(text)
    print(f"[{result.label:>10}] (conf={result.confidence:.2f}) {text[:55]}")
```
The ML layer catches things keywords miss. But it’s still a statistical model – it doesn’t understand context. A medical discussion about suicide prevention should be rated differently than a story glorifying self-harm. That’s where an LLM-based check helps on edge cases.
LLM-Based Secondary Classifier
For content that scores in the uncertain range (not clearly safe, not clearly unsafe), route it through an LLM with a structured prompt. This is slower and costs money, so only use it when the fast layers can’t decide.
```python
from openai import OpenAI
import json

client = OpenAI()

RATING_PROMPT = """You are a content rating system. Rate the following text for age-appropriateness.

Return ONLY valid JSON with these fields:
- "rating": one of "all_ages", "teen", "mature", "adult"
- "reason": brief explanation (under 50 words)
- "confidence": float between 0 and 1

Consider:
- Violence level and graphic detail
- Sexual content or innuendo
- Drug/alcohol references
- Profanity intensity
- Emotional distress themes
- Whether the content is educational vs glorifying

Text to rate:
{text}"""


def llm_content_rate(text: str) -> GatingResult:
    """Use an LLM for nuanced content rating on edge cases."""
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[
            {"role": "system", "content": "You are a content rating classifier. Respond only with JSON."},
            {"role": "user", "content": RATING_PROMPT.format(text=text)},
        ],
        temperature=0.0,
        max_completion_tokens=200,
    )
    raw = response.choices[0].message.content.strip()
    # Strip markdown code fences if present
    if raw.startswith("```"):
        raw = raw.split("\n", 1)[1].rsplit("```", 1)[0].strip()
    parsed = json.loads(raw)
    rating_map = {
        "all_ages": AgeRating.ALL_AGES,
        "teen": AgeRating.TEEN,
        "mature": AgeRating.MATURE,
        "adult": AgeRating.ADULT,
    }
    return GatingResult(
        # Unknown labels fall back to MATURE rather than ALL_AGES
        rating=rating_map.get(parsed["rating"], AgeRating.MATURE),
        confidence=float(parsed.get("confidence", 0.8)),
        reason=parsed.get("reason", "LLM classification"),
    )
```
Now wire all three layers into a single function that escalates only when needed:
```python
def rate_content(text: str) -> GatingResult:
    """Three-layer content rating: keywords -> ML -> LLM."""
    # Layer 1: Keyword check (sub-millisecond)
    keyword_result = keyword_gate(text)
    if keyword_result.rating in (AgeRating.ADULT, AgeRating.BLOCKED):
        return keyword_result

    # Layer 2: ML classifier (~50ms on GPU, ~200ms on CPU)
    ml_result = ml_content_rate(text)
    if ml_result.confidence > 0.85:
        return ml_result

    # Layer 3: LLM check for uncertain cases (~500ms-2s)
    if ml_result.confidence < 0.6 or ml_result.rating == AgeRating.MATURE:
        return llm_content_rate(text)

    return ml_result
```
Most requests never hit the LLM. The keyword filter handles ~40% of flagged content, the ML classifier resolves another ~50%, and only the genuinely ambiguous 10% need the expensive LLM call.
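Those fractions translate directly into a latency budget. Here's a quick back-of-the-envelope check; the layer shares and per-layer latencies are illustrative assumptions, not measurements, so substitute your own numbers:

```python
# Illustrative numbers only -- the shares and latencies below are
# assumptions for the arithmetic, not benchmarks.
LAYER_PROFILE = [
    # (fraction of requests resolved at this layer, layer latency in ms)
    (0.40, 1),     # keyword filter
    (0.50, 200),   # ML classifier on CPU
    (0.10, 1500),  # LLM fallback
]


def expected_latency_ms(profile) -> float:
    """Weighted average latency across the cascade.

    A request resolved at layer N also paid for layers 1..N-1,
    so each share is charged the cumulative latency up to its layer.
    """
    total = 0.0
    cumulative = 0.0
    for share, latency in profile:
        cumulative += latency
        total += share * cumulative
    return total


print(f"Expected latency: {expected_latency_ms(LAYER_PROFILE):.0f} ms per request")
# 0.40*1 + 0.50*(1+200) + 0.10*(1+200+1500) = 271 ms
```

With these numbers, the mean cost stays near the ML layer's latency even though the worst case is an order of magnitude slower.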
FastAPI Middleware for Automatic Gating
Here’s where it all comes together. This middleware intercepts every response from your AI endpoint and applies the gating pipeline before the response reaches the user.
```python
from contextlib import asynccontextmanager

from fastapi import FastAPI, Request
from fastapi.responses import JSONResponse
from starlette.responses import Response
from starlette.middleware.base import BaseHTTPMiddleware
from transformers import pipeline
import json as json_module

# rate_content() comes from the three-layer pipeline defined above


# Load ML models at startup using lifespan
@asynccontextmanager
async def lifespan(app: FastAPI):
    # Startup: load models once
    app.state.nsfw_classifier = pipeline(
        "text-classification",
        model="michellejieli/NSFW_text_classifier",
    )
    app.state.toxicity_classifier = pipeline(
        "text-classification",
        model="martin-ha/toxic-comment-model",
    )
    print("Content gating models loaded")
    yield
    # Shutdown: cleanup if needed
    print("Shutting down content gating")


app = FastAPI(lifespan=lifespan)


class ContentGatingMiddleware(BaseHTTPMiddleware):
    async def dispatch(self, request: Request, call_next):
        response = await call_next(request)

        # Only gate AI-generated responses
        if not request.url.path.startswith("/api/chat"):
            return response

        # Read the response body (this consumes the iterator)
        body = b""
        async for chunk in response.body_iterator:
            body += chunk

        try:
            data = json_module.loads(body)
            ai_text = data.get("response", "")
        except (json_module.JSONDecodeError, AttributeError):
            # The iterator is already consumed, so rebuild the response
            # from the buffered bytes instead of returning the empty original
            return Response(
                content=body,
                status_code=response.status_code,
                headers=dict(response.headers),
                media_type=response.media_type,
            )

        # Get user's age tier from request headers or auth token
        user_age_tier = request.headers.get("X-User-Age-Tier", "all_ages")

        # Rate the content
        result = rate_content(ai_text)

        # Check if user's tier allows this content
        tier_order = ["all_ages", "teen", "mature", "adult"]
        # Unknown client-supplied tiers fall back to the safest tier
        user_tier_index = (
            tier_order.index(user_age_tier) if user_age_tier in tier_order else 0
        )
        # "blocked" content outranks every tier
        content_tier_index = (
            tier_order.index(result.label) if result.label in tier_order else len(tier_order)
        )

        if content_tier_index > user_tier_index:
            return JSONResponse(
                status_code=451,
                content={
                    "error": "Content blocked by age gate",
                    "required_tier": result.label,
                    "user_tier": user_age_tier,
                    "reason": result.reason,
                },
            )

        # Attach rating metadata to allowed responses
        data["content_rating"] = {
            "rating": result.label,
            "confidence": result.confidence,
        }
        return JSONResponse(content=data, status_code=response.status_code)


app.add_middleware(ContentGatingMiddleware)


@app.post("/api/chat")
async def chat(request: Request):
    body = await request.json()
    user_message = body.get("message", "")
    # Your AI generation logic here
    ai_response = f"Generated response for: {user_message}"
    return {"response": ai_response}
```
The middleware reads the X-User-Age-Tier header to determine what the user is allowed to see. In production, you’d pull this from your auth system – a JWT claim, a database lookup, or a session store. Never trust a client-supplied age claim without verification.
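One sketch of that server-side lookup, assuming you issue JWTs carrying an `age_tier` claim. The claim name, the HS256 secret handling, and PyJWT itself are illustrative choices, not a prescribed scheme:

```python
import jwt  # PyJWT; the "age_tier" claim name is a hypothetical convention

TIERS = {"all_ages", "teen", "mature", "adult"}


def age_tier_from_token(token: str, secret: str) -> str:
    """Resolve the user's age tier from a signed JWT.

    Falls back to the safest tier on any failure -- a missing, forged,
    or malformed token must never grant broader access.
    """
    try:
        claims = jwt.decode(token, secret, algorithms=["HS256"])
    except jwt.InvalidTokenError:
        return "all_ages"
    tier = claims.get("age_tier", "all_ages")
    return tier if tier in TIERS else "all_ages"
```

Because the tier comes out of a signature-verified token, a client editing the `X-User-Age-Tier` header gains nothing: the middleware would read the tier from the decoded claim instead.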
COPPA and Age-Gate Considerations
If your application might be used by children under 13, COPPA (Children’s Online Privacy Protection Act) applies. Here’s what that means in practice:
- Age verification gate – you need an actual age gate before collecting any data. A date-of-birth input is the minimum standard. Don’t use a simple “I am over 13” checkbox – the FTC has fined companies for this.
- Parental consent – for users under 13, you must get verifiable parental consent before collecting personal information. This includes chat logs, prompts, and any data your AI model processes.
- Data minimization – collect only what’s needed. If a child is using your app, don’t log their conversations unless strictly necessary.
```python
from datetime import date


@dataclass
class AgeGateResult:
    allowed: bool
    tier: str
    requires_parental_consent: bool
    message: str


def check_age_gate(birth_date: date) -> AgeGateResult:
    """Determine access tier from verified date of birth."""
    today = date.today()
    # Subtract 1 if this year's birthday hasn't happened yet
    age = today.year - birth_date.year - (
        (today.month, today.day) < (birth_date.month, birth_date.day)
    )
    if age < 13:
        return AgeGateResult(
            allowed=False,
            tier="blocked",
            requires_parental_consent=True,
            message="Users under 13 require parental consent (COPPA).",
        )
    elif age < 17:
        return AgeGateResult(
            allowed=True,
            tier="teen",
            requires_parental_consent=False,
            message="Access granted with teen content restrictions.",
        )
    elif age < 18:
        return AgeGateResult(
            allowed=True,
            tier="mature",
            requires_parental_consent=False,
            message="Access granted with mature content restrictions.",
        )
    else:
        return AgeGateResult(
            allowed=True,
            tier="adult",
            requires_parental_consent=False,
            message="Full access granted.",
        )


# Example usage
test_dates = [
    date(2016, 6, 15),  # ~9 years old
    date(2012, 3, 20),  # ~13 years old
    date(2009, 1, 10),  # ~17 years old
    date(2000, 11, 5),  # ~25 years old
]
for dob in test_dates:
    result = check_age_gate(dob)
    print(f"DOB={dob} -> tier={result.tier}, consent={result.requires_parental_consent}")
```
A few more things to keep in mind:
- GDPR-K (if serving EU users) has similar requirements to COPPA but sets the age threshold at 16 in some member states.
- Age estimation vs. verification – self-reported birth dates are weak. For high-risk applications, consider document-based verification or third-party age estimation services.
- Default to the safest tier – if age is unknown, treat the user as a child. Better to gate content for an adult than to serve adult content to a minor.
Common Errors and Fixes
RuntimeError: No model was supplied from the pipeline call
You forgot to specify the model parameter. The pipeline() function requires an explicit model name for most task types:
```python
# Wrong -- no model specified
classifier = pipeline("text-classification")

# Right
classifier = pipeline("text-classification", model="michellejieli/NSFW_text_classifier")
```
JSONDecodeError when parsing LLM responses
The LLM sometimes wraps its JSON in markdown code fences or adds explanatory text. Always strip those before parsing. The llm_content_rate function above handles this, but if you’re still seeing errors, add a fallback:
```python
import json


def safe_parse_rating(raw: str) -> dict:
    """Parse LLM rating response, handling common formatting issues."""
    # Strip code fences
    if "```" in raw:
        raw = raw.split("```")[1]
        if raw.startswith("json"):
            raw = raw[4:]
    raw = raw.strip()
    try:
        return json.loads(raw)
    except json.JSONDecodeError:
        return {"rating": "mature", "confidence": 0.5, "reason": "Parse failure, defaulting to safe"}
```
Middleware returning empty responses
When you consume response.body_iterator in middleware, the original response body is gone. You must reconstruct it. The middleware example above handles this by returning a new JSONResponse, but if you forget this step you’ll get empty 200 responses.
CUDA out-of-memory when loading multiple classifiers
Two transformer models can eat 2-4 GB of VRAM. If you’re on a constrained GPU, force CPU inference:
```shell
export CUDA_VISIBLE_DEVICES=""
```
Or specify the device in code:
```python
classifier = pipeline("text-classification", model="martin-ha/toxic-comment-model", device="cpu")
```
Rate limiting the LLM fallback
If your ML classifier has low confidence on a lot of inputs, you’ll hammer the OpenAI API. Add a circuit breaker or rate limit to the LLM layer, and default to the ML result if the LLM budget is exhausted.
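A minimal token-bucket sketch of that budget guard; the rate and capacity values are placeholders, and `rate_with_budget` is a hypothetical wrapper around the article's layer-3 escalation:

```python
import time


class LLMBudget:
    """Token bucket: at most `rate` LLM calls/sec, bursting to `capacity`."""

    def __init__(self, rate: float = 2.0, capacity: float = 10.0):
        self.rate = rate
        self.capacity = capacity
        self.tokens = capacity
        self.last = time.monotonic()

    def allow(self) -> bool:
        # Refill proportionally to elapsed time, capped at capacity
        now = time.monotonic()
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False


def rate_with_budget(ml_result, llm_call, budget: LLMBudget):
    """Escalate to the LLM only while the budget holds.

    When the bucket is empty, keep the ML verdict instead of queueing --
    a slightly coarser rating beats an unbounded API bill.
    """
    if budget.allow():
        return llm_call()
    return ml_result
```

The same `allow()` check is a natural place to hang a circuit breaker: trip it on repeated OpenAI errors and let it half-open after a cooldown, so an upstream outage degrades to ML-only gating rather than request failures.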