If your AI application serves minors – or even might – you need content gating. Not a disclaimer page that asks “Are you 18?” and trusts the answer. Real gating that classifies every AI-generated response and blocks content that doesn’t match the user’s age tier.

Here’s a three-layer approach: regex keyword filtering (fast, catches the obvious stuff), a Hugging Face classifier (catches subtler content), and an LLM-based secondary check (handles nuance and context). All wired together with FastAPI middleware so every response gets screened automatically.

The fast version – a keyword filter you can drop in right now:

import re
from dataclasses import dataclass, field
from enum import Enum


class AgeRating(Enum):
    ALL_AGES = 0       # Safe for everyone
    TEEN = 1           # 13+
    MATURE = 2         # 17+
    ADULT = 3          # 18+
    BLOCKED = 4        # Never serve this


AGE_RATING_LABELS = {
    AgeRating.ALL_AGES: "all_ages",
    AgeRating.TEEN: "teen",
    AgeRating.MATURE: "mature",
    AgeRating.ADULT: "adult",
    AgeRating.BLOCKED: "blocked",
}


@dataclass
class GatingResult:
    rating: AgeRating
    flagged_terms: list[str] = field(default_factory=list)
    confidence: float = 1.0
    reason: str = ""

    @property
    def label(self) -> str:
        return AGE_RATING_LABELS[self.rating]


# Patterns mapped to minimum age ratings
KEYWORD_RULES: list[tuple[str, AgeRating]] = [
    (r"\b(kill|murder|suicide|self[- ]?harm)\b", AgeRating.MATURE),
    (r"\b(porn|xxx|nsfw|hentai)\b", AgeRating.ADULT),
    (r"\b(gore|torture|mutilation)\b", AgeRating.ADULT),
    (r"\b(drug use|cocaine|heroin|meth)\b", AgeRating.MATURE),
    (r"\b(damn|hell|crap)\b", AgeRating.TEEN),
    (r"\b(fuck|shit|ass)\b", AgeRating.MATURE),
]


def keyword_gate(text: str) -> GatingResult:
    """Fast first-pass content rating using regex patterns."""
    text_lower = text.lower()
    highest_rating = AgeRating.ALL_AGES
    flagged = []

    for pattern, rating in KEYWORD_RULES:
        matches = re.findall(pattern, text_lower)
        if matches:
            flagged.extend(matches)
            if rating.value > highest_rating.value:
                highest_rating = rating

    return GatingResult(
        rating=highest_rating,
        flagged_terms=flagged,
        confidence=0.7 if flagged else 0.5,
        reason="keyword match" if flagged else "no keywords detected",
    )


# Quick test
samples = [
    "Here's how to make a paper airplane.",
    "The character says 'damn, that was close' before escaping.",
    "This tutorial covers explicit NSFW content.",
]

for sample in samples:
    result = keyword_gate(sample)
    print(f"[{result.label:>10}] {sample[:60]}")
# [  all_ages] Here's how to make a paper airplane.
# [      teen] The character says 'damn, that was close' before escaping.
# [     adult] This tutorial covers explicit NSFW content.

This catches low-hanging fruit but misses a lot. “The protagonist contemplates ending it all” has no flagged keywords but clearly isn’t suitable for young children. That’s where the ML layer comes in.

ML-Based Content Classification

Use a Hugging Face text classification model to score content on safety dimensions. The michellejieli/NSFW_text_classifier model works well for a binary safe/unsafe split. For more granular age-based ratings, you can combine it with a sentiment or topic classifier.

from transformers import pipeline

# Binary NSFW classifier -- fast and runs locally
nsfw_classifier = pipeline(
    "text-classification",
    model="michellejieli/NSFW_text_classifier",
)

# Content severity classifier for finer-grained ratings
# Using a toxicity model as a proxy for content intensity
toxicity_classifier = pipeline(
    "text-classification",
    model="martin-ha/toxic-comment-model",
)


def ml_content_rate(text: str) -> GatingResult:
    """Use ML classifiers to rate content age-appropriateness."""
    nsfw_result = nsfw_classifier(text)[0]
    toxicity_result = toxicity_classifier(text)[0]

    nsfw_score = nsfw_result["score"] if nsfw_result["label"] == "NSFW" else 1 - nsfw_result["score"]
    toxic_score = toxicity_result["score"] if toxicity_result["label"] == "toxic" else 1 - toxicity_result["score"]

    # Map combined scores to age ratings
    if nsfw_score > 0.8:
        rating = AgeRating.ADULT
    elif nsfw_score > 0.5 or toxic_score > 0.7:
        rating = AgeRating.MATURE
    elif toxic_score > 0.4 or nsfw_score > 0.3:
        rating = AgeRating.TEEN
    else:
        rating = AgeRating.ALL_AGES

    return GatingResult(
        rating=rating,
        confidence=max(nsfw_score, toxic_score),
        reason=f"nsfw={nsfw_score:.2f}, toxic={toxic_score:.2f}",
    )


# Test with ambiguous content
ambiguous_texts = [
    "The knight drew his sword and struck down the dragon.",
    "She undressed slowly, the moonlight catching her skin.",
    "Mix two parts hydrogen peroxide with baking soda for cleaning.",
]

for text in ambiguous_texts:
    result = ml_content_rate(text)
    print(f"[{result.label:>10}] (conf={result.confidence:.2f}) {text[:55]}")

The ML layer catches things keywords miss. But it’s still a statistical model – it doesn’t understand context. A medical discussion about suicide prevention should be rated differently than a story glorifying self-harm. That’s where an LLM-based check helps on edge cases.

LLM-Based Secondary Classifier

For content that scores in the uncertain range (not clearly safe, not clearly unsafe), route it through an LLM with a structured prompt. This is slower and costs money, so only use it when the fast layers can’t decide.

from openai import OpenAI
import json

client = OpenAI()

RATING_PROMPT = """You are a content rating system. Rate the following text for age-appropriateness.

Return ONLY valid JSON with these fields:
- "rating": one of "all_ages", "teen", "mature", "adult"
- "reason": brief explanation (under 50 words)
- "confidence": float between 0 and 1

Consider:
- Violence level and graphic detail
- Sexual content or innuendo
- Drug/alcohol references
- Profanity intensity
- Emotional distress themes
- Whether the content is educational vs glorifying

Text to rate:
{text}"""


def llm_content_rate(text: str) -> GatingResult:
    """Use an LLM for nuanced content rating on edge cases."""
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[
            {"role": "system", "content": "You are a content rating classifier. Respond only with JSON."},
            {"role": "user", "content": RATING_PROMPT.format(text=text)},
        ],
        temperature=0.0,
        max_completion_tokens=200,
    )

    raw = response.choices[0].message.content.strip()
    # Strip markdown code fences if present
    if raw.startswith("```"):
        raw = raw.split("\n", 1)[1].rsplit("```", 1)[0].strip()

    parsed = json.loads(raw)

    rating_map = {
        "all_ages": AgeRating.ALL_AGES,
        "teen": AgeRating.TEEN,
        "mature": AgeRating.MATURE,
        "adult": AgeRating.ADULT,
    }

    return GatingResult(
        rating=rating_map.get(parsed["rating"], AgeRating.MATURE),
        confidence=float(parsed.get("confidence", 0.8)),
        reason=parsed.get("reason", "LLM classification"),
    )

Now wire all three layers into a single function that escalates only when needed:

def rate_content(text: str) -> GatingResult:
    """Three-layer content rating: keywords -> ML -> LLM."""
    # Layer 1: Keyword check (sub-millisecond)
    keyword_result = keyword_gate(text)
    if keyword_result.rating in (AgeRating.ADULT, AgeRating.BLOCKED):
        return keyword_result

    # Layer 2: ML classifier (~50ms on GPU, ~200ms on CPU)
    ml_result = ml_content_rate(text)
    if ml_result.confidence > 0.85:
        return ml_result

    # Layer 3: LLM check for uncertain cases (~500ms-2s)
    if ml_result.confidence < 0.6 or ml_result.rating == AgeRating.MATURE:
        llm_result = llm_content_rate(text)
        return llm_result

    return ml_result

Most requests never hit the LLM. In a typical traffic mix the keyword filter settles roughly 40% of flagged content, the ML classifier resolves another ~50%, and only the genuinely ambiguous ~10% need the expensive LLM call – measure your own distribution before trusting those numbers.
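AI responses also repeat, so memoizing ratings is an easy cost win on top of the escalation logic. A minimal sketch using `functools.lru_cache` – `rate_content_stub` here is a hypothetical stand-in for the `rate_content` pipeline above; in your app, wrap the real function:

```python
from functools import lru_cache


# Hypothetical stand-in for the rate_content pipeline above;
# swap in the real function in your app.
def rate_content_stub(text: str) -> str:
    return "all_ages" if "safe" in text else "mature"


@lru_cache(maxsize=4096)
def rate_cached(text: str) -> str:
    # Identical responses are rated once; exact repeats are free.
    return rate_content_stub(text)


rate_cached("a safe greeting")
rate_cached("a safe greeting")  # served from cache
print(rate_cached.cache_info().hits)  # 1 hit on the repeat
```

The exact-match key is deliberate: hashing normalized text (lowercased, whitespace-collapsed) would raise the hit rate at the cost of slightly coarser caching.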

FastAPI Middleware for Automatic Gating

Here’s where it all comes together. This middleware intercepts every response from your AI endpoint and applies the gating pipeline before the response reaches the user.

from contextlib import asynccontextmanager
from fastapi import FastAPI, Request
from fastapi.responses import JSONResponse
from starlette.middleware.base import BaseHTTPMiddleware
from transformers import pipeline
import json as json_module


# Load ML models at startup using lifespan
@asynccontextmanager
async def lifespan(app: FastAPI):
    # Startup: load models once
    app.state.nsfw_classifier = pipeline(
        "text-classification",
        model="michellejieli/NSFW_text_classifier",
    )
    app.state.toxicity_classifier = pipeline(
        "text-classification",
        model="martin-ha/toxic-comment-model",
    )
    print("Content gating models loaded")
    yield
    # Shutdown: cleanup if needed
    print("Shutting down content gating")


app = FastAPI(lifespan=lifespan)


class ContentGatingMiddleware(BaseHTTPMiddleware):
    async def dispatch(self, request: Request, call_next):
        response = await call_next(request)

        # Only gate AI-generated responses
        if not request.url.path.startswith("/api/chat"):
            return response

        # Read the response body
        body = b""
        async for chunk in response.body_iterator:
            body += chunk

        try:
            data = json_module.loads(body)
            ai_text = data.get("response", "")
        except (json_module.JSONDecodeError, AttributeError):
            return response

        # Get user's age tier from request headers or auth token
        user_age_tier = request.headers.get("X-User-Age-Tier", "all_ages")

        # Rate the content
        result = rate_content(ai_text)

        # Check if user's tier allows this content; unknown claims fall
        # back to the safest tier, and BLOCKED outranks every tier
        tier_order = ["all_ages", "teen", "mature", "adult"]
        user_tier_index = (
            tier_order.index(user_age_tier) if user_age_tier in tier_order else 0
        )
        content_tier_index = (
            tier_order.index(result.label)
            if result.label in tier_order
            else len(tier_order)
        )

        if content_tier_index > user_tier_index:
            return JSONResponse(
                status_code=451,
                content={
                    "error": "Content blocked by age gate",
                    "required_tier": result.label,
                    "user_tier": user_age_tier,
                    "reason": result.reason,
                },
            )

        # Attach rating metadata to allowed responses
        data["content_rating"] = {
            "rating": result.label,
            "confidence": result.confidence,
        }

        return JSONResponse(content=data, status_code=response.status_code)


app.add_middleware(ContentGatingMiddleware)


@app.post("/api/chat")
async def chat(request: Request):
    body = await request.json()
    user_message = body.get("message", "")

    # Your AI generation logic here
    ai_response = f"Generated response for: {user_message}"

    return {"response": ai_response}

The middleware reads the X-User-Age-Tier header to determine what the user is allowed to see. In production, you’d pull this from your auth system – a JWT claim, a database lookup, or a session store. Never trust a client-supplied age claim without verification.
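In production that usually means a JWT claim; as a dependency-free sketch of the same principle – trust only what the server signed – here's an HMAC-signed tier token. `sign_tier` and `verify_tier` are hypothetical helpers, not part of any auth library:

```python
import hashlib
import hmac

SECRET = b"replace-with-your-signing-key"  # server-side only, never shipped to clients


def sign_tier(tier: str) -> str:
    """Issue a 'tier.signature' token after age verification succeeds."""
    sig = hmac.new(SECRET, tier.encode(), hashlib.sha256).hexdigest()
    return f"{tier}.{sig}"


def verify_tier(token: str) -> str:
    """Return the signed tier, or the safest tier if tampered/malformed."""
    try:
        tier, sig = token.rsplit(".", 1)
    except ValueError:
        return "all_ages"
    expected = hmac.new(SECRET, tier.encode(), hashlib.sha256).hexdigest()
    return tier if hmac.compare_digest(sig, expected) else "all_ages"


token = sign_tier("teen")
print(verify_tier(token))               # teen
print(verify_tier("adult.forged-sig"))  # all_ages
```

The key property: a client can replay its own token but cannot mint an "adult" one, and any tampering degrades to the most restrictive tier instead of erroring.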

COPPA and Age-Gate Considerations

If your application might be used by children under 13, COPPA (Children’s Online Privacy Protection Act) applies. Here’s what that means in practice:

Age verification gate – you need an actual age gate before collecting any data. A date-of-birth input is the minimum standard. Don’t use a simple “I am over 13” checkbox – the FTC has fined companies for this.

Parental consent – for users under 13, you must get verifiable parental consent before collecting personal information. This includes chat logs, prompts, and any data your AI model processes.

Data minimization – collect only what’s needed. If a child is using your app, don’t log their conversations unless strictly necessary.
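To make minimization concrete, here's one possible shape for an interaction log that stores a minor's message only as a hash – `log_interaction` and its field names are illustrative, not a standard:

```python
import hashlib
from datetime import datetime, timezone


def log_interaction(user_tier: str, message: str, rating: str) -> dict:
    """Log rating metadata only; never store a minor's message content."""
    entry = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "user_tier": user_tier,
        "rating": rating,
    }
    if user_tier == "adult":
        # Full content logging only for verified adults
        entry["message"] = message
    else:
        # Minors: keep a hash for abuse investigation, not the text
        entry["message_sha256"] = hashlib.sha256(message.encode()).hexdigest()
    return entry
```

The hash still lets you correlate repeated abusive prompts without retaining what a child actually typed.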

from datetime import date


@dataclass
class AgeGateResult:
    allowed: bool
    tier: str
    requires_parental_consent: bool
    message: str


def check_age_gate(birth_date: date) -> AgeGateResult:
    """Determine access tier from verified date of birth."""
    today = date.today()
    age = today.year - birth_date.year - (
        (today.month, today.day) < (birth_date.month, birth_date.day)
    )

    if age < 13:
        return AgeGateResult(
            allowed=False,
            tier="blocked",
            requires_parental_consent=True,
            message="Users under 13 require parental consent (COPPA).",
        )
    elif age < 17:
        return AgeGateResult(
            allowed=True,
            tier="teen",
            requires_parental_consent=False,
            message="Access granted with teen content restrictions.",
        )
    elif age < 18:
        return AgeGateResult(
            allowed=True,
            tier="mature",
            requires_parental_consent=False,
            message="Access granted with mature content restrictions.",
        )
    else:
        return AgeGateResult(
            allowed=True,
            tier="adult",
            requires_parental_consent=False,
            message="Full access granted.",
        )


# Example usage
test_dates = [
    date(2016, 6, 15),   # ~9 years old
    date(2012, 3, 20),   # ~13 years old
    date(2009, 1, 10),   # ~17 years old
    date(2000, 11, 5),   # ~25 years old
]

for dob in test_dates:
    result = check_age_gate(dob)
    print(f"DOB={dob} -> tier={result.tier}, consent={result.requires_parental_consent}")

A few more things to keep in mind:

  • GDPR-K (if serving EU users) has similar requirements to COPPA but sets the age threshold at 16 in some member states.
  • Age estimation vs. verification – self-reported birth dates are weak. For high-risk applications, consider document-based verification or third-party age estimation services.
  • Default to the safest tier – if age is unknown, treat the user as a child. Better to gate content for an adult than to serve adult content to a minor.

Common Errors and Fixes

RuntimeError: No model was supplied from the pipeline call

You called pipeline() without the model parameter. Depending on your transformers version, this either raises or silently falls back to a default model that was never trained for content rating. Always pass the model explicitly:

# Wrong -- no model specified
classifier = pipeline("text-classification")

# Right
classifier = pipeline("text-classification", model="michellejieli/NSFW_text_classifier")

JSONDecodeError when parsing LLM responses

The LLM sometimes wraps its JSON in markdown code fences or adds explanatory text. Always strip those before parsing. The llm_content_rate function above handles this, but if you’re still seeing errors, add a fallback:

import json

def safe_parse_rating(raw: str) -> dict:
    """Parse LLM rating response, handling common formatting issues."""
    # Strip code fences
    if "```" in raw:
        raw = raw.split("```")[1]
        if raw.startswith("json"):
            raw = raw[4:]
    raw = raw.strip()

    try:
        return json.loads(raw)
    except json.JSONDecodeError:
        return {"rating": "mature", "confidence": 0.5, "reason": "Parse failure, defaulting to safe"}

Middleware returning empty responses

When you consume response.body_iterator in middleware, the original response body is gone. You must reconstruct it. The middleware example above handles this by returning a new JSONResponse, but if you forget this step you’ll get empty 200 responses.

CUDA out-of-memory when loading multiple classifiers

Two transformer models can eat 2-4 GB of VRAM. If you’re on a constrained GPU, force CPU inference:

export CUDA_VISIBLE_DEVICES=""

Or specify the device in code:

classifier = pipeline("text-classification", model="martin-ha/toxic-comment-model", device="cpu")

Rate limiting the LLM fallback

If your ML classifier has low confidence on a lot of inputs, you’ll hammer the OpenAI API. Add a circuit breaker or rate limit to the LLM layer, and default to the ML result if the LLM budget is exhausted.
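A minimal sliding-window budget is enough to start – `LLMBudget` is a sketch, not a library class; tune `max_calls` and the window to your API quota:

```python
import time


class LLMBudget:
    """Sliding-window call budget for the LLM fallback layer."""

    def __init__(self, max_calls: int = 60, window_seconds: float = 60.0):
        self.max_calls = max_calls
        self.window = window_seconds
        self.calls: list[float] = []

    def allow(self) -> bool:
        """Record and permit a call unless the window is exhausted."""
        now = time.monotonic()
        self.calls = [t for t in self.calls if now - t < self.window]
        if len(self.calls) >= self.max_calls:
            return False
        self.calls.append(now)
        return True
```

In `rate_content`, guard the fallback with `if budget.allow():` and return the ML result when the budget is exhausted – degrading to a conservative ML rating beats blocking the request on a rate-limited API.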