Readability scores tell you how hard your text is to read. They matter for content teams, accessibility compliance, and anywhere you need to match writing to an audience. The textstat library computes all the standard formulas in one call, and wrapping it in FastAPI gives you a scoring service you can hit from any app.
Here’s the fastest way to score a block of text:
```python
import textstat

text = """
The Australian platypus is seemingly a hybrid of a mammal and reptilian
creature. It has a duck-bill, webbed feet, and lays eggs. Despite these
reptilian features, it is classified as a mammal because it produces milk
and has fur.
"""

print(f"Flesch-Kincaid Grade: {textstat.flesch_kincaid_grade(text)}")
print(f"Gunning Fog Index: {textstat.gunning_fog(text)}")
print(f"SMOG Index: {textstat.smog_index(text)}")
print(f"Coleman-Liau Index: {textstat.coleman_liau_index(text)}")
print(f"Flesch Reading Ease: {textstat.flesch_reading_ease(text)}")
```
Install the dependencies before running:
```shell
pip install textstat spacy fastapi uvicorn
python -m spacy download en_core_web_sm
```
## What Each Index Measures
These formulas all estimate grade level, but they weight different features:
- Flesch-Kincaid Grade Level uses average sentence length and syllables per word. Lower is easier. A score of 8.0 means an eighth grader can follow it.
- Gunning Fog Index counts “complex words” (three or more syllables) and sentence length. It tends to run higher than Flesch-Kincaid on technical text.
- SMOG Index (Simple Measure of Gobbledygook) focuses on polysyllabic words. It needs at least 30 sentences for a reliable score, though textstat will still produce a number on shorter inputs.
- Coleman-Liau Index works on characters per word instead of syllables, which makes it faster to compute on large corpora and less sensitive to abbreviations.
No single index is perfect. Combining them gives you a more stable picture.
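For intuition, these indices are simple arithmetic over sentence, word, and syllable counts. Here is a rough sketch of the published Flesch-Kincaid grade formula using a crude vowel-group syllable counter; textstat's syllable handling is more sophisticated, so expect its numbers to differ from this toy version:

```python
import re


def naive_syllables(word: str) -> int:
    # Crude heuristic: one syllable per run of consecutive vowels.
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))


def naive_fk_grade(text: str) -> float:
    """Flesch-Kincaid grade: 0.39*(words/sentences) + 11.8*(syllables/words) - 15.59."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    syllables = sum(naive_syllables(w) for w in words)
    return round(
        0.39 * (len(words) / len(sentences))
        + 11.8 * (syllables / len(words))
        - 15.59,
        2,
    )


print(naive_fk_grade("The cat sat on the mat. It was a good mat."))
```

Very simple text can legitimately score below zero; the formula has no floor.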
## Combining Metrics into a Single Grade
A weighted average across multiple indices smooths out the quirks of any one formula. Here’s a function that returns a composite score and a human-readable label:
```python
import textstat


def composite_readability(text: str) -> dict:
    fk = textstat.flesch_kincaid_grade(text)
    gf = textstat.gunning_fog(text)
    smog = textstat.smog_index(text)
    cl = textstat.coleman_liau_index(text)

    # Weighted average: Flesch-Kincaid and Coleman-Liau get more weight
    # because they're more stable on short texts
    composite = (fk * 0.3) + (gf * 0.2) + (smog * 0.2) + (cl * 0.3)
    composite = round(composite, 2)

    if composite <= 6:
        label = "Easy (elementary school)"
    elif composite <= 10:
        label = "Standard (middle/high school)"
    elif composite <= 14:
        label = "Difficult (college level)"
    else:
        label = "Very difficult (graduate/professional)"

    return {
        "flesch_kincaid_grade": fk,
        "gunning_fog": gf,
        "smog_index": smog,
        "coleman_liau_index": cl,
        "composite_score": composite,
        "label": label,
    }


result = composite_readability("The cat sat on the mat. It was a good mat.")
print(result)
```
The weights are a starting point. Adjust them based on which indices correlate best with your audience feedback.
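One way to tune the weights is to check which index correlates best with human judgments of your own content. A minimal sketch using a plain-Python Pearson correlation; the per-document scores and human-rated grades below are hypothetical placeholders for your data:

```python
from statistics import mean


def pearson(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation coefficient between two equal-length series."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)


# Hypothetical data: index scores per document vs. human-rated grade levels
fk_scores = [4.2, 8.1, 12.3, 6.0]
fog_scores = [6.5, 9.0, 15.1, 7.2]
human_grades = [4.0, 8.0, 13.0, 6.0]

print(f"FK vs human:  {pearson(fk_scores, human_grades):.3f}")
print(f"Fog vs human: {pearson(fog_scores, human_grades):.3f}")
```

Give more weight to whichever index tracks the human ratings most closely.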
## Tokenization with spaCy for Deeper Analysis
textstat handles tokenization internally, but sometimes you need sentence and word counts for your own metrics or to feed into downstream pipelines. spaCy gives you reliable sentence boundaries and token-level features:
```python
import spacy
import textstat  # needed for the per-sentence scores below

nlp = spacy.load("en_core_web_sm")

text = """
Machine learning models require large amounts of labeled data.
Transfer learning reduces this requirement by reusing pretrained weights.
Fine-tuning adapts a general model to a specific domain.
"""

doc = nlp(text)
sentences = list(doc.sents)
words = [token for token in doc if not token.is_punct and not token.is_space]

print(f"Sentences: {len(sentences)}")
print(f"Words: {len(words)}")
print(f"Avg words per sentence: {len(words) / len(sentences):.1f}")

# Show sentence-level readability
for sent in sentences:
    score = textstat.flesch_kincaid_grade(sent.text)
    print(f"  FK {score:5.1f} | {sent.text.strip()}")
```
This per-sentence breakdown is useful when you want to flag specific hard-to-read sentences in an editing workflow.
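To turn that breakdown into concrete edit suggestions, a small filter over (sentence, grade) pairs is enough. The function name, threshold, and sample grades here are illustrative; in practice you would feed in the scores from the spaCy loop above:

```python
def flag_hard_sentences(
    scored: list[tuple[str, float]], max_grade: float = 10.0
) -> list[tuple[str, float]]:
    """Return (sentence, grade) pairs whose grade exceeds the threshold."""
    return [(sent, grade) for sent, grade in scored if grade > max_grade]


# Hypothetical per-sentence grades, e.g. from the per-sentence loop above
scored = [
    ("Short and clear.", 2.1),
    ("Notwithstanding the aforementioned considerations, implementation proceeds.", 15.8),
]

for sent, grade in flag_hard_sentences(scored):
    print(f"Rewrite suggested (FK {grade}): {sent}")
```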
## Batch Processing Documents
When you have hundreds or thousands of documents, process them in bulk and output a report:
```python
import csv
import io

import textstat


def batch_score(documents: list[dict]) -> list[dict]:
    """Score a list of documents. Each dict must have 'id' and 'text' keys."""
    results = []
    for doc in documents:
        text = doc["text"]
        results.append({
            "id": doc["id"],
            "word_count": textstat.lexicon_count(text, removepunct=True),
            "sentence_count": textstat.sentence_count(text),
            "flesch_kincaid_grade": textstat.flesch_kincaid_grade(text),
            "gunning_fog": textstat.gunning_fog(text),
            "smog_index": textstat.smog_index(text),
            "coleman_liau_index": textstat.coleman_liau_index(text),
            "flesch_reading_ease": textstat.flesch_reading_ease(text),
        })
    return results


def to_csv(results: list[dict]) -> str:
    """Convert results to a CSV string."""
    output = io.StringIO()
    writer = csv.DictWriter(output, fieldnames=results[0].keys())
    writer.writeheader()
    writer.writerows(results)
    return output.getvalue()


# Example usage
documents = [
    {"id": "doc_1", "text": "The cat sat on the mat. It was warm."},
    {"id": "doc_2", "text": "Quantum entanglement demonstrates nonlocal correlations between particles separated by arbitrary distances."},
    {"id": "doc_3", "text": "Add water. Stir. Let it sit for five minutes. Serve hot."},
]

results = batch_score(documents)
for r in results:
    print(f"{r['id']}: FK={r['flesch_kincaid_grade']}, Fog={r['gunning_fog']}, Ease={r['flesch_reading_ease']}")

print("\n--- CSV Report ---")
print(to_csv(results))
```
## Serving Scores with FastAPI
Wrap everything in a FastAPI app so other services can call it over HTTP. This uses the modern lifespan pattern instead of the deprecated `@app.on_event` decorator:
```python
from contextlib import asynccontextmanager

import spacy
import textstat
from fastapi import FastAPI
from pydantic import BaseModel


class ScoreRequest(BaseModel):
    text: str


class ScoreResponse(BaseModel):
    word_count: int
    sentence_count: int
    flesch_kincaid_grade: float
    gunning_fog: float
    smog_index: float
    coleman_liau_index: float
    flesch_reading_ease: float
    composite_score: float
    label: str


app_state = {}


@asynccontextmanager
async def lifespan(app: FastAPI):
    # Load spaCy model once at startup
    app_state["nlp"] = spacy.load("en_core_web_sm")
    yield
    app_state.clear()


app = FastAPI(title="Readability Scorer", lifespan=lifespan)


def compute_composite(fk: float, gf: float, smog: float, cl: float) -> tuple[float, str]:
    score = round((fk * 0.3) + (gf * 0.2) + (smog * 0.2) + (cl * 0.3), 2)
    if score <= 6:
        label = "Easy"
    elif score <= 10:
        label = "Standard"
    elif score <= 14:
        label = "Difficult"
    else:
        label = "Very difficult"
    return score, label


@app.post("/score", response_model=ScoreResponse)
async def score_text(req: ScoreRequest):
    text = req.text
    fk = textstat.flesch_kincaid_grade(text)
    gf = textstat.gunning_fog(text)
    smog = textstat.smog_index(text)
    cl = textstat.coleman_liau_index(text)
    composite, label = compute_composite(fk, gf, smog, cl)
    return ScoreResponse(
        word_count=textstat.lexicon_count(text, removepunct=True),
        sentence_count=textstat.sentence_count(text),
        flesch_kincaid_grade=fk,
        gunning_fog=gf,
        smog_index=smog,
        coleman_liau_index=cl,
        flesch_reading_ease=textstat.flesch_reading_ease(text),
        composite_score=composite,
        label=label,
    )
```
Run it with:
```shell
uvicorn app:app --host 0.0.0.0 --port 8000
```
Test with curl:
```shell
curl -X POST http://localhost:8000/score \
  -H "Content-Type: application/json" \
  -d '{"text": "The quick brown fox jumps over the lazy dog. Short sentences are easy to read."}'
```
## Common Errors and Fixes
`textstat.smog_index()` returns 0.0 on short text. SMOG needs at least 30 sentences by design; on shorter text, textstat returns 0. Either skip SMOG for short inputs or substitute a fallback:
```python
smog = textstat.smog_index(text)
if smog == 0.0 and textstat.sentence_count(text) < 30:
    # Fall back to Flesch-Kincaid as a proxy
    smog = textstat.flesch_kincaid_grade(text)
```
spaCy model not found. If you get `OSError: [E050] Can't find model 'en_core_web_sm'`, you forgot to download it:
```shell
python -m spacy download en_core_web_sm
```
Empty string crashes. Both textstat and spaCy handle empty strings poorly. Guard against it:
```python
if not text or not text.strip():
    raise ValueError("Text must not be empty")
```
FastAPI lifespan not recognized. You need FastAPI 0.93 or later. If you’re on an older version, upgrade:
```shell
pip install --upgrade fastapi
```
Scores seem wrong for non-English text. textstat defaults to English syllable counting. Set the language explicitly if you’re scoring other languages:
```python
textstat.set_lang("de")  # German
score = textstat.flesch_reading_ease(german_text)
```