You want a REST endpoint that takes text and returns a sentiment label with a confidence score. No training, no labeling, no GPU cluster. Just a pre-trained model behind a FastAPI server that handles real traffic.
The cardiffnlp/twitter-roberta-base-sentiment-latest model is the best off-the-shelf option for general English sentiment. It is a RoBERTa-base model fine-tuned on ~124 million tweets, outputs three labels (negative, neutral, positive), and runs fast on CPU. We will wrap it in a FastAPI app with proper startup loading, input validation, batch support, and the preprocessing the model actually needs.
## The Minimal Working API
Install dependencies first:
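Assuming a recent Python environment, the packages the rest of this guide relies on:

```shell
pip install fastapi "uvicorn[standard]" transformers torch
```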
Here is the full API in a single file. Save this as main.py:
Start the server:
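A typical Uvicorn invocation (host and port are illustrative):

```shell
uvicorn main:app --host 0.0.0.0 --port 8000
```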
Test it:
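A sample request, assuming the `/predict` endpoint and request shape sketched above:

```shell
curl -X POST http://localhost:8000/predict \
  -H "Content-Type: application/json" \
  -d '{"text": "This is hands down the best purchase I have made all year!"}'
```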
Response:
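The response has roughly this shape (the score shown is illustrative):

```json
{"label": "positive", "score": 0.9871}
```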
That is a working sentiment API. The model downloads on first startup (about 500MB), then stays in memory. Subsequent restarts use the cached model from ~/.cache/huggingface/.
## Why Preprocessing Matters
The cardiffnlp model was trained on tweets where usernames are replaced with @user and URLs with http. If you skip this step, the model still produces output – but the scores drift. A tweet like @elonmusk check https://example.com great product! gives different confidence scores than @user check http great product!. The gap is typically 5-15 percentage points of confidence, which is enough to flip borderline predictions.
The preprocess function above handles this. Always run it before inference.
## Adding Batch Predictions
Single-text predictions are fine for interactive use, but if you need to score hundreds of reviews, sending them one by one wastes time. The Transformers pipeline natively supports batching – you just pass a list.
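A sketch of a batch endpoint that could sit alongside the single-text one. It extends main.py and reuses `app`, `preprocess`, and `classifier` from above; the `/predict-batch` path and the 50-item cap are illustrative choices:

```python
# Additions to main.py — reuses app, preprocess, and classifier from above.
from pydantic import BaseModel, Field


class BatchRequest(BaseModel):
    texts: list[str] = Field(min_length=1, max_length=50)  # cap request size


@app.post("/predict-batch")
def predict_batch(req: BatchRequest):
    cleaned = [preprocess(t) for t in req.texts]
    # One call scores the whole list; the pipeline batches, pads, and
    # truncates internally.
    results = classifier(cleaned, batch_size=16)
    return [{"label": r["label"], "score": round(r["score"], 4)} for r in results]
```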
The batch_size=16 parameter controls how many texts the model processes in a single forward pass. On CPU, 16 is a reasonable default. On GPU, you can push this to 32 or 64 depending on VRAM. The pipeline handles padding and truncation internally.
Test it:
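A sample batch request, assuming the `/predict-batch` endpoint sketched above:

```shell
curl -X POST http://localhost:8000/predict-batch \
  -H "Content-Type: application/json" \
  -d '{"texts": ["Love it!", "Terrible support.", "It arrived on time."]}'
```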
## Handling Long Texts
RoBERTa has a 512-token limit. If you pass a 2000-word product review, the tokenizer silently truncates it, which means the model only sees the first ~400 words. For most sentiment tasks this is fine – sentiment is usually expressed early. But if you want to be explicit about it, set truncation=True when creating the pipeline:
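One way to do that, passing the tokenizer settings when the pipeline is created:

```python
from transformers import pipeline

classifier = pipeline(
    "sentiment-analysis",
    model="cardiffnlp/twitter-roberta-base-sentiment-latest",
    truncation=True,   # clip anything past the model's token limit
    max_length=512,
)
```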
This suppresses the warning you would otherwise see:
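Without truncation, overly long inputs typically trigger a tokenizer warning along these lines (exact wording varies by transformers version; the token counts shown are illustrative):

```
Token indices sequence length is longer than the specified maximum sequence
length for this model (731 > 512). Running this sequence through the model
will result in indexing errors
```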
## Common Errors and Fixes
### OSError: Can't load tokenizer for 'cardiffnlp/twitter-roberta-base-sentiment-latest'
This usually means you are behind a corporate proxy or firewall blocking huggingface.co. Fix it by downloading the model locally first:
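One way to do that: pre-download on a machine with access, then copy the folder over (`./sentiment-model` is an arbitrary path):

```python
from transformers import AutoModelForSequenceClassification, AutoTokenizer

model_name = "cardiffnlp/twitter-roberta-base-sentiment-latest"
AutoTokenizer.from_pretrained(model_name).save_pretrained("./sentiment-model")
AutoModelForSequenceClassification.from_pretrained(model_name).save_pretrained(
    "./sentiment-model"
)

# Then point the pipeline at the local directory instead of the hub name:
# pipeline("sentiment-analysis", model="./sentiment-model")
```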
Or set the proxy:
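The Hugging Face download client honors the standard proxy environment variables (the proxy address below is a placeholder):

```shell
export HTTPS_PROXY=http://proxy.example.com:8080
export HTTP_PROXY=http://proxy.example.com:8080
```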
### torch.cuda.OutOfMemoryError: CUDA out of memory
You loaded the model on GPU but the batch is too large. Reduce batch_size or switch to CPU with device=-1. For a base-sized model (125M parameters), CPU inference is fast enough for most APIs – about 50-100ms per prediction.
### Predictions always return Neutral
You are probably passing empty strings or whitespace-only input after preprocessing. The Pydantic min_length=1 validator catches empty strings, but a text like @john http preprocesses to @user http which is effectively content-free. Add a length check after preprocessing:
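A sketch of that guard (the helper name and the 422 status are illustrative choices; in the endpoint you would raise an HTTPException when it fails):

```python
# Tokens the cardiffnlp preprocessing step can leave behind.
PLACEHOLDERS = {"@user", "http"}


def has_real_content(cleaned: str) -> bool:
    """True if anything besides placeholder tokens survives preprocessing."""
    return any(tok not in PLACEHOLDERS for tok in cleaned.split())


# In the endpoint, after running preprocess():
#     if not has_real_content(cleaned):
#         raise HTTPException(status_code=422, detail="No scorable content")
```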
## Running with Multiple Workers
For production traffic, run multiple Uvicorn workers behind Gunicorn. Each worker loads its own copy of the model, so watch your memory – the model uses about 500MB per worker.
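A typical invocation, assuming the file is still named main.py (worker count and port are illustrative):

```shell
gunicorn main:app \
  --workers 4 \
  --worker-class uvicorn.workers.UvicornWorker \
  --bind 0.0.0.0:8000
```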
Four workers means ~2GB of RAM for the models alone. If memory is tight, Gunicorn's --preload flag loads the application once before forking and shares read-only memory across workers via copy-on-write. One caveat: --preload only shares state created at import time. A model loaded in the lifespan handler is loaded after the fork, once per worker, so to get the memory savings you must load the model at module level instead:
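The same invocation with preloading:

```shell
gunicorn main:app \
  --workers 4 \
  --worker-class uvicorn.workers.UvicornWorker \
  --bind 0.0.0.0:8000 \
  --preload
```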
This works because the model weights are read-only after loading. The OS shares the physical memory pages between processes until a worker tries to write to them.
## Switching Models
The entire API is model-agnostic. Swap the model string to use a different sentiment model without changing any other code:
| Model | Labels | Best For |
|---|---|---|
| `cardiffnlp/twitter-roberta-base-sentiment-latest` | Negative/Neutral/Positive | Social media, general text |
| `nlptown/bert-base-multilingual-uncased-sentiment` | 1-5 stars | Product reviews, multilingual |
| `distilbert-base-uncased-finetuned-sst-2-english` | Positive/Negative | Binary sentiment, fast inference |
The pipeline API normalizes the output format, so your FastAPI endpoints work with any of these without modification. Just change the model name in the lifespan function and redeploy.
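For instance, switching to the binary SST-2 model would be a one-line change to the pipeline call:

```python
from transformers import pipeline

classifier = pipeline(
    "sentiment-analysis",
    model="distilbert-base-uncased-finetuned-sst-2-english",
    device=-1,
)
```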
## Related Guides
- How to Build a Named Entity Recognition Pipeline with spaCy and Transformers
- How to Build a Text Similarity API with Cross-Encoders
- How to Build a Text Style Transfer Pipeline with Transformers
- How to Build a Language Detection and Translation Pipeline
- How to Build a Spell Checking and Autocorrect Pipeline with Python
- How to Build a Text Classification Pipeline with SetFit
- How to Build a Text Paraphrase Pipeline with T5 and PEGASUS
- How to Build a Text Readability Scoring Pipeline with Python
- How to Build an Aspect-Based Sentiment Analysis Pipeline
- How to Build a Legal NER Pipeline with Transformers and spaCy