Sentiment analysis gives you positive, negative, or neutral. That is rarely enough. When users write “I can’t believe they actually shipped this,” you need to know whether that is admiration, anger, or surprise. GoEmotions, Google’s dataset of 58k Reddit comments labeled across 27 emotions plus neutral, solves this. And there are solid fine-tuned models on Hugging Face ready to use right now.
Here is the fastest way to get emotion predictions running:
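A minimal sketch using the transformers pipeline (the variable names here are illustrative):

```python
from transformers import pipeline

# Load a RoBERTa model fine-tuned on GoEmotions.
classifier = pipeline(
    "text-classification",
    model="SamLowe/roberta-base-go_emotions",
    top_k=None,  # return scores for all 28 labels
)

scores = classifier(["I can't believe they actually shipped this"])[0]
# Keep only emotions above a 0.3 confidence threshold.
active = [s for s in scores if s["score"] > 0.3]
print(active)
```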
That loads a RoBERTa model fine-tuned on GoEmotions, runs inference, and filters to emotions above a 0.3 confidence threshold. You get multi-label output because real text carries multiple emotions at once.
The GoEmotions Label Set
GoEmotions defines 27 emotion categories plus a neutral class. These are not arbitrary – they were selected through taxonomic research to cover the range of emotions expressed in online text. The full set:
admiration, amusement, anger, annoyance, approval, caring, confusion, curiosity, desire, disappointment, disapproval, disgust, embarrassment, excitement, fear, gratitude, grief, joy, love, nervousness, optimism, pride, realization, relief, remorse, sadness, surprise, neutral
This granularity matters. A customer support system that can distinguish “disappointment” from “anger” can route tickets differently. A social media monitor that separates “confusion” from “disapproval” gives product teams actionable signal instead of a vague negative score.
The model outputs a score for every label on every input. That is what makes this multi-label classification rather than multi-class – a single sentence can be both “gratitude” and “joy” simultaneously.
Multi-Label Inference with Thresholding
The raw model output gives you scores for all 28 labels. You need a threshold strategy to decide which emotions are “active.” A fixed threshold of 0.3 works well as a starting point, but you can tune per-label thresholds if you have evaluation data.
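A sketch of the manual approach with AutoModel, assuming the default 0.3 threshold (the classify helper is illustrative):

```python
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

MODEL_ID = "SamLowe/roberta-base-go_emotions"
tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForSequenceClassification.from_pretrained(MODEL_ID)
model.eval()

def classify(texts, threshold=0.3):
    inputs = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
    with torch.no_grad():
        logits = model(**inputs).logits
    # Sigmoid gives each label an independent probability (multi-label).
    probs = torch.sigmoid(logits)
    results = []
    for row in probs:
        labels = [
            (model.config.id2label[i], float(p))
            for i, p in enumerate(row)
            if p >= threshold
        ]
        results.append(sorted(labels, key=lambda x: -x[1]))
    return results

print(classify(["Thank you so much, this made my day!"]))
```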
Notice we use torch.sigmoid instead of softmax. This is a multi-label problem – each label gets an independent probability via sigmoid, not a probability distribution that sums to one. Using softmax here would suppress co-occurring emotions and give you worse results.
Tuning the Threshold
Lower thresholds (0.1-0.2) catch more emotions but increase false positives. Higher thresholds (0.5+) give you only strong signals but miss subtler emotions. For production systems, run a sweep against labeled data:
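One way to run such a sweep, shown here on synthetic scores (with real data, probs would be the model's sigmoid outputs and y_true your labeled indicator matrix):

```python
import numpy as np
from sklearn.metrics import f1_score

def sweep_thresholds(probs, y_true, thresholds=np.arange(0.1, 0.65, 0.05)):
    """Return the (threshold, micro-F1) pair that scores best."""
    best = (None, -1.0)
    for t in thresholds:
        preds = (probs >= t).astype(int)
        score = f1_score(y_true, preds, average="micro", zero_division=0)
        if score > best[1]:
            best = (round(float(t), 2), score)
    return best

# Toy demo: synthetic sigmoid scores and labels derived from them.
rng = np.random.default_rng(0)
probs = rng.random((100, 28))
y_true = (probs > 0.5).astype(int)
best_t, best_f1 = sweep_thresholds(probs, y_true)
print(best_t, round(best_f1, 3))
```

Micro-averaged F1 is one reasonable sweep metric; per-label F1 works the same way if you tune thresholds label by label.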
At 0.1 you might get six or seven labels. At 0.5 you might get one or two. Pick the threshold that matches your use case’s tolerance for noise.
Batch Processing with the Pipeline API
For processing many texts at once, the Transformers pipeline handles batching efficiently. It automatically pads and batches inputs for GPU throughput.
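A sketch of batched pipeline inference (the example texts are illustrative):

```python
from transformers import pipeline

classifier = pipeline(
    "text-classification",
    model="SamLowe/roberta-base-go_emotions",
    top_k=None,     # all 28 labels per input, sorted by score
    batch_size=32,  # how many inputs go through each forward pass
    # device=0,     # uncomment to run on GPU
)

texts = [
    "Thanks so much for the quick fix!",
    "Why would they remove that feature?",
    "I'm not sure what this setting does.",
]
for text, scores in zip(texts, classifier(texts)):
    print(text, "->", scores[0]["label"], round(scores[0]["score"], 2))
```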
Setting top_k=None returns all 28 labels sorted by score. If you only want the top 5, set top_k=5 to trim the output – the model still scores every label either way, so this shrinks the response rather than the compute. The batch_size parameter controls how many inputs get processed together – larger batches use more memory but run faster on GPU.
For very large datasets, use a generator to avoid loading everything into memory:
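For example – the demo below writes a small file to stand in for a large dataset, then streams it line by line (the file path and stream_texts helper are illustrative):

```python
import os
import tempfile

from transformers import pipeline

classifier = pipeline(
    "text-classification",
    model="SamLowe/roberta-base-go_emotions",
    top_k=None,
    batch_size=32,
)

def stream_texts(path):
    """Yield one comment per line without loading the whole file."""
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if line:
                yield line

# Demo file standing in for a large dataset.
path = os.path.join(tempfile.gettempdir(), "comments.txt")
with open(path, "w", encoding="utf-8") as f:
    f.write("This is amazing, thank you!\nWhy is this so confusing?\n")

# The pipeline consumes the generator lazily, batching internally.
results = list(classifier(stream_texts(path)))
print(len(results))
```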
Serving the Model via FastAPI
Wrapping this in a FastAPI service takes about 40 lines. Use the lifespan context manager to load the model once at startup and share it across requests.
Save that as app.py and run it:
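For example (uvicorn serves on port 8000 by default):

```shell
pip install fastapi "uvicorn[standard]" transformers torch
uvicorn app:app --host 0.0.0.0 --port 8000
```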
Test it with curl:
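For example, assuming the service exposes POST /classify and /classify/batch routes on port 8000:

```shell
curl -X POST http://localhost:8000/classify \
  -H "Content-Type: application/json" \
  -d '{"text": "I cannot believe they actually shipped this!"}'

curl -X POST http://localhost:8000/classify/batch \
  -H "Content-Type: application/json" \
  -d '{"texts": ["Thanks a lot!", "This is so confusing."]}'
```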
The batch endpoint handles multiple texts in a single request, which is significantly faster than making individual calls when you have many inputs to process.
Common Errors and Fixes
RuntimeError: CUDA out of memory – The model is about 500MB. If your GPU runs out of memory during batching, reduce batch_size or switch to CPU by removing the device=0 argument. For CPU inference, each request takes roughly 50-100ms which is fine for most API use cases.
The model was not found on Hugging Face Hub – Double check the model ID is exactly SamLowe/roberta-base-go_emotions. Model IDs are case-sensitive. If you are behind a corporate proxy, make sure HF_HUB_OFFLINE is not set to 1 and point HTTPS_PROXY at your proxy.
All scores are very low (below 0.1) – This usually means the input text is too short or too different from Reddit-style comments the model was trained on. GoEmotions was built from Reddit data, so very formal or domain-specific text may not classify well. Consider fine-tuning on your own data if this is a persistent issue.
torch.sigmoid vs torch.softmax confusion – If you are getting exactly one emotion per input and the scores sum to 1.0, you accidentally used softmax. This model is multi-label – use sigmoid so each label gets an independent probability.
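A quick way to see the difference on toy logits (the values here are made up for illustration):

```python
import torch

logits = torch.tensor([3.0, 1.5, -3.0])  # toy logits for three labels

sig = torch.sigmoid(logits)          # independent per-label probabilities
soft = torch.softmax(logits, dim=0)  # forced to sum to 1.0

print(sig)   # two labels clear a 0.3 threshold
print(soft)  # only the winner survives; the runner-up is suppressed
```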
Slow first request – The model downloads on first use (about 500MB). Subsequent loads use the Hugging Face cache at ~/.cache/huggingface/. In Docker deployments, mount this directory as a volume or bake the model into the image to avoid download delays.
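For example (the image name my-emotion-api is a placeholder):

```shell
# Option 1: mount the local cache so containers reuse downloaded weights.
docker run -v ~/.cache/huggingface:/root/.cache/huggingface my-emotion-api

# Option 2: bake the model into the image in your Dockerfile.
# RUN python -c "from transformers import pipeline; \
#     pipeline('text-classification', model='SamLowe/roberta-base-go_emotions')"
```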
FastAPI TypeError: 'async_generator' object is not callable – Make sure you are using @asynccontextmanager from contextlib and passing the lifespan function (not calling it) to the FastAPI() constructor. The pattern is FastAPI(lifespan=lifespan), not FastAPI(lifespan=lifespan()).
Related Guides
- How to Build an Aspect-Based Sentiment Analysis Pipeline
- How to Build a Text Entailment and Contradiction Detection Pipeline
- How to Build an Extractive Question Answering System with Transformers
- How to Build a Sentiment-Aware Search Pipeline with Embeddings
- How to Build a Multilingual Sentiment Pipeline with XLM-RoBERTa
- How to Build an Abstractive Summarization Pipeline with PEGASUS
- How to Build a RAG Pipeline with Hugging Face Transformers v5
- How to Build a Text-to-Knowledge-Graph Pipeline with SpaCy and NetworkX
- How to Build a Text Summarization Pipeline with Sumy and Transformers
- How to Build a Keyphrase Generation Pipeline with KeyphraseVectorizers