Standard sentiment analysis gives you one label per document. “This phone is great” gets a positive score. Fine. But what about “The battery lasts all day but the screen is too dim”? That’s positive and negative, depending on the aspect. Battery: positive. Screen: negative. One score per document throws away that nuance.
Aspect-based sentiment analysis (ABSA) fixes this. It extracts specific aspects from text and classifies the sentiment toward each one independently. Here’s the fastest way to get it running.
## Quick Start with PyABSA
PyABSA is the best turnkey option for ABSA. It bundles pretrained models, handles aspect extraction and sentiment classification in one pass, and works out of the box with a multilingual checkpoint.
```bash
pip install pyabsa torch
```
Load the pretrained aspect extractor and run it on some reviews:
```python
from pyabsa import AspectTermExtraction as ATEPC

# Load the multilingual pretrained model
aspect_extractor = ATEPC.AspectExtractor(
    "multilingual",
    auto_device=True,
)

reviews = [
    "The battery lasts all day but the screen is too dim.",
    "Camera quality is stunning, though the price feels steep for what you get.",
    "Fast shipping and great packaging, but the product broke after two days.",
]

results = aspect_extractor.predict(
    reviews,
    print_result=True,
    save_result=False,
    pred_sentiment=True,
)
```
The output gives you structured results per review:
```json
{
  "text": "The battery lasts all day but the screen is too dim.",
  "aspect": ["battery", "screen"],
  "sentiment": ["Positive", "Negative"],
  "confidence": [0.9983, 0.9971]
}
```
Each review gets its aspects extracted automatically, along with the sentiment polarity and a confidence score. No manual aspect lists needed.
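Downstream code usually wants flat rows rather than parallel lists. Here is a minimal post-processing sketch, assuming the list-of-dicts shape shown above (`flatten_absa_results` is a hypothetical helper, not part of PyABSA):

```python
def flatten_absa_results(results: list[dict]) -> list[tuple[str, str, str]]:
    """Flatten PyABSA-style result dicts into (text, aspect, sentiment) rows."""
    rows = []
    for r in results:
        # "aspect" and "sentiment" are parallel lists in each result dict
        for aspect, sentiment in zip(r.get("aspect", []), r.get("sentiment", [])):
            rows.append((r.get("text", ""), aspect, sentiment))
    return rows

# Example with the output shape shown above
sample = [{
    "text": "The battery lasts all day but the screen is too dim.",
    "aspect": ["battery", "screen"],
    "sentiment": ["Positive", "Negative"],
}]
for text, aspect, sentiment in flatten_absa_results(sample):
    print(f"{aspect}: {sentiment}")
```

Flat rows like these drop straight into a DataFrame or a CSV export.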
If you want more control, or you need to integrate ABSA into an existing system, use the `yangheng/deberta-v3-base-absa-v1.1` model directly through Hugging Face Transformers. This model takes a sentence and an aspect term as input, then classifies the sentiment toward that aspect.
The catch: this model handles sentiment classification per aspect, not aspect extraction. You need to supply the aspect terms yourself. That makes it a two-stage pipeline: first extract aspects, then classify sentiment for each one.
```bash
pip install transformers torch
```
Here’s the full two-stage pipeline:
```python
from transformers import pipeline

# Stage 2 model: classify sentiment toward a given aspect with the DeBERTa ABSA model
classifier = pipeline(
    "text-classification",
    model="yangheng/deberta-v3-base-absa-v1.1",
    top_k=None,
)

def extract_aspect_candidates(text: str, aspect_terms: list[str]) -> list[str]:
    """Stage 1: find which known aspect terms appear in the text."""
    text_lower = text.lower()
    return [term for term in aspect_terms if term.lower() in text_lower]

def analyze_aspects(text: str, aspect_terms: list[str]) -> list[dict]:
    """Extract aspects from text and classify sentiment for each."""
    results = []
    for aspect in extract_aspect_candidates(text, aspect_terms):
        scores = classifier(text, text_pair=aspect)
        # scores[0] is a list of {'label', 'score'} dicts, one per class
        best = max(scores[0], key=lambda x: x["score"])
        results.append({
            "aspect": aspect,
            "sentiment": best["label"],
            "confidence": round(best["score"], 4),
            "all_scores": {s["label"]: round(s["score"], 4) for s in scores[0]},
        })
    return results

# Define aspects relevant to your domain
product_aspects = ["battery", "screen", "camera", "price", "shipping", "design", "performance"]

review = "The battery lasts all day but the screen is too dim and the price is unreasonable."
results = analyze_aspects(review, product_aspects)

for r in results:
    print(f"{r['aspect']}: {r['sentiment']} ({r['confidence']})")
    print(f"  Scores: {r['all_scores']}")
```
Output:
```text
battery: Positive (0.9876)
  Scores: {'Negative': 0.0034, 'Neutral': 0.009, 'Positive': 0.9876}
screen: Negative (0.9912)
  Scores: {'Negative': 0.9912, 'Neutral': 0.0065, 'Positive': 0.0023}
price: Negative (0.9845)
  Scores: {'Negative': 0.9845, 'Neutral': 0.0112, 'Positive': 0.0043}
```
The DeBERTa model is fast and accurate, but it is trained on English ABSA data. If you need multilingual coverage, stick with PyABSA's multilingual checkpoint, which handles English, Chinese, French, Spanish, Arabic, Dutch, Russian, and Turkish with a single model.
## Processing Reviews at Scale
When you’re analyzing thousands of reviews, you want batch processing and aggregate statistics. Here’s a complete pipeline that processes a list of reviews and produces per-aspect summary reports.
```python
from collections import defaultdict

from pyabsa import AspectTermExtraction as ATEPC

def batch_analyze_reviews(reviews: list[str], checkpoint: str = "multilingual") -> dict:
    """Process a batch of reviews and aggregate aspect-level sentiment."""
    extractor = ATEPC.AspectExtractor(checkpoint, auto_device=True)

    # Run batch prediction
    results = extractor.predict(
        reviews,
        print_result=False,
        save_result=False,
        pred_sentiment=True,
        ignore_error=True,
    )

    # Normalize the return format: a single string yields one dict,
    # a list of inputs yields a list of dicts
    if isinstance(results, dict):
        results = [results]

    # Aggregate per aspect
    aspect_stats = defaultdict(lambda: {"positive": 0, "negative": 0, "neutral": 0, "total": 0})
    for result in results:
        if not isinstance(result, dict):
            continue
        for aspect, sentiment in zip(result.get("aspect", []), result.get("sentiment", [])):
            key = aspect.lower().strip()
            aspect_stats[key][sentiment.lower()] += 1
            aspect_stats[key]["total"] += 1
    return dict(aspect_stats)

def print_aspect_report(stats: dict) -> None:
    """Print a formatted summary of aspect sentiments."""
    print(f"{'Aspect':<20} {'Total':<8} {'Pos %':<8} {'Neg %':<8} {'Neu %':<8}")
    print("-" * 52)
    for aspect, counts in sorted(stats.items(), key=lambda x: x[1]["total"], reverse=True):
        total = counts["total"]
        pos_pct = counts["positive"] / total * 100
        neg_pct = counts["negative"] / total * 100
        neu_pct = counts["neutral"] / total * 100
        print(f"{aspect:<20} {total:<8} {pos_pct:<8.1f} {neg_pct:<8.1f} {neu_pct:<8.1f}")

# Example usage
reviews = [
    "The battery lasts all day and the camera takes sharp photos.",
    "Terrible battery life, but the screen looks gorgeous.",
    "Camera is decent for the price. Battery could be better.",
    "Love the screen brightness. Performance is snappy.",
    "The price is too high for this level of build quality.",
    "Battery drains fast when using the camera app.",
    "Screen resolution is perfect. Best display I have seen.",
    "Great performance and the price was fair.",
]

stats = batch_analyze_reviews(reviews)
print_aspect_report(stats)
```
Expected output looks like:
```text
Aspect               Total    Pos %    Neg %    Neu %
----------------------------------------------------
battery              4        50.0     50.0     0.0
screen               3        100.0    0.0      0.0
camera               3        66.7     0.0      33.3
price                3        33.3     66.7     0.0
performance          2        100.0    0.0      0.0
```
This gives you a clear picture of what customers care about and how they feel about each specific feature. You can export this to JSON, feed it into a dashboard, or use it to prioritize product improvements.
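The stats dict is plain Python, so the JSON export is a one-liner. A sketch (the helper name and file path are arbitrary, not part of any library):

```python
import json

def export_aspect_stats(stats: dict, path: str) -> None:
    """Write the per-aspect sentiment counts to a JSON file for dashboards."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(stats, f, indent=2, sort_keys=True)

# Example with the shape produced by the batch pipeline above
stats = {"battery": {"positive": 2, "negative": 2, "neutral": 0, "total": 4}}
export_aspect_stats(stats, "aspect_report.json")
```

Sorting the keys keeps diffs stable if you commit the report or compare runs.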
## Common Errors and Fixes
**`RuntimeError: CUDA out of memory` when processing large batches**
The default batch size can overwhelm GPU memory on large review sets. Process in chunks:
```python
# Process reviews in fixed-size chunks to bound GPU memory use
chunk_size = 32
all_results = []
for i in range(0, len(reviews), chunk_size):
    chunk = reviews[i:i + chunk_size]
    result = aspect_extractor.predict(chunk, print_result=False, pred_sentiment=True)
    all_results.extend(result)  # predict() returns one dict per review in the chunk
```
Or force CPU inference by setting auto_device=False when loading the model, which keeps it off the GPU entirely.
**`KeyError: 'aspect'` when accessing PyABSA results**
PyABSA’s return format varies depending on whether you pass a single string or a list. Single strings return a dict directly. Lists return a list of dicts, or sometimes a wrapper object. Always normalize the output:
```python
result = aspect_extractor.predict(text, pred_sentiment=True, print_result=False)

if isinstance(result, dict):
    aspects = result.get("aspect", [])
elif isinstance(result, list):
    aspects = result[0].get("aspect", [])
```
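To avoid scattering these isinstance checks across your code, you can wrap the normalization in a small helper (a sketch, not part of PyABSA's API):

```python
def normalize_absa_output(result) -> list[dict]:
    """Coerce PyABSA's varying return shapes into a flat list of result dicts."""
    if isinstance(result, dict):
        return [result]  # single-string input: one dict
    if isinstance(result, list):
        return [r for r in result if isinstance(r, dict)]  # batch input
    return []  # anything unexpected becomes an empty list

# Works for both single-string and batch outputs
single = {"aspect": ["battery"], "sentiment": ["Positive"]}
batch = [single, {"aspect": ["screen"], "sentiment": ["Negative"]}]
print(normalize_absa_output(single))
print(normalize_absa_output(batch))
```

The rest of your pipeline then only ever deals with a list of dicts.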
**`text_pair` argument ignored with wrong pipeline task**
When using `yangheng/deberta-v3-base-absa-v1.1`, you must use the `text-classification` pipeline task. If you accidentally use `sentiment-analysis` (an alias that sometimes behaves differently with `text_pair`), the aspect term may get dropped. Always specify `text-classification` explicitly:
```python
# Correct
classifier = pipeline("text-classification", model="yangheng/deberta-v3-base-absa-v1.1", top_k=None)
result = classifier("Great battery life", text_pair="battery")

# Wrong -- may ignore the aspect term
classifier = pipeline("sentiment-analysis", model="yangheng/deberta-v3-base-absa-v1.1")
```