Standard sentiment analysis gives you one label per document. “This phone is great” gets a positive score. Fine. But what about “The battery lasts all day but the screen is too dim”? That’s positive and negative, depending on the aspect. Battery: positive. Screen: negative. One score per document throws away that nuance.
Aspect-based sentiment analysis (ABSA) fixes this. It extracts specific aspects from text and classifies the sentiment toward each one independently. Here’s the fastest way to get it running.
## Quick Start with PyABSA
PyABSA is the best turnkey option for ABSA. It bundles pretrained models, handles aspect extraction and sentiment classification in one pass, and works out of the box with a multilingual checkpoint.
```bash
pip install pyabsa torch
```
Load the pretrained aspect extractor and run it on some reviews:
```python
from pyabsa import AspectTermExtraction as ATEPC

# Load the multilingual pretrained model
aspect_extractor = ATEPC.AspectExtractor(
    "multilingual",
    auto_device=True,
)

reviews = [
    "The battery lasts all day but the screen is too dim.",
    "Camera quality is stunning, though the price feels steep for what you get.",
    "Fast shipping and great packaging, but the product broke after two days.",
]

results = aspect_extractor.predict(
    reviews,
    print_result=True,
    save_result=False,
    pred_sentiment=True,
)
```
The output gives you structured results per review:
```json
{
  "text": "The battery lasts all day but the screen is too dim.",
  "aspect": ["battery", "screen"],
  "sentiment": ["Positive", "Negative"],
  "confidence": [0.9983, 0.9971]
}
```
Each review gets its aspects extracted automatically, along with the sentiment polarity and a confidence score. No manual aspect lists needed.
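Downstream code usually wants flat rows rather than parallel lists. Here is a minimal post-processing sketch, assuming the list-of-dicts shape shown above (`flatten_absa_results` is a hypothetical helper, not part of PyABSA):

```python
def flatten_absa_results(results: list[dict]) -> list[tuple[str, str, str]]:
    """Flatten PyABSA-style result dicts into (text, aspect, sentiment) rows."""
    rows = []
    for r in results:
        # "aspect" and "sentiment" are parallel lists in each result dict
        for aspect, sentiment in zip(r.get("aspect", []), r.get("sentiment", [])):
            rows.append((r.get("text", ""), aspect, sentiment))
    return rows

# Example with the output shape shown above
sample = [{
    "text": "The battery lasts all day but the screen is too dim.",
    "aspect": ["battery", "screen"],
    "sentiment": ["Positive", "Negative"],
}]
for text, aspect, sentiment in flatten_absa_results(sample):
    print(f"{aspect}: {sentiment}")
```

Flat rows like these drop straight into a DataFrame or a CSV export.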
If you want more control, or you need to integrate ABSA into an existing system, use the `yangheng/deberta-v3-base-absa-v1.1` model directly through Hugging Face Transformers. This model takes a sentence and an aspect term as input, then classifies the sentiment toward that aspect.
The catch: this model handles sentiment classification per aspect, not aspect extraction. You need to supply the aspect terms yourself. That makes it a two-stage pipeline: first extract aspects, then classify sentiment for each one.
```bash
pip install transformers torch
```
Here’s the full two-stage pipeline:
```python
from transformers import pipeline

# Stage 2 model: classify sentiment toward a given aspect with the DeBERTa ABSA model
classifier = pipeline(
    "text-classification",
    model="yangheng/deberta-v3-base-absa-v1.1",
    top_k=None,
)

def extract_aspect_candidates(text: str, aspect_terms: list[str]) -> list[str]:
    """Stage 1: find which known aspect terms appear in the text."""
    text_lower = text.lower()
    return [term for term in aspect_terms if term.lower() in text_lower]

def analyze_aspects(text: str, aspect_terms: list[str]) -> list[dict]:
    """Extract aspects from text and classify sentiment for each."""
    results = []
    for aspect in extract_aspect_candidates(text, aspect_terms):
        scores = classifier(text, text_pair=aspect)
        # scores[0] is a list of {'label', 'score'} dicts, one per class
        best = max(scores[0], key=lambda x: x["score"])
        results.append({
            "aspect": aspect,
            "sentiment": best["label"],
            "confidence": round(best["score"], 4),
            "all_scores": {s["label"]: round(s["score"], 4) for s in scores[0]},
        })
    return results

# Define aspects relevant to your domain
product_aspects = ["battery", "screen", "camera", "price", "shipping", "design", "performance"]

review = "The battery lasts all day but the screen is too dim and the price is unreasonable."
results = analyze_aspects(review, product_aspects)

for r in results:
    print(f"{r['aspect']}: {r['sentiment']} ({r['confidence']})")
    print(f"  Scores: {r['all_scores']}")
```
Output:
```text
battery: Positive (0.9876)
  Scores: {'Negative': 0.0034, 'Neutral': 0.009, 'Positive': 0.9876}
screen: Negative (0.9912)
  Scores: {'Negative': 0.9912, 'Neutral': 0.0065, 'Positive': 0.0023}
price: Negative (0.9845)
  Scores: {'Negative': 0.9845, 'Neutral': 0.0112, 'Positive': 0.0043}
```
The DeBERTa model is fast and accurate, but it is trained on English ABSA data. If you need multilingual coverage, stick with PyABSA's multilingual checkpoint, which handles English, Chinese, French, Spanish, Arabic, Dutch, Russian, and Turkish with a single model.
## Processing Reviews at Scale
When you’re analyzing thousands of reviews, you want batch processing and aggregate statistics. Here’s a complete pipeline that processes a list of reviews and produces per-aspect summary reports.
```python
from collections import defaultdict

from pyabsa import AspectTermExtraction as ATEPC

def batch_analyze_reviews(reviews: list[str], checkpoint: str = "multilingual") -> dict:
    """Process a batch of reviews and aggregate aspect-level sentiment."""
    extractor = ATEPC.AspectExtractor(checkpoint, auto_device=True)

    # Run batch prediction
    results = extractor.predict(
        reviews,
        print_result=False,
        save_result=False,
        pred_sentiment=True,
        ignore_error=True,
    )

    # Normalize the return format: a single string yields one dict,
    # a list of inputs yields a list of dicts
    if isinstance(results, dict):
        results = [results]

    # Aggregate per aspect
    aspect_stats = defaultdict(lambda: {"positive": 0, "negative": 0, "neutral": 0, "total": 0})
    for result in results:
        if not isinstance(result, dict):
            continue
        for aspect, sentiment in zip(result.get("aspect", []), result.get("sentiment", [])):
            key = aspect.lower().strip()
            aspect_stats[key][sentiment.lower()] += 1
            aspect_stats[key]["total"] += 1
    return dict(aspect_stats)

def print_aspect_report(stats: dict) -> None:
    """Print a formatted summary of aspect sentiments."""
    print(f"{'Aspect':<20} {'Total':<8} {'Pos %':<8} {'Neg %':<8} {'Neu %':<8}")
    print("-" * 52)
    for aspect, counts in sorted(stats.items(), key=lambda x: x[1]["total"], reverse=True):
        total = counts["total"]
        pos_pct = counts["positive"] / total * 100
        neg_pct = counts["negative"] / total * 100
        neu_pct = counts["neutral"] / total * 100
        print(f"{aspect:<20} {total:<8} {pos_pct:<8.1f} {neg_pct:<8.1f} {neu_pct:<8.1f}")

# Example usage
reviews = [
    "The battery lasts all day and the camera takes sharp photos.",
    "Terrible battery life, but the screen looks gorgeous.",
    "Camera is decent for the price. Battery could be better.",
    "Love the screen brightness. Performance is snappy.",
    "The price is too high for this level of build quality.",
    "Battery drains fast when using the camera app.",
    "Screen resolution is perfect. Best display I have seen.",
    "Great performance and the price was fair.",
]

stats = batch_analyze_reviews(reviews)
print_aspect_report(stats)
```
Expected output looks like:
```text
Aspect               Total    Pos %    Neg %    Neu %
----------------------------------------------------
battery              4        50.0     50.0     0.0
screen               3        100.0    0.0      0.0
camera               3        66.7     0.0      33.3
price                3        33.3     66.7     0.0
performance          2        100.0    0.0      0.0
```
This gives you a clear picture of what customers care about and how they feel about each specific feature. You can export this to JSON, feed it into a dashboard, or use it to prioritize product improvements.
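The stats dict is plain Python, so the JSON export is a one-liner. A sketch (the helper name and file path are arbitrary, not part of any library):

```python
import json

def export_aspect_stats(stats: dict, path: str) -> None:
    """Write the per-aspect sentiment counts to a JSON file for dashboards."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(stats, f, indent=2, sort_keys=True)

# Example with the shape produced by the batch pipeline above
stats = {"battery": {"positive": 2, "negative": 2, "neutral": 0, "total": 4}}
export_aspect_stats(stats, "aspect_report.json")
```

Sorting the keys keeps diffs stable if you commit the report or compare runs.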
## Common Errors and Fixes
**`RuntimeError: CUDA out of memory` when processing large batches**
The default batch size can overwhelm GPU memory on large review sets. Process in chunks:
```python
# Process reviews in fixed-size chunks to bound GPU memory use
chunk_size = 32
all_results = []
for i in range(0, len(reviews), chunk_size):
    chunk = reviews[i:i + chunk_size]
    result = aspect_extractor.predict(chunk, print_result=False, pred_sentiment=True)
    all_results.extend(result)  # predict() returns one dict per review in the chunk
```
Or force CPU inference by setting auto_device=False when loading the model, which keeps it off the GPU entirely.
**`KeyError: 'aspect'` when accessing PyABSA results**
PyABSA’s return format varies depending on whether you pass a single string or a list. Single strings return a dict directly. Lists return a list of dicts, or sometimes a wrapper object. Always normalize the output:
```python
result = aspect_extractor.predict(text, pred_sentiment=True, print_result=False)

if isinstance(result, dict):
    aspects = result.get("aspect", [])
elif isinstance(result, list):
    aspects = result[0].get("aspect", [])
```
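To avoid scattering these isinstance checks across your code, you can wrap the normalization in a small helper (a sketch, not part of PyABSA's API):

```python
def normalize_absa_output(result) -> list[dict]:
    """Coerce PyABSA's varying return shapes into a flat list of result dicts."""
    if isinstance(result, dict):
        return [result]  # single-string input: one dict
    if isinstance(result, list):
        return [r for r in result if isinstance(r, dict)]  # batch input
    return []  # anything unexpected becomes an empty list

# Works for both single-string and batch outputs
single = {"aspect": ["battery"], "sentiment": ["Positive"]}
batch = [single, {"aspect": ["screen"], "sentiment": ["Negative"]}]
print(normalize_absa_output(single))
print(normalize_absa_output(batch))
```

The rest of your pipeline then only ever deals with a list of dicts.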
**`text_pair` argument ignored with wrong pipeline task**
When using `yangheng/deberta-v3-base-absa-v1.1`, you must use the `text-classification` pipeline task. If you accidentally use `sentiment-analysis` (an alias that sometimes behaves differently with `text_pair`), the aspect term may get dropped. Always specify `text-classification` explicitly:
```python
# Correct
classifier = pipeline("text-classification", model="yangheng/deberta-v3-base-absa-v1.1", top_k=None)
result = classifier("Great battery life", text_pair="battery")

# Wrong -- may ignore the aspect term
classifier = pipeline("sentiment-analysis", model="yangheng/deberta-v3-base-absa-v1.1")
```