Standard semantic search finds documents that are about the right topic, but it has no concept of tone. If someone searches “positive reviews about battery life,” a vanilla embedding search returns every mention of battery life regardless of whether the reviewer loved it or hated it. You can fix this by pairing FAISS-based similarity search with VADER sentiment scores, giving you a pipeline that filters on both meaning and mood.

Here’s the quick setup so you can see where this is heading:

pip install sentence-transformers faiss-cpu vaderSentiment numpy

Encoding Documents and Building the FAISS Index

The first step is turning your documents into dense vectors with a sentence-transformer model and loading them into a FAISS index for fast nearest-neighbor lookup. We’ll use all-MiniLM-L6-v2 because it’s small, fast, and produces solid 384-dimensional embeddings.

import numpy as np
import faiss
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")

documents = [
    "The battery life on this phone is incredible, easily lasts two full days.",
    "Terrible battery. Dies before lunch even with minimal use.",
    "Camera quality is sharp and vibrant, very happy with the photos.",
    "The screen cracked after a tiny drop. Worst build quality I've seen.",
    "Battery holds up well even after a year of heavy use. Impressed.",
    "Overheats constantly and the battery drains in three hours.",
    "Lightweight design and the battery is a pleasant surprise.",
    "Software is buggy and the customer support was unhelpful.",
    "Great value for the price. Battery and display both exceed expectations.",
    "Returned it after a week. Slow performance and poor battery life.",
]

embeddings = model.encode(documents, normalize_embeddings=True)
embeddings = np.array(embeddings, dtype="float32")

dimension = embeddings.shape[1]  # 384
index = faiss.IndexFlatIP(dimension)  # inner product on normalized vectors = cosine similarity
index.add(embeddings)

print(f"Indexed {index.ntotal} documents with dimension {dimension}")

IndexFlatIP computes inner product, which equals cosine similarity when vectors are L2-normalized (the normalize_embeddings=True flag handles that). For datasets under a few hundred thousand documents, flat search is fast enough. Scale beyond that and you’d swap in IndexIVFFlat or IndexHNSWFlat.

Computing Sentiment Scores with VADER

VADER (Valence Aware Dictionary and sEntiment Reasoner) works well on short, informal text like reviews and social media posts. It returns a compound score between -1 (most negative) and +1 (most positive), plus breakdowns for positive, negative, and neutral proportions.

from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer

analyzer = SentimentIntensityAnalyzer()

sentiment_scores = []
for doc in documents:
    scores = analyzer.polarity_scores(doc)
    sentiment_scores.append(scores["compound"])
    print(f"[{scores['compound']:+.3f}] {doc[:70]}...")

sentiment_scores = np.array(sentiment_scores, dtype="float32")

Running this on the sample documents gives you scores like +0.8 for the glowing battery review and -0.7 for the one about overheating. These scores become your sentiment filter layer.

One thing to know: VADER handles negation, capitalization, and punctuation-based emphasis out of the box. “The battery is NOT good” scores negative, and “AMAZING battery!!!” scores higher positive than “amazing battery.” That’s why it’s a good fit for review data without needing a fine-tuned model.

Combining Similarity and Sentiment into a Query Interface

Now the interesting part: merging both signals. The approach is straightforward. Run the FAISS search to get the top-K semantically similar documents, then filter or re-rank based on sentiment.

def sentiment_aware_search(
    query: str,
    sentiment_filter: str = "any",
    top_k: int = 5,
    sentiment_threshold: float = 0.2,
    boost_weight: float = 0.3,
) -> list[dict]:
    """
    Search documents by semantic similarity with optional sentiment filtering.

    sentiment_filter: "positive", "negative", "neutral", or "any"
    sentiment_threshold: minimum absolute compound score to count as positive/negative
    boost_weight: how much sentiment alignment boosts the final score (0 to 1)
    """
    query_embedding = model.encode([query], normalize_embeddings=True).astype("float32")

    # Retrieve more candidates than needed so filtering doesn't starve results
    fetch_k = min(top_k * 3, index.ntotal)
    similarities, indices = index.search(query_embedding, fetch_k)

    results = []
    for sim, idx in zip(similarities[0], indices[0]):
        sentiment = sentiment_scores[idx]

        # Apply sentiment filter
        if sentiment_filter == "positive" and sentiment < sentiment_threshold:
            continue
        if sentiment_filter == "negative" and sentiment > -sentiment_threshold:
            continue
        if sentiment_filter == "neutral" and abs(sentiment) > sentiment_threshold:
            continue

        # Boost score when sentiment aligns with the filter
        if sentiment_filter in ("positive", "negative"):
            sentiment_bonus = abs(sentiment) * boost_weight
            final_score = float(sim) + sentiment_bonus
        else:
            final_score = float(sim)

        results.append({
            "document": documents[idx],
            "similarity": float(sim),
            "sentiment": float(sentiment),
            "final_score": final_score,
        })

    # Sort by final score descending and return top_k
    results.sort(key=lambda x: x["final_score"], reverse=True)
    return results[:top_k]

The boost_weight parameter controls how much sentiment strength influences ranking. A value of 0.3 means a document with a perfect +1.0 sentiment score gets a 0.3 bump on top of its cosine similarity. Set it to 0 if you only want hard filtering without re-ranking.
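To make the arithmetic concrete, here is the boost worked through with made-up numbers:

```python
# A document with cosine similarity 0.82 and a perfect +1.0 sentiment score,
# using the default boost_weight of 0.3:
similarity = 0.82
sentiment = 1.0
boost_weight = 0.3

final_score = similarity + abs(sentiment) * boost_weight
print(round(final_score, 2))  # 1.12

# With boost_weight = 0, the bonus term vanishes and ranking falls back
# to pure cosine similarity.
```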

Let’s test it:

print("=== Positive reviews about battery ===")
results = sentiment_aware_search("battery life", sentiment_filter="positive", top_k=3)
for r in results:
    print(f"  score={r['final_score']:.3f}  sentiment={r['sentiment']:+.3f}  {r['document'][:80]}")

print("\n=== Negative reviews about battery ===")
results = sentiment_aware_search("battery life", sentiment_filter="negative", top_k=3)
for r in results:
    print(f"  score={r['final_score']:.3f}  sentiment={r['sentiment']:+.3f}  {r['document'][:80]}")

print("\n=== Any sentiment about build quality ===")
results = sentiment_aware_search("build quality and durability", sentiment_filter="any", top_k=3)
for r in results:
    print(f"  score={r['final_score']:.3f}  sentiment={r['sentiment']:+.3f}  {r['document'][:80]}")

The positive battery query surfaces the “easily lasts two full days” and “pleasant surprise” reviews. The negative query picks up the “dies before lunch” and “drains in three hours” ones. Both ignore the high-similarity but wrong-sentiment matches.

Tuning the Pipeline

A few knobs you’ll want to adjust depending on your data:

Sentiment threshold (sentiment_threshold): 0.2 works as a reasonable default for VADER. Lower it to 0.05 if you want to include mildly positive/negative text. Raise it to 0.5 if you only want strongly opinionated documents.

Boost weight (boost_weight): Start at 0.3 and evaluate results manually. If highly relevant but mildly positive documents keep getting outranked by less relevant but very positive ones, reduce the weight. If sentiment-aligned results aren’t surfacing high enough, increase it.

Candidate pool size (fetch_k): The function fetches top_k * 3 candidates before filtering. For datasets where most documents have neutral sentiment, bump this to top_k * 5 or even top_k * 10 to avoid empty results after filtering.
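The effect of the threshold is easy to see in isolation. This `classify` helper is our own sketch (not part of the search function above), but it applies the same bands:

```python
def classify(compound: float, threshold: float = 0.2) -> str:
    """Bucket a VADER compound score into positive/negative/neutral bands."""
    if compound >= threshold:
        return "positive"
    if compound <= -threshold:
        return "negative"
    return "neutral"

# A mildly positive score flips buckets as the threshold moves
print(classify(0.1))                  # neutral at the default 0.2 threshold
print(classify(0.1, threshold=0.05))  # positive once the threshold is lowered
```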

You can also swap VADER for a transformer-based sentiment model if your text is more formal or domain-specific. VADER excels at informal English but struggles with sarcasm and domain jargon. A fine-tuned distilbert-base-uncased-finetuned-sst-2-english model gives you better accuracy at the cost of slower inference:

from transformers import pipeline

sentiment_pipeline = pipeline(
    "sentiment-analysis",
    model="distilbert/distilbert-base-uncased-finetuned-sst-2-english",
)

result = sentiment_pipeline("The battery life on this phone is incredible")
print(result)
# [{'label': 'POSITIVE', 'score': 0.9998}]

The tradeoff is speed. VADER processes thousands of documents per second on CPU. The transformer model needs batching and ideally a GPU to match that throughput.
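One practical detail if you make the swap: the transformer returns a label plus a confidence, not a signed score, so you need to map its output onto the signed scale the rest of the pipeline expects. A minimal sketch (the `to_signed_score` helper is our own, not part of transformers):

```python
def to_signed_score(result: dict) -> float:
    """Map a transformers sentiment result onto VADER's signed [-1, +1] scale."""
    score = result["score"]
    return score if result["label"] == "POSITIVE" else -score

# Dicts shaped like the pipeline output shown above
print(to_signed_score({"label": "POSITIVE", "score": 0.9998}))  # 0.9998
print(to_signed_score({"label": "NEGATIVE", "score": 0.97}))    # -0.97
```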

Common Errors and Fixes

ValueError: could not convert string to float when building the FAISS index

This happens when the embeddings reach FAISS as a plain Python list or a non-float32 array rather than a float32 numpy array. FAISS requires float32 specifically:

# Wrong - may be float64 or a list
index.add(embeddings)

# Right
embeddings = np.array(embeddings, dtype="float32")
index.add(embeddings)

Empty results after sentiment filtering

If sentiment_aware_search returns an empty list, the candidate pool is too small relative to the sentiment distribution. Either increase fetch_k or relax sentiment_threshold. You can add a fallback:

results = sentiment_aware_search("battery", sentiment_filter="positive", top_k=5)
if not results:
    print("No sentiment-matched results. Falling back to unfiltered search.")
    results = sentiment_aware_search("battery", sentiment_filter="any", top_k=5)

ImportError: No module named 'faiss'

The pip package name is faiss-cpu, not faiss. If you have a CUDA-capable GPU, install faiss-gpu instead:

pip install faiss-cpu    # CPU-only
pip install faiss-gpu    # CUDA GPU support

VADER scores everything as neutral (compound near 0)

VADER is trained on English social media text. If your documents are in another language, contain heavy jargon, or are very short (1-2 words), the scores will cluster around zero. For non-English text, switch to a multilingual sentiment model. For domain jargon, consider fine-tuning or using a domain-specific lexicon.
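A quick way to detect this condition is to measure how much of your corpus sits in the near-zero band. Pure Python sketch with made-up example scores; run it on whatever list of compound scores you computed earlier:

```python
def neutral_fraction(scores, band=0.05):
    """Fraction of compound scores clustered near zero."""
    return sum(1 for s in scores if abs(s) < band) / len(scores)

example_scores = [0.01, -0.02, 0.0, 0.64, -0.71, 0.03]
frac = neutral_fraction(example_scores)
print(f"{frac:.0%} of documents look neutral")  # 67% of documents look neutral
```

If that fraction is high on review-style English text, something upstream is wrong (truncated documents, wrong field, etc.); if the text itself is non-English or jargon-heavy, switch models as described above.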

FAISS search() returns -1 indices

This means FAISS couldn’t find enough neighbors, usually because top_k exceeds the number of indexed documents. Check index.ntotal and make sure your fetch_k doesn’t exceed it:

fetch_k = min(top_k * 3, index.ntotal)