Pure keyword search misses synonyms. Pure semantic search misses exact matches. The fix is to run both and fuse the results. This approach is called hybrid search, and it consistently outperforms either method alone.

Here’s the full pipeline in one shot – BM25 for keywords, sentence-transformers for semantics, FAISS for fast vector lookup, and Reciprocal Rank Fusion (RRF) to merge the rankings.

pip install rank-bm25 sentence-transformers faiss-cpu numpy
from rank_bm25 import BM25Okapi
from sentence_transformers import SentenceTransformer
import faiss
import numpy as np

# Sample corpus -- mix of topics to show where hybrid shines
documents = [
    "PostgreSQL JSONB indexing improves query performance for nested data",
    "How to optimize SQL queries using EXPLAIN ANALYZE in Postgres",
    "Vector similarity search with FAISS for recommendation systems",
    "Setting up full-text search in Elasticsearch with custom analyzers",
    "Graph neural networks for social network link prediction",
    "Fine-tuning BERT for domain-specific text classification",
    "Kubernetes pod autoscaling based on custom metrics from Prometheus",
    "Building REST APIs with FastAPI and async database connections",
    "Training word2vec embeddings on domain-specific corpora",
    "Retrieval-augmented generation with dense passage retrieval",
    "Postgres full-text search with tsvector and GIN indexes",
    "Approximate nearest neighbor search using locality-sensitive hashing",
]

# --- BM25 Setup ---
tokenized_docs = [doc.lower().split() for doc in documents]
bm25 = BM25Okapi(tokenized_docs)

# --- Semantic Search Setup ---
model = SentenceTransformer("all-MiniLM-L6-v2")
doc_embeddings = model.encode(documents, convert_to_numpy=True, normalize_embeddings=True)

dimension = doc_embeddings.shape[1]  # 384
index = faiss.IndexFlatIP(dimension)  # Inner product = cosine similarity for normalized vectors
index.add(doc_embeddings)

That’s the full index. BM25 tokenizes and scores with term frequency. FAISS stores the dense vectors for nearest-neighbor lookup. Both are ready to query.

Querying Both Systems

Each search system returns ranked results. BM25 scores documents by term overlap. The vector index scores by embedding similarity. You need both because they catch different things.

def bm25_search(query: str, top_k: int = 10) -> list[tuple[int, float]]:
    """Return (doc_index, score) pairs ranked by BM25."""
    tokenized_query = query.lower().split()
    scores = bm25.get_scores(tokenized_query)
    top_indices = np.argsort(scores)[::-1][:top_k]
    return [(int(idx), float(scores[idx])) for idx in top_indices if scores[idx] > 0]


def semantic_search(query: str, top_k: int = 10) -> list[tuple[int, float]]:
    """Return (doc_index, score) pairs ranked by cosine similarity."""
    query_embedding = model.encode([query], convert_to_numpy=True, normalize_embeddings=True)
    scores, indices = index.search(query_embedding, top_k)
    return [(int(indices[0][i]), float(scores[0][i])) for i in range(len(indices[0]))]

Try a query where the two systems disagree. Search for “database text search” – BM25 will favor documents with those exact words, while semantic search will also surface conceptually related results about Elasticsearch and vector search that don’t contain “database” or “text search” literally.

query = "database text search"

bm25_results = bm25_search(query, top_k=5)
semantic_results = semantic_search(query, top_k=5)

print("BM25 Results:")
for idx, score in bm25_results:
    print(f"  [{score:.2f}] {documents[idx]}")

print("\nSemantic Results:")
for idx, score in semantic_results:
    print(f"  [{score:.4f}] {documents[idx]}")

BM25 will rank the Postgres full-text search and Elasticsearch documents highest because they contain “search” and related terms. Semantic search might also surface the FAISS similarity search doc because it understands the conceptual relationship, even without keyword overlap.

Reciprocal Rank Fusion

RRF is the simplest and most effective way to merge two ranked lists. The idea: each document gets a score based on its rank position in each list, and you sum the scores. Documents that rank well in both lists float to the top.

The formula is: RRF_score = sum(1 / (k + rank)) where k is a constant (typically 60) that prevents high-ranked documents from dominating too aggressively.

def reciprocal_rank_fusion(
    ranked_lists: list[list[tuple[int, float]]],
    k: int = 60,
) -> list[tuple[int, float]]:
    """
    Fuse multiple ranked lists using RRF.

    Args:
        ranked_lists: List of ranked results, each containing (doc_index, score) tuples.
        k: Smoothing constant. Higher values reduce the gap between ranks.

    Returns:
        Fused results sorted by combined RRF score.
    """
    fused_scores: dict[int, float] = {}

    for ranked_list in ranked_lists:
        for rank, (doc_idx, _original_score) in enumerate(ranked_list):
            if doc_idx not in fused_scores:
                fused_scores[doc_idx] = 0.0
            fused_scores[doc_idx] += 1.0 / (k + rank + 1)  # rank is 0-indexed, so +1

    sorted_results = sorted(fused_scores.items(), key=lambda x: x[1], reverse=True)
    return sorted_results

Now wire it all together into one function.

def hybrid_search(query: str, top_k: int = 5) -> list[tuple[int, float]]:
    """Run BM25 + semantic search and fuse with RRF."""
    bm25_results = bm25_search(query, top_k=10)
    sem_results = semantic_search(query, top_k=10)
    fused = reciprocal_rank_fusion([bm25_results, sem_results], k=60)
    return fused[:top_k]


results = hybrid_search("database text search", top_k=5)
print("Hybrid Results (RRF):")
for doc_idx, rrf_score in results:
    print(f"  [{rrf_score:.4f}] {documents[doc_idx]}")

Documents that appear in both the BM25 and semantic top-10 get boosted. Documents that only one system finds still appear, but ranked lower. This is exactly the behavior you want.
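You can verify that boosting behavior with a toy fusion, detached from the corpus above. Here doc "a" is ranked second in both lists, while "b" and "c" each top one list; RRF still puts "a" first because it contributes from both lists:

```python
def rrf(ranked_lists, k=60):
    """Minimal RRF over lists of doc IDs (rank-only, no raw scores)."""
    scores = {}
    for lst in ranked_lists:
        for rank, doc in enumerate(lst):
            scores[doc] = scores.get(doc, 0.0) + 1.0 / (k + rank + 1)
    return sorted(scores.items(), key=lambda x: x[1], reverse=True)

# "b" tops one list, "c" tops the other, "a" is second in both.
fused = rrf([["b", "a"], ["c", "a"]])
print(fused[0][0])  # "a" -- two rank-2 appearances beat one rank-1
```

The math: "a" scores 1/62 + 1/62 ≈ 0.0323, while "b" and "c" each score 1/61 ≈ 0.0164.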

Hybrid search wins in three specific scenarios.

Exact term matching with context. Query: “Postgres EXPLAIN ANALYZE”. BM25 nails this because those are exact terms in a document. Semantic search might rank it lower because the embedding space doesn’t distinguish between database-specific tool names as sharply. Hybrid gets the right answer because BM25 pulls it to the top and semantic doesn’t push it down.

Synonym and concept matching. Query: “neural network embeddings for text”. Pure BM25 struggles because none of the documents contain this exact phrase. Semantic search finds the word2vec and BERT documents because it understands the conceptual overlap. Hybrid surfaces these correctly.

Ambiguous queries. Query: “fast search”. BM25 matches any document with “search” in it with roughly equal scores. Semantic search understands the intent is about performance/speed and ranks FAISS and approximate nearest neighbor documents higher. Hybrid combines both signals for a better ranking.

test_queries = [
    "Postgres EXPLAIN ANALYZE",       # Exact match -- BM25 excels
    "neural network embeddings for text",  # Conceptual -- semantic excels
    "fast search",                     # Ambiguous -- hybrid wins
]

for query in test_queries:
    print(f"\nQuery: '{query}'")
    bm25_top = bm25_search(query, top_k=3)
    sem_top = semantic_search(query, top_k=3)
    hybrid_top = hybrid_search(query, top_k=3)

    print(f"  BM25 #1:     {documents[bm25_top[0][0]] if bm25_top else 'No results'}")
    print(f"  Semantic #1: {documents[sem_top[0][0]] if sem_top else 'No results'}")
    print(f"  Hybrid #1:   {documents[hybrid_top[0][0]] if hybrid_top else 'No results'}")

Tuning the Pipeline

The k parameter in RRF controls how much rank position matters. Lower k (like 10) makes the top-ranked result from each system dominate. Higher k (like 60, the standard) smooths things out and gives more weight to documents appearing in multiple lists.
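A quick back-of-envelope makes the effect of k concrete. Using 1-indexed ranks, compare how much more a rank-1 hit is worth than a rank-2 hit at each setting:

```python
# Ratio of rank-1 contribution to rank-2 contribution for two k values.
for k in (10, 60):
    top = 1.0 / (k + 1)     # rank 1
    second = 1.0 / (k + 2)  # rank 2
    print(f"k={k}: rank-1 is {top / second:.3f}x rank-2")
```

At k=10 the ratio is about 1.091; at k=60 it shrinks to about 1.016, so adjacent ranks contribute nearly equally and agreement across lists matters more than any single top spot.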

You can also weight the systems differently by scaling the RRF contributions:

def weighted_rrf(
    ranked_lists: list[list[tuple[int, float]]],
    weights: list[float],
    k: int = 60,
) -> list[tuple[int, float]]:
    """RRF with per-system weights."""
    fused_scores: dict[int, float] = {}

    for weight, ranked_list in zip(weights, ranked_lists):
        for rank, (doc_idx, _score) in enumerate(ranked_list):
            if doc_idx not in fused_scores:
                fused_scores[doc_idx] = 0.0
            fused_scores[doc_idx] += weight * (1.0 / (k + rank + 1))

    return sorted(fused_scores.items(), key=lambda x: x[1], reverse=True)


# Favor semantic search 70/30
results = weighted_rrf(
    [bm25_search("fast search", 10), semantic_search("fast search", 10)],
    weights=[0.3, 0.7],
    k=60,
)

For most use cases, equal weights with k=60 is the right starting point. Only tune weights if you have evaluation data showing one system consistently outperforms the other on your specific queries.

Common Errors and Fixes

ValueError: shapes (1,384) and (768,) not aligned when searching FAISS

You mixed embedding models. The index was built with one model dimension and you’re querying with another. Always use the same SentenceTransformer model for both indexing and querying. Check the dimension with index.d and your query embedding shape.

# Verify dimensions match
print(f"Index dimension: {index.d}")
print(f"Query dimension: {query_embedding.shape[1]}")
# Both should be 384 for all-MiniLM-L6-v2

IndexError: index out of range from BM25 with empty queries

BM25Okapi returns zero scores for all documents when the query has no matching tokens. The bm25_search function above handles this by filtering out zero-score results, but if you skip that check you’ll get empty arrays that break downstream indexing.
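The nice property of RRF is that an empty list degrades gracefully: it simply contributes nothing to the fusion. A standalone sketch with toy data (not the corpus above):

```python
docs = ["alpha", "beta"]
bm25_results = []                     # e.g. no query token matched any document
sem_results = [(1, 0.9), (0, 0.4)]    # (doc_index, score) pairs

# RRF over both lists; the empty BM25 list adds no contributions.
fused = {}
for lst in (bm25_results, sem_results):
    for rank, (idx, _score) in enumerate(lst):
        fused[idx] = fused.get(idx, 0.0) + 1.0 / (60 + rank + 1)

best = max(fused, key=fused.get) if fused else None
print(docs[best] if best is not None else "No results")  # "beta"
```

The only hard requirement is the `if fused` guard at the end: if both systems return nothing, indexing into an empty result is what actually raises the IndexError.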

RuntimeError: expected a non-empty list of Tensors from sentence-transformers

You passed an empty list to model.encode(). Guard against empty inputs before encoding:

if not query.strip():
    raise ValueError("Query cannot be empty")
query_embedding = model.encode([query], convert_to_numpy=True, normalize_embeddings=True)

BM25 returns identical scores for all documents

This usually means your tokenization is too aggressive or too lax. The basic .split() tokenizer works for English but doesn’t handle punctuation well. Use a proper tokenizer for production:

import re

def tokenize(text: str) -> list[str]:
    """Simple tokenizer that handles punctuation and lowercasing."""
    return re.findall(r'\b\w+\b', text.lower())

tokenized_docs = [tokenize(doc) for doc in documents]
bm25 = BM25Okapi(tokenized_docs)

FAISS search returns negative cosine similarity scores

You forgot to normalize your embeddings. IndexFlatIP computes inner product, which only equals cosine similarity when vectors are unit-length. Always pass normalize_embeddings=True to model.encode(), or normalize manually with faiss.normalize_L2(embeddings).
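What normalize_embeddings=True does is plain L2 normalization. A quick NumPy check with toy 2-D vectors shows why it matters for IndexFlatIP:

```python
import numpy as np

# Unnormalized vectors: the inner product is an arbitrary magnitude, not a cosine.
a = np.array([3.0, 4.0])
b = np.array([6.0, 8.0])
print(a @ b)  # 50.0 -- clearly not in [-1, 1]

# After L2 normalization, inner product equals cosine similarity.
a_n = a / np.linalg.norm(a)
b_n = b / np.linalg.norm(b)
print(a_n @ b_n)  # 1.0 -- the vectors point the same way
```

Once everything is unit-length, IndexFlatIP scores land in [-1, 1] and can be read directly as cosine similarities.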