## Why Semantic Search Beats Keyword Search
Keyword search fails when the user’s words don’t match your content. Search for “how to fix a slow computer” and keyword search misses articles titled “speed up your PC” — even though they’re about the same thing.
Semantic search converts text into vectors (embeddings) that capture meaning. Similar meanings produce similar vectors, so “fix a slow computer” and “speed up your PC” end up close together in vector space.
## The Full Pipeline
```python
import openai
import numpy as np

client = openai.OpenAI()


def get_embedding(text: str) -> list[float]:
    """Convert text to a vector embedding."""
    response = client.embeddings.create(
        model="text-embedding-3-small",
        input=text,
    )
    return response.data[0].embedding


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Calculate cosine similarity between two vectors."""
    a = np.array(a)
    b = np.array(b)
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))
```
That’s the core — two functions. One converts text to vectors, the other measures how similar two vectors are. Everything else is plumbing.
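
A quick sanity check using the two functions above, on the phrases from the introduction. The similarity values in the comments are only ballpark expectations, not exact figures, since the numbers depend on the embedding model:

```python
# Compare two phrasings of the same idea against an unrelated sentence.
slow_pc = get_embedding("how to fix a slow computer")
speed_up = get_embedding("speed up your PC")
unrelated = get_embedding("best pasta recipes for beginners")

print(cosine_similarity(slow_pc, speed_up))   # high, roughly 0.6-0.8
print(cosine_similarity(slow_pc, unrelated))  # noticeably lower
```
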
## Build the Search Index
Embed your documents once and store the vectors. For small datasets (under 100K documents), a NumPy array works fine.
```python
# Your document corpus
documents = [
    "How to speed up your PC with SSD upgrades",
    "Python list comprehensions and generator expressions",
    "Configuring nginx reverse proxy for Node.js apps",
    "Fix slow database queries with proper indexing",
    "Understanding TCP/IP networking fundamentals",
    "Getting started with Docker containers",
    "Machine learning model deployment with FastAPI",
    "Linux file permissions explained: chmod and chown",
]

# Build the index (do this once, cache the results)
print("Building index...")
doc_embeddings = []
for doc in documents:
    embedding = get_embedding(doc)
    doc_embeddings.append(embedding)

doc_embeddings = np.array(doc_embeddings)
print(f"Index built: {doc_embeddings.shape}")
# Index built: (8, 1536)
```
## Search the Index
To search, embed the query and find the most similar documents.
```python
def search(query: str, top_k: int = 3) -> list[dict]:
    """Search documents by semantic similarity."""
    query_embedding = get_embedding(query)
    query_vec = np.array(query_embedding)

    # Calculate similarity against all documents
    similarities = np.dot(doc_embeddings, query_vec) / (
        np.linalg.norm(doc_embeddings, axis=1) * np.linalg.norm(query_vec)
    )

    # Get top-k results
    top_indices = np.argsort(similarities)[::-1][:top_k]
    results = []
    for idx in top_indices:
        results.append({
            "document": documents[idx],
            "score": float(similarities[idx]),
        })
    return results


# Test it
results = search("my computer is running slowly")
for r in results:
    print(f"  {r['score']:.3f}  {r['document']}")

# Output:
#   0.847  How to speed up your PC with SSD upgrades
#   0.612  Fix slow database queries with proper indexing
#   0.534  Linux file permissions explained: chmod and chown
```
Notice “my computer is running slowly” matched “speed up your PC” — zero keyword overlap, but the meaning is the same. That’s the power of embeddings.
## Save and Load the Index
Don’t re-embed every time. Save the vectors to disk.
```python
import json


def save_index(documents: list[str], embeddings: np.ndarray, path: str):
    """Save the search index to disk."""
    np.save(f"{path}_embeddings.npy", embeddings)
    with open(f"{path}_documents.json", "w") as f:
        json.dump(documents, f)
    print(f"Index saved: {len(documents)} documents")


def load_index(path: str) -> tuple[list[str], np.ndarray]:
    """Load the search index from disk."""
    embeddings = np.load(f"{path}_embeddings.npy")
    with open(f"{path}_documents.json") as f:
        documents = json.load(f)
    print(f"Index loaded: {len(documents)} documents")
    return documents, embeddings


# Save
save_index(documents, doc_embeddings, "my_index")

# Load (fast — no API calls needed)
documents, doc_embeddings = load_index("my_index")
```
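
When new documents arrive later, you don't need to rebuild the whole index. Embed only the new texts, append them, and re-save. A minimal sketch using the functions above (the new document text here is just an illustrative placeholder):

```python
# Add new documents without re-embedding the existing ones.
new_docs = [
    "Debugging memory leaks in long-running Python services",
]
new_embeddings = np.array([get_embedding(doc) for doc in new_docs])

documents = documents + new_docs
doc_embeddings = np.vstack([doc_embeddings, new_embeddings])

# Persist the updated index
save_index(documents, doc_embeddings, "my_index")
```
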
## Scaling Beyond NumPy
For anything over 100K documents, use a proper vector database. The search function stays almost identical — you just swap the backend.
```python
# Using ChromaDB (in-memory vector DB, no server needed)
import chromadb

client_db = chromadb.Client()
collection = client_db.create_collection("docs")

# Add documents (ChromaDB handles embedding automatically)
collection.add(
    documents=documents,
    ids=[f"doc_{i}" for i in range(len(documents))],
)

# Search
results = collection.query(
    query_texts=["my computer is running slowly"],
    n_results=3,
)
print(results["documents"][0])
```
ChromaDB, Pinecone, Weaviate, and Qdrant all follow this same pattern: add documents, query by text, get ranked results. Pick ChromaDB for prototyping, Pinecone or Qdrant for production.
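
One practical detail when you make that swap: by default ChromaDB embeds with its own local model rather than the OpenAI model used earlier, so scores won't be directly comparable with the NumPy index. The sketch below persists the collection to disk and pins the embedding model; it assumes ChromaDB's `PersistentClient` and `OpenAIEmbeddingFunction`, and exact class and parameter names can shift between ChromaDB versions, so check your installed release:

```python
import os

import chromadb
from chromadb.utils import embedding_functions

# Persist the collection to disk instead of keeping it in memory.
client_db = chromadb.PersistentClient(path="./chroma_index")

# Use the same OpenAI model as the NumPy index so scores stay comparable.
openai_ef = embedding_functions.OpenAIEmbeddingFunction(
    api_key=os.environ["OPENAI_API_KEY"],
    model_name="text-embedding-3-small",
)

collection = client_db.get_or_create_collection(
    name="docs_openai",
    embedding_function=openai_ef,
)

collection.add(
    documents=documents,
    ids=[f"doc_{i}" for i in range(len(documents))],
)

results = collection.query(
    query_texts=["my computer is running slowly"],
    n_results=3,
)
print(results["documents"][0])
```
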
## Common Issues
**Results all have similar scores.** Your documents might be too short or too similar. Longer, more specific text produces more distinct embeddings.

**Slow embedding generation.** Batch your embeddings — send multiple texts in one API call instead of one at a time; OpenAI's API accepts up to 2048 texts per request. See the batching sketch below.

**Wrong results for short queries.** Single-word queries produce vague embeddings. Expand them: instead of searching for "nginx", search for "nginx web server configuration."
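
Here is what the batching tip looks like in code: a hypothetical `get_embeddings` helper (reusing the `client` defined earlier) that sends one request per chunk of texts instead of one request per text:

```python
def get_embeddings(texts: list[str], batch_size: int = 2048) -> np.ndarray:
    """Embed many texts with one API call per batch instead of one per text."""
    all_embeddings = []
    for start in range(0, len(texts), batch_size):
        batch = texts[start:start + batch_size]
        response = client.embeddings.create(
            model="text-embedding-3-small",
            input=batch,
        )
        # The API returns one embedding per input, in input order.
        all_embeddings.extend(item.embedding for item in response.data)
    return np.array(all_embeddings)


# Rebuild the index in far fewer API calls
doc_embeddings = get_embeddings(documents)
```
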
| Model | Dimensions | Price per 1M tokens | Best For |
|---|---|---|---|
| text-embedding-3-small | 1536 | $0.02 | Most use cases |
| text-embedding-3-large | 3072 | $0.13 | Maximum accuracy |
| text-embedding-ada-002 | 1536 | $0.10 | Legacy compatibility |
For 100K documents averaging 100 tokens each: ~$0.20 with text-embedding-3-small. That’s a one-time indexing cost — searches only embed the query (negligible cost).
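
The arithmetic behind that estimate, as a quick back-of-the-envelope script (the document count and average token length are rough assumptions, not measurements):

```python
# Rough one-time indexing cost with text-embedding-3-small
num_documents = 100_000
avg_tokens_per_doc = 100
price_per_million_tokens = 0.02  # USD

total_tokens = num_documents * avg_tokens_per_doc            # 10,000,000 tokens
cost = total_tokens / 1_000_000 * price_per_million_tokens   # $0.20
print(f"Estimated indexing cost: ${cost:.2f}")
```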