Keyword search fails when the user’s words don’t match your content. Search for “how to fix a slow computer” and you’ll miss articles titled “speed up your PC”, even though they’re about the same thing.

Semantic search converts text into vectors (embeddings) that capture meaning. Similar meanings produce similar vectors, so “fix a slow computer” and “speed up your PC” end up close together in vector space.

The Full Pipeline

import openai
import numpy as np

client = openai.OpenAI()

def get_embedding(text: str) -> list[float]:
    """Convert text to a vector embedding."""
    response = client.embeddings.create(
        model="text-embedding-3-small",
        input=text
    )
    return response.data[0].embedding


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Calculate cosine similarity between two vectors."""
    a = np.array(a)
    b = np.array(b)
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

That’s the core — two functions. One converts text to vectors, the other measures how similar two vectors are. Everything else is plumbing.
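
As a quick sanity check (a sketch, assuming an OpenAI API key is configured; exact scores vary by model), related phrases should score noticeably higher than unrelated ones:

# Related phrases land close together in vector space; unrelated ones don't.
slow = get_embedding("how to fix a slow computer")
fast = get_embedding("speed up your PC")
cookies = get_embedding("chocolate chip cookie recipe")

print(cosine_similarity(slow, fast))     # relatively high, roughly 0.5-0.8
print(cosine_similarity(slow, cookies))  # noticeably lower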

Build the Search Index

Embed your documents once and store the vectors. For small datasets (under 100K documents), a NumPy array works fine.

# Your document corpus
documents = [
    "How to speed up your PC with SSD upgrades",
    "Python list comprehensions and generator expressions",
    "Configuring nginx reverse proxy for Node.js apps",
    "Fix slow database queries with proper indexing",
    "Understanding TCP/IP networking fundamentals",
    "Getting started with Docker containers",
    "Machine learning model deployment with FastAPI",
    "Linux file permissions explained: chmod and chown",
]

# Build the index (do this once, cache the results)
print("Building index...")
doc_embeddings = []
for doc in documents:
    embedding = get_embedding(doc)
    doc_embeddings.append(embedding)

doc_embeddings = np.array(doc_embeddings)
print(f"Index built: {doc_embeddings.shape}")
# Index built: (8, 1536)

Search the Index

To search, embed the query and find the most similar documents.

def search(query: str, top_k: int = 3) -> list[dict]:
    """Search documents by semantic similarity."""
    query_embedding = get_embedding(query)
    query_vec = np.array(query_embedding)

    # Calculate similarity against all documents
    similarities = np.dot(doc_embeddings, query_vec) / (
        np.linalg.norm(doc_embeddings, axis=1) * np.linalg.norm(query_vec)
    )

    # Get top-k results
    top_indices = np.argsort(similarities)[::-1][:top_k]

    results = []
    for idx in top_indices:
        results.append({
            "document": documents[idx],
            "score": float(similarities[idx]),
        })
    return results


# Test it
results = search("my computer is running slowly")
for r in results:
    print(f"  {r['score']:.3f}  {r['document']}")

# Output:
#   0.847  How to speed up your PC with SSD upgrades
#   0.612  Fix slow database queries with proper indexing
#   0.534  Linux file permissions explained: chmod and chown

Notice “my computer is running slowly” matched “speed up your PC” — zero keyword overlap, but the meaning is the same. That’s the power of embeddings.

Save and Load the Index

Don’t re-embed every time. Save the vectors to disk.

import json

def save_index(documents: list[str], embeddings: np.ndarray, path: str):
    """Save the search index to disk."""
    np.save(f"{path}_embeddings.npy", embeddings)
    with open(f"{path}_documents.json", "w") as f:
        json.dump(documents, f)
    print(f"Index saved: {len(documents)} documents")


def load_index(path: str) -> tuple[list[str], np.ndarray]:
    """Load the search index from disk."""
    embeddings = np.load(f"{path}_embeddings.npy")
    with open(f"{path}_documents.json") as f:
        documents = json.load(f)
    print(f"Index loaded: {len(documents)} documents")
    return documents, embeddings


# Save
save_index(documents, doc_embeddings, "my_index")

# Load (fast — no API calls needed)
documents, doc_embeddings = load_index("my_index")

Scaling Beyond NumPy

For anything over 100K documents, use a proper vector database. The search function stays almost identical — you just swap the backend.

# Using ChromaDB (in-memory vector DB, no server needed)
import chromadb

client_db = chromadb.Client()
collection = client_db.create_collection("docs")

# Add documents (ChromaDB handles embedding automatically)
collection.add(
    documents=documents,
    ids=[f"doc_{i}" for i in range(len(documents))],
)

# Search
results = collection.query(
    query_texts=["my computer is running slowly"],
    n_results=3,
)
print(results["documents"][0])

ChromaDB, Pinecone, Weaviate, and Qdrant all follow this same pattern: add documents, query by text, get ranked results. Pick ChromaDB for prototyping, Pinecone or Qdrant for production.
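
If you want the ChromaDB version to persist to disk and use the same OpenAI model as the NumPy index, a sketch along these lines works (PersistentClient and OpenAIEmbeddingFunction are ChromaDB utilities; details may differ slightly between versions):

import os
import chromadb
from chromadb.utils import embedding_functions

# Store the index on disk so it survives restarts
client_db = chromadb.PersistentClient(path="./chroma_index")

# Use the same OpenAI embedding model as the rest of this post
# instead of ChromaDB's default local model
openai_ef = embedding_functions.OpenAIEmbeddingFunction(
    api_key=os.environ["OPENAI_API_KEY"],
    model_name="text-embedding-3-small",
)

collection = client_db.get_or_create_collection(
    "docs", embedding_function=openai_ef
)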

Common Issues

Results all have similar scores. Your documents might be too short or too similar. Longer, more specific text produces more distinct embeddings.

Slow embedding generation. Batch your embeddings — send multiple texts in one API call instead of one at a time. OpenAI’s API accepts up to 2048 texts per request.
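
A minimal batching sketch (assuming the same client as above; the embeddings endpoint accepts a list of strings and returns results in input order):

def get_embeddings_batch(texts: list[str]) -> list[list[float]]:
    """Embed many texts in one API call instead of one call per text."""
    response = client.embeddings.create(
        model="text-embedding-3-small",
        input=texts,
    )
    # response.data preserves the order of the input texts
    return [item.embedding for item in response.data]


# One request for the whole corpus instead of len(documents) requests
doc_embeddings = np.array(get_embeddings_batch(documents))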

Wrong results for short queries. Single-word queries produce vague embeddings. Expand them: instead of searching for “nginx”, search for “nginx web server configuration.”

Cost and Performance

| Model | Dimensions | Price per 1M tokens | Best for |
|---|---|---|---|
| text-embedding-3-small | 1536 | $0.02 | Most use cases |
| text-embedding-3-large | 3072 | $0.13 | Maximum accuracy |
| text-embedding-ada-002 | 1536 | $0.10 | Legacy compatibility |

For 100K documents averaging 100 tokens each: ~$0.20 with text-embedding-3-small. That’s a one-time indexing cost — searches only embed the query (negligible cost).
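
The arithmetic behind that estimate, as a quick sketch (prices are the ones in the table above and may change):

num_docs = 100_000
avg_tokens_per_doc = 100
price_per_million_tokens = 0.02  # text-embedding-3-small, USD

total_tokens = num_docs * avg_tokens_per_doc             # 10,000,000 tokens
indexing_cost = total_tokens / 1_000_000 * price_per_million_tokens
print(f"~${indexing_cost:.2f} one-time indexing cost")   # ~$0.20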