Voyage AI makes some of the best embedding models available through a simple API. Their code embedding model consistently tops retrieval benchmarks, and the general-purpose text models punch above their weight against much larger alternatives. You get normalized vectors, flexible dimensions, and a clean Python SDK that stays out of your way.

Here’s how to generate your first embedding:

import voyageai

vo = voyageai.Client()  # reads VOYAGE_API_KEY from environment

result = vo.embed(["How do I sort a list in Python?"], model="voyage-3", input_type="query")
print(len(result.embeddings[0]))  # 1024
print(result.total_tokens)        # token count for billing

Before running it, install the SDK and set your API key:

pip install voyageai
export VOYAGE_API_KEY="your-api-key"

Sign up at dash.voyageai.com to get a key. The free tier gives you 200 million tokens per month, which is generous enough to build and test a full search pipeline.

Choosing the Right Model

Voyage offers specialized models for different workloads:

| Model | Best For | Context | Default Dimension |
|---|---|---|---|
| voyage-3 | General text search, RAG | 32K tokens | 1024 |
| voyage-3-lite | High-throughput, lower cost | 32K tokens | 512 |
| voyage-code-3 | Code search, code retrieval | 32K tokens | 1024 |

voyage-code-3 is the standout. It understands code structure across dozens of languages and ranks among the top code retrieval models. Use it when you’re building anything that searches over codebases, documentation with code snippets, or technical Q&A.

voyage-3 handles general text well – product descriptions, support tickets, knowledge base articles. If latency and cost matter more than marginal accuracy, voyage-3-lite gives you a smaller vector (512 dimensions) at higher throughput.

Newer models like voyage-3.5 and voyage-4 are also available if you want the latest improvements. The API works identically – just swap the model name.
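Because only the model name changes, it can help to centralize the choice in one place. A minimal sketch – the workload labels here are made up for illustration, not part of the Voyage API:

```python
# Map workload types to Voyage model names so the choice lives in one place.
# The workload labels ("text", "fast", "code") are illustrative only.
MODELS = {
    "text": "voyage-3",       # general-purpose search, RAG
    "fast": "voyage-3-lite",  # high throughput, smaller vectors
    "code": "voyage-code-3",  # code and technical content
}

def pick_model(workload: str) -> str:
    """Return the model name for a workload, defaulting to voyage-3."""
    return MODELS.get(workload, "voyage-3")
```

Swapping to voyage-3.5 or voyage-4 later then means editing one dictionary, not every call site.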

Asymmetric Search with input_type

This is the detail most people miss. When you’re building retrieval (not just similarity), your queries and documents are fundamentally different. A query is short: “how to reverse a string.” A document is long: a full paragraph explaining string manipulation.

Voyage handles this with the input_type parameter:

import voyageai

vo = voyageai.Client()

# Embed your corpus as documents
documents = [
    "To reverse a string in Python, use slicing: my_string[::-1]. This creates a new string with characters in reverse order.",
    "Python lists can be sorted in-place with list.sort() or return a new sorted list with sorted().",
    "Dictionary comprehensions let you build dicts from iterables: {k: v for k, v in pairs}.",
]

doc_result = vo.embed(documents, model="voyage-3", input_type="document")
doc_embeddings = doc_result.embeddings

# Embed the user's search query
query = "how do I reverse a string?"
query_result = vo.embed([query], model="voyage-3", input_type="query")
query_embedding = query_result.embeddings[0]

Setting input_type="document" when embedding your corpus and input_type="query" for the search query tells Voyage to apply internal prompts that improve retrieval accuracy. Skip this parameter and you’ll get decent results. Use it correctly and you’ll get noticeably better ranking.

Cosine Similarity and Batch Embedding

Voyage embeddings are normalized to unit length, so dot product equals cosine similarity. No need to normalize yourself:

import numpy as np
import voyageai

vo = voyageai.Client()

documents = [
    "FastAPI is an async Python web framework for building APIs.",
    "PyTorch provides tensor computation with GPU acceleration for deep learning.",
    "Redis is an in-memory data store used for caching and message brokering.",
    "SQLAlchemy is a Python SQL toolkit and ORM for database access.",
    "Celery is a distributed task queue for processing background jobs.",
]

# Embed all documents
doc_result = vo.embed(documents, model="voyage-3", input_type="document")

# Embed a query
query_result = vo.embed(["async web framework for Python"], model="voyage-3", input_type="query")

# Compute similarities using dot product (equivalent to cosine for normalized vectors)
doc_matrix = np.array(doc_result.embeddings)
query_vec = np.array(query_result.embeddings[0])

similarities = np.dot(doc_matrix, query_vec)

# Rank results
ranked = sorted(enumerate(similarities), key=lambda x: x[1], reverse=True)
for idx, score in ranked:
    print(f"{score:.4f}  {documents[idx][:80]}")
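The normalization claim is easy to check without an API call – for unit vectors the dot product and the full cosine formula agree to floating-point precision:

```python
import numpy as np

# Two random vectors, normalized to unit length (as Voyage embeddings are).
rng = np.random.default_rng(0)
a = rng.normal(size=1024)
b = rng.normal(size=1024)
a /= np.linalg.norm(a)
b /= np.linalg.norm(b)

dot = np.dot(a, b)
cosine = np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))
print(abs(dot - cosine) < 1e-12)  # True: identical for unit vectors
```

Skipping the per-query division by vector norms is a small but free win when you rank millions of documents.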

For large datasets, batch your requests. The API accepts up to 1,000 texts per call with a token limit of 320K for standard models. If you have more, chunk it:

import voyageai

def embed_in_batches(texts, model="voyage-3", input_type="document", batch_size=128):
    vo = voyageai.Client()
    all_embeddings = []
    for i in range(0, len(texts), batch_size):
        batch = texts[i : i + batch_size]
        result = vo.embed(batch, model=model, input_type=input_type)
        all_embeddings.extend(result.embeddings)
    return all_embeddings
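Count-based batching can still blow past the per-request token limit when individual texts are long. Here is a sketch that also caps each batch by an approximate token budget – note the whitespace word count is a crude stand-in for the model's real tokenizer, so leave generous headroom:

```python
def batch_by_budget(texts, max_texts=128, max_tokens=300_000):
    """Yield batches that respect both a text-count cap and a rough token budget."""
    batch, batch_tokens = [], 0
    for text in texts:
        # Crude proxy for token count; swap in a real tokenizer for production use.
        n_tokens = len(text.split())
        if batch and (len(batch) >= max_texts or batch_tokens + n_tokens > max_tokens):
            yield batch
            batch, batch_tokens = [], 0
        batch.append(text)
        batch_tokens += n_tokens
    if batch:
        yield batch
```

Each yielded batch can then be passed to embed_in_batches-style embedding calls without tripping the token limit.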

Building a Code Search Tool

Here’s where voyage-code-3 shines. Let’s build a simple tool that indexes Python functions and finds the most relevant one for a natural language query:

import numpy as np
import voyageai

vo = voyageai.Client()

# A small codebase of Python functions
code_snippets = [
    '''def fibonacci(n):
    """Return the nth Fibonacci number."""
    if n <= 1:
        return n
    a, b = 0, 1
    for _ in range(2, n + 1):
        a, b = b, a + b
    return b''',

    '''def merge_sort(arr):
    """Sort a list using merge sort algorithm."""
    if len(arr) <= 1:
        return arr
    mid = len(arr) // 2
    left = merge_sort(arr[:mid])
    right = merge_sort(arr[mid:])
    return merge(left, right)''',

    '''def flatten(nested_list):
    """Flatten a nested list into a single list."""
    result = []
    for item in nested_list:
        if isinstance(item, list):
            result.extend(flatten(item))
        else:
            result.append(item)
    return result''',

    '''def lru_cache_manual(maxsize=128):
    """A simple LRU cache decorator using OrderedDict."""
    from collections import OrderedDict
    def decorator(func):
        cache = OrderedDict()
        def wrapper(*args):
            if args in cache:
                cache.move_to_end(args)
                return cache[args]
            result = func(*args)
            cache[args] = result
            if len(cache) > maxsize:
                cache.popitem(last=False)
            return result
        return wrapper
    return decorator''',
]

# Embed the code with the code model
code_embeddings = vo.embed(
    code_snippets, model="voyage-code-3", input_type="document"
).embeddings

# Search with a natural language query
query = "function that computes Fibonacci numbers iteratively"
query_embedding = vo.embed(
    [query], model="voyage-code-3", input_type="query"
).embeddings[0]

# Find the best match
similarities = np.dot(np.array(code_embeddings), np.array(query_embedding))
best_idx = np.argmax(similarities)

print(f"Score: {similarities[best_idx]:.4f}")
print(f"Match:\n{code_snippets[best_idx]}")

The model maps “computes Fibonacci numbers iteratively” to the fibonacci function even though the word “iterative” never appears in the code – it recognizes that the loop-based implementation is iterative rather than recursive. It picks up on semantic meaning, not just keyword overlap.

You can also search code with code. Pass a code snippet as the query and voyage-code-3 will find structurally similar functions – useful for finding duplicates or refactoring candidates in large codebases.

Common Errors and Fixes

voyageai.error.AuthenticationError – Your API key is missing or invalid. Make sure VOYAGE_API_KEY is set in your environment, or pass it directly: voyageai.Client(api_key="sk-...").

voyageai.error.InvalidRequestError: Total tokens exceed limit – You’re sending too many tokens in one request. Reduce your batch size. For voyage-3 the limit is 320K tokens per call, and for voyage-code-3 it’s 120K. Use the batching function above to chunk your input.

Embeddings have wrong dimensions – Some models support flexible output dimensions via the output_dimension parameter. If you’re storing embeddings in a vector database with a fixed dimension, make sure to set this consistently:

result = vo.embed(texts, model="voyage-code-3", output_dimension=512)

Supported values are 256, 512, 1024 (default), and 2048 for models that support it.

Slow response times on large batches – The API processes tokens, not requests. A batch of 100 long documents is slower than 100 short sentences. If latency matters, keep individual texts under 512 tokens and use voyage-3-lite for faster throughput.
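If your documents run long, split them before embedding. A rough sketch using overlapping word windows – whitespace words are only a proxy for model tokens, so the thresholds here are deliberately conservative assumptions:

```python
def chunk_text(text, max_words=400, overlap=50):
    """Split text into overlapping word-window chunks for embedding."""
    words = text.split()
    if len(words) <= max_words:
        return [text]
    chunks = []
    step = max_words - overlap  # overlap keeps context across chunk boundaries
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start : start + max_words]))
        if start + max_words >= len(words):
            break
    return chunks
```

The overlap means a sentence that straddles a boundary still appears whole in at least one chunk, which keeps retrieval from missing it.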

Mixing input_type across your pipeline – If you embed documents with input_type="document" but forget input_type="query" for search queries (or vice versa), retrieval quality drops. Be consistent: either set input_type correctly on both sides, or omit it on both.
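One way to make the mismatch impossible is a thin wrapper that pins the model and input_type in one place – Embedder here is a hypothetical helper, not part of the Voyage SDK:

```python
class Embedder:
    """Hypothetical wrapper that fixes model and input_type so they can't drift."""

    def __init__(self, client, model="voyage-3"):
        self.client = client
        self.model = model

    def embed_documents(self, texts):
        # Corpus side: always input_type="document".
        return self.client.embed(
            texts, model=self.model, input_type="document"
        ).embeddings

    def embed_query(self, text):
        # Search side: always input_type="query".
        return self.client.embed(
            [text], model=self.model, input_type="query"
        ).embeddings[0]
```

Calling code never touches input_type directly, so documents and queries can't end up embedded inconsistently.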