Voyage AI makes some of the best embedding models available through a simple API. Their code embedding model consistently tops retrieval benchmarks, and the general-purpose text models punch above their weight against much larger alternatives. You get normalized vectors, flexible dimensions, and a clean Python SDK that stays out of your way.
Here’s how to generate your first embedding:
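A minimal sketch using the official `voyageai` Python SDK (the sample text is arbitrary; it assumes `VOYAGE_API_KEY` is already set in your environment):

```python
import voyageai

# The client reads VOYAGE_API_KEY from the environment by default.
vo = voyageai.Client()

result = vo.embed(
    ["Voyage AI serves embedding models over a simple API."],
    model="voyage-3",
    input_type="document",
)

vector = result.embeddings[0]
print(len(vector))  # voyage-3 returns 1024 dimensions by default
```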
Set your API key first:
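The SDK looks for the key in the `VOYAGE_API_KEY` environment variable:

```shell
pip install voyageai
export VOYAGE_API_KEY="<your-key>"
```

You can also pass the key directly with `voyageai.Client(api_key=...)`, but the environment variable keeps credentials out of your code.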
Sign up at dash.voyageai.com to get a key. The free tier gives you 200 million tokens per month, which is generous enough to build and test a full search pipeline.
Choosing the Right Model
Voyage offers specialized models for different workloads:
| Model | Best For | Context | Default Dimension |
|---|---|---|---|
| voyage-3 | General text search, RAG | 32K tokens | 1024 |
| voyage-3-lite | High-throughput, lower cost | 32K tokens | 512 |
| voyage-code-3 | Code search, code retrieval | 32K tokens | 1024 |
voyage-code-3 is the standout. It understands code structure across dozens of languages and ranks among the top code retrieval models. Use it when you’re building anything that searches over codebases, documentation with code snippets, or technical Q&A.
voyage-3 handles general text well – product descriptions, support tickets, knowledge base articles. If latency and cost matter more than marginal accuracy, voyage-3-lite gives you a smaller vector (512 dimensions) at higher throughput.
Newer models like voyage-3.5 and voyage-4 are also available if you want the latest improvements. The API works identically – just swap the model name.
Asymmetric Search with input_type
This is the detail most people miss. When you’re building retrieval (not just similarity), your queries and documents are fundamentally different. A query is short: “how to reverse a string.” A document is long: a full paragraph explaining string manipulation.
Voyage handles this with the input_type parameter:
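A sketch of the asymmetric pattern (the corpus and query text here are illustrative):

```python
import voyageai

vo = voyageai.Client()

docs = [
    "Reversing a string in Python is typically done with slicing: s[::-1].",
    "The datetime module provides classes for manipulating dates and times.",
]

# Embed the corpus once, flagged as documents...
doc_embeddings = vo.embed(docs, model="voyage-3", input_type="document").embeddings

# ...and embed each search query as a query at request time.
query_embedding = vo.embed(
    ["how to reverse a string"], model="voyage-3", input_type="query"
).embeddings[0]
```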
Setting input_type="document" when embedding your corpus and input_type="query" for the search query tells Voyage to apply internal prompts that improve retrieval accuracy. Skip this parameter and you’ll get decent results. Use it correctly and you’ll get noticeably better ranking.
Cosine Similarity and Batch Embedding
Voyage embeddings are normalized to unit length, so dot product equals cosine similarity. No need to normalize yourself:
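Since the vectors come back at unit length, scoring is a plain dot product. A toy illustration with two hand-made unit vectors standing in for real embeddings (in practice you would dot the vectors returned by `vo.embed`):

```python
import numpy as np

def similarity(a, b):
    """Cosine similarity for unit-length vectors is just the dot product."""
    return float(np.dot(a, b))

# Stand-ins for two Voyage embeddings (already unit length).
query_vec = np.array([0.6, 0.8])
doc_vec = np.array([0.8, 0.6])

print(similarity(query_vec, doc_vec))  # ~0.96
```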
For large datasets, batch your requests. The API accepts up to 1,000 texts per call with a token limit of 320K for standard models. If you have more, chunk it:
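A minimal batching helper along these lines (the names and the 128-per-batch size are illustrative choices, well under the 1,000-text cap; pass in an existing `voyageai.Client`):

```python
def chunks(items, size):
    """Split a list into consecutive slices of at most `size` items."""
    return [items[i:i + size] for i in range(0, len(items), size)]

def embed_in_batches(vo, texts, model="voyage-3", batch_size=128):
    """Embed a large corpus with an existing voyageai.Client, chunk by chunk."""
    embeddings = []
    for batch in chunks(texts, batch_size):
        result = vo.embed(batch, model=model, input_type="document")
        embeddings.extend(result.embeddings)
    return embeddings
```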
Building a Code Search Tool
Here’s where voyage-code-3 shines. Let’s build a simple tool that indexes Python functions and finds the most relevant one for a natural language query:
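A sketch of such a tool (the toy function corpus and helper names are illustrative; a real index would embed the corpus once and cache it rather than re-embedding per query):

```python
functions = {
    "fibonacci": (
        "def fibonacci(n):\n"
        "    a, b = 0, 1\n"
        "    for _ in range(n):\n"
        "        a, b = b, a + b\n"
        "    return a"
    ),
    "reverse_string": "def reverse_string(s):\n    return s[::-1]",
    "merge_sorted": "def merge_sorted(xs, ys):\n    return sorted(xs + ys)",
}

def best_match(query_vec, doc_vecs):
    """Index of the highest-scoring document (vectors are unit length)."""
    scores = [sum(q * d for q, d in zip(query_vec, vec)) for vec in doc_vecs]
    return max(range(len(scores)), key=scores.__getitem__)

def search(query):
    import voyageai
    vo = voyageai.Client()
    names, snippets = list(functions), list(functions.values())
    doc_vecs = vo.embed(
        snippets, model="voyage-code-3", input_type="document"
    ).embeddings
    query_vec = vo.embed(
        [query], model="voyage-code-3", input_type="query"
    ).embeddings[0]
    return names[best_match(query_vec, doc_vecs)]
```

Calling `search("computes Fibonacci numbers iteratively")` should surface the `fibonacci` entry.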
The model understands that “computes Fibonacci numbers iteratively” maps to the fibonacci function even though the function uses a loop (iterative) rather than recursion. It picks up on semantic meaning, not just keyword overlap.
You can also search code with code. Pass a code snippet as the query and voyage-code-3 will find structurally similar functions – useful for finding duplicates or refactoring candidates in large codebases.
Common Errors and Fixes
voyageai.error.AuthenticationError – Your API key is missing or invalid. Make sure VOYAGE_API_KEY is set in your environment, or pass it directly: voyageai.Client(api_key="sk-...").
voyageai.error.InvalidRequestError: Total tokens exceed limit – You’re sending too many tokens in one request. Reduce your batch size. For voyage-3 the limit is 320K tokens per call, and for voyage-code-3 it’s 120K. Use the batching function above to chunk your input.
Embeddings have wrong dimensions – Some models support flexible output dimensions via the output_dimension parameter. If you’re storing embeddings in a vector database with a fixed dimension, make sure to set this consistently:
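A sketch using a model with flexible dimensions (here voyage-code-3; the sample text is arbitrary):

```python
import voyageai

vo = voyageai.Client()

# Pin the output size so it matches your vector database schema.
result = vo.embed(
    ["keep every vector at 512 dimensions for the index"],
    model="voyage-code-3",
    input_type="document",
    output_dimension=512,
)

print(len(result.embeddings[0]))  # 512
```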
Supported values are 256, 512, 1024 (default), and 2048 for models that support it.
Slow response times on large batches – The API processes tokens, not requests. A batch of 100 long documents is slower than 100 short sentences. If latency matters, keep individual texts under 512 tokens and use voyage-3-lite for faster throughput.
Mixing input_type across your pipeline – If you embed documents with input_type="document" but forget to use input_type="query" for search queries (or vice versa), retrieval quality drops. Be consistent. If you skip input_type entirely, at least skip it for both queries and documents.
Related Guides
- How to Use the Together AI API for Open-Source LLMs
- How to Use the Cohere Rerank API for Search Quality
- How to Use the AWS Bedrock Converse API for Multi-Model Chat
- How to Use the OpenAI Realtime API for Voice Applications
- How to Use the Cerebras API for Fast LLM Inference
- How to Use the OpenRouter API for Multi-Provider LLM Access
- How to Use the Anthropic Token Counting API for Cost Estimation
- How to Use the xAI Grok API for Chat and Function Calling
- How to Use the Anthropic Claude Files API for Large Document Processing
- How to Use the Anthropic Token Streaming API for Real-Time UIs