The Quick Setup
Install the Pinecone SDK and an embedding provider. You need both – Pinecone stores and queries vectors, but you generate them yourself.
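A minimal install, assuming Python with OpenAI as the embedding provider (swap in sentence-transformers if you prefer local models):

```shell
pip install pinecone openai
# or, for local embeddings:
pip install pinecone sentence-transformers
```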
Create a serverless index and upsert your first vectors:
That’s the core loop. Everything else is about doing it well at scale.
Generating Embeddings
You have two main options: OpenAI’s API or local models with Sentence Transformers. OpenAI gives you better quality on general text. Sentence Transformers is free and runs locally.
OpenAI Embeddings
text-embedding-3-small is the sweet spot. It’s cheap ($0.02 per million tokens), fast, and the quality gap vs. text-embedding-3-large is smaller than you’d expect for most retrieval tasks.
Sentence Transformers (Local)
Pick all-MiniLM-L6-v2 for speed (384 dims), or all-mpnet-base-v2 for better accuracy (768 dims). Match your index dimension to whichever model you choose.
Serverless vs. Pod-Based Indexes
Serverless is what you want for most new projects. It auto-scales, you pay per query, and there’s no idle cost eating your budget.
Pod-based indexes make sense when you have steady, high-throughput workloads where you want guaranteed latency. For everything else, serverless saves money.
Building a Semantic Search Pipeline
Here’s a real pipeline that embeds documents, upserts them in batches, and queries with metadata filtering.
The metadata filter narrows results to the training category before ranking by similarity. This is faster than filtering after retrieval and gives you more relevant results.
Namespace Management
Namespaces partition your index without creating separate indexes. Use them to isolate tenants, environments, or data types.
One index with namespaces beats multiple indexes for multi-tenant apps. You pay for one index but get logical separation. Each namespace has its own vector space, so queries in one namespace never see vectors from another.
Index Management
Hybrid Search with Sparse-Dense Vectors
Pure semantic search misses exact keyword matches. Hybrid search combines dense embeddings (semantic meaning) with sparse vectors (keyword matching) for better retrieval.
Use dotproduct as the metric for hybrid indexes. Cosine similarity doesn’t work correctly when combining sparse and dense scores.
Cost Optimization Tips
Pinecone billing adds up fast if you’re not paying attention. A few things that actually matter:
Choose smaller embeddings. text-embedding-3-small at 1536 dims costs less to store and query than text-embedding-3-large at 3072 dims. The retrieval quality difference is marginal for most use cases.
Batch your upserts. Single-vector upserts are wasteful. Always batch at 100 vectors per request. The API accepts up to 1000, but 100 is the sweet spot for reliability.
Use metadata filtering aggressively. Filters reduce the search space before the vector comparison happens. A query over 100k vectors with a filter that narrows to 10k is significantly cheaper on serverless.
Delete what you don’t need. Serverless indexes charge per vector stored. If you’re replacing embeddings (re-indexing after a model change), delete the old namespace first.
Pick serverless for bursty workloads. If your traffic is uneven – heavy during business hours, quiet at night – serverless saves 60-80% compared to always-on pods.
Common Errors
PineconeApiException: dimension mismatch – Your vectors don’t match the index dimension. If the index is 1536d, every vector must have exactly 1536 floats. Check which embedding model you’re using. all-MiniLM-L6-v2 outputs 384d, not 1536d.
PineconeApiException: index not found – The index might still be initializing. After create_index, wait for it to be ready:
429 Too Many Requests – You’re hitting rate limits. Add exponential backoff to your upsert loop, or reduce batch size. Serverless indexes have lower burst limits than pod-based ones.
metadata size exceeds limit – Pinecone caps metadata at 40KB per vector. Don’t store full document text in metadata. Store a reference ID and fetch the text from your own database.
Queries returning irrelevant results – Your document and query embeddings probably come from different models. Always use the same model for embedding documents and queries; vectors from different models live in incompatible spaces. Mixing text-embedding-ada-002 documents with text-embedding-3-small queries gives garbage results.
Related Guides
- How to Build a Data Contamination Detection Pipeline for LLM Training
- How to Build a Dataset Changelog and Diff Pipeline with Python
- How to Build a Data Augmentation Pipeline for Tabular Data
- How to Build a Dataset Versioning Pipeline with LakeFS
- How to Build a Feature Engineering Pipeline with Featuretools
- How to Build a Data Schema Evolution Pipeline for ML Datasets
- How to Build a Data Slicing and Stratification Pipeline for ML
- How to Build a Data Annotation Pipeline with Argilla
- How to Build a Data Validation Pipeline with Pydantic and Pandera
- How to Build a Data Outlier Detection Pipeline with PyOD