Few-shot prompting works great until your task has dozens of possible input types and your static examples only cover a handful. Dynamic few-shot templates fix this by selecting the most relevant examples for each query at runtime using embedding similarity. Instead of hardcoding three examples and hoping they generalize, you embed your entire example bank and pull the closest matches for every new input.
That gives you a matrix of embeddings for your full example bank. Now you can select the best examples for any new query in milliseconds.
Why Static Few-Shot Falls Apart
Static few-shot means you paste the same 2-3 examples into every prompt. This works when your inputs are homogeneous — say, all product reviews in the same style. It breaks down fast when inputs vary.
Consider a support ticket classifier. If your static examples are all about billing, the model struggles with hardware questions. You could add more examples, but prompt length balloons and you start paying for irrelevant tokens on every call.
Dynamic selection solves both problems. You maintain a large example bank (50, 500, even thousands of examples), embed them once, and for each new input, pull only the top-k most similar. The model sees relevant examples every time, and you keep token usage tight.
Building a Cosine Similarity Selector
Cosine similarity measures how closely two vectors point in the same direction. It is the standard choice for comparing embeddings because it is fast to compute and works well out of the box.
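Here is one way to sketch the selector, with plain NumPy for the similarity math. The support-ticket texts are invented for illustration, and the `demo` function (which hits the live API) is this sketch's own scaffolding:

```python
import numpy as np

def cosine_similarity(query_vec: np.ndarray, matrix: np.ndarray) -> np.ndarray:
    """Cosine similarity between one query vector and each row of a matrix."""
    query_norm = query_vec / np.linalg.norm(query_vec)
    matrix_norm = matrix / np.linalg.norm(matrix, axis=1, keepdims=True)
    return matrix_norm @ query_norm

def select_top_k(query_vec, bank_embeddings, examples, k=3):
    """Return the k examples whose embeddings are closest to the query."""
    sims = cosine_similarity(query_vec, bank_embeddings)
    top_idx = np.argsort(sims)[::-1][:k]
    return [examples[i] for i in top_idx]

# Illustrative bank: three hardware tickets and two billing tickets.
EXAMPLES = [
    {"text": "Battery dies after an hour off the charger", "label": "hardware"},
    {"text": "The screen flickers whenever I open the lid", "label": "hardware"},
    {"text": "Bluetooth keeps disconnecting from my headset", "label": "hardware"},
    {"text": "My refund still has not been processed", "label": "billing"},
    {"text": "I want to switch to the annual plan", "label": "billing"},
]

def demo():
    """Embed the bank and a query, then select. Requires an OpenAI key."""
    from openai import OpenAI
    client = OpenAI()

    def embed(texts):
        resp = client.embeddings.create(
            model="text-embedding-3-small", input=texts
        )
        return np.array([item.embedding for item in resp.data])

    bank = embed([ex["text"] for ex in EXAMPLES])
    query = embed(["My screen cracked and the display went dark"])[0]
    for ex in select_top_k(query, bank, EXAMPLES, k=3):
        print(ex["label"], "|", ex["text"])
```

Calling `demo()` makes two embedding requests: one for the bank (which you would cache in practice) and one for the query.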
Run this and you will see it picks the hardware-related examples — battery, screen flicker, Bluetooth — because those are semantically closest to a cracked screen complaint. A billing example about refund processing would never show up here.
A Reusable Prompt Template Class
Wrapping this into a class makes it easy to reuse across different tasks. You swap the example bank and system prompt, and the selection logic stays the same.
The format_prompt method builds the full message list with dynamically selected examples. The classify method sends it to the model and returns the predicted label. You get relevant few-shot context every time without manual curation.
Using LangChain’s SemanticSimilarityExampleSelector
If you are already in the LangChain ecosystem, you do not need to build this from scratch. LangChain provides SemanticSimilarityExampleSelector backed by any vector store.
This gives you the same dynamic selection behavior with less code. The trade-off is you pull in LangChain and Chroma as dependencies. For a production pipeline that already uses LangChain, this is the right call. For a lightweight script, the custom class above keeps your dependency count low.
Tuning k and Similarity Thresholds
Picking the right k matters more than you might think. Too few examples and the model lacks context. Too many and you burn tokens on marginal matches that may confuse more than help.
Start with k=3 for classification tasks. Bump to 5 if your categories are very fine-grained. For generation tasks (rewriting, translation), 2 strong examples often outperform 5 mediocre ones.
You can also add a similarity threshold to avoid pulling in low-quality matches:
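A thresholded variant of the selector might look like this, assuming precomputed embeddings; the function name and the default floor of 0.3 are this sketch's choices:

```python
import numpy as np

def select_with_threshold(query_vec, bank_embeddings, examples,
                          k=3, min_sim=0.3):
    """Top-k selection that drops matches below a similarity floor.

    An empty result signals the caller to fall back to zero-shot or a
    generic prompt instead of padding the context with weak examples.
    """
    sims = (bank_embeddings @ query_vec) / (
        np.linalg.norm(bank_embeddings, axis=1) * np.linalg.norm(query_vec)
    )
    top_idx = np.argsort(sims)[::-1][:k]
    return [examples[i] for i in top_idx if sims[i] >= min_sim]
```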
If the best match has a similarity of 0.15, it is probably not relevant and you are better off falling back to zero-shot or a generic prompt. A threshold of 0.3 is a reasonable starting point for text-embedding-3-small, but you should calibrate this against your own data.
Common Errors and Fixes
openai.RateLimitError when embedding large example banks — If you have thousands of examples, batch your embedding calls. The API accepts up to 2048 inputs per request, but large batches can still hit rate limits. Add a simple retry with exponential backoff, or split into chunks of 500.
Cosine similarity returns all zeros — This usually means your embeddings are all-zero vectors. Check that your API key is valid and that the embedding call returned actual data. Print the first few values of your embedding array to verify.
Selected examples seem irrelevant — Your example bank might not have coverage for that input type. Add more examples in the weak area. You can also try text-embedding-3-large for better separation between categories, though it costs more per token.
LangChain Chroma errors on import — Make sure you install the right packages. The import moved from langchain_community to langchain_chroma in recent versions. Install with pip install langchain-chroma chromadb.
Token count exceeds model context — With large k values and long examples, your prompt can balloon. Count tokens before sending with tiktoken and reduce k or truncate examples if you are close to the limit.
Related Guides
- How to Build LLM Output Validators with Instructor and Pydantic
- How to Build Structured Output Parsers with Pydantic and LLMs
- How to Build Chain-of-Thought Prompts That Actually Work
- How to Build Automatic Prompt Optimization with DSPy
- How to Build Dynamic Prompt Routers with LLM Cascading
- How to Build Prompt Guardrails with Structured Output Schemas
- How to Build Context-Aware Prompt Routing with Embeddings
- How to Build Prompt Chains with Tool Results and Structured Outputs
- How to Use Claude Sonnet 4.6’s 1M Token Context Window for Long-Document Reasoning
- How to Build Multi-Step Prompt Chains with Structured Outputs