Few-shot prompting works great until your task has dozens of possible input types and your static examples only cover a handful. Dynamic few-shot templates fix this by selecting the most relevant examples for each query at runtime using embedding similarity. Instead of hardcoding three examples and hoping they generalize, you embed your entire example bank and pull the closest matches for every new input.
That gives you a matrix of embeddings for your full example bank. Now you can select the best examples for any new query in milliseconds.
Why Static Few-Shot Falls Apart
Static few-shot means you paste the same 2-3 examples into every prompt. This works when your inputs are homogeneous — say, all product reviews in the same style. It breaks down fast when inputs vary.
Consider a support ticket classifier. If your static examples are all about billing, the model struggles with hardware questions. You could add more examples, but prompt length balloons and you start paying for irrelevant tokens on every call.
Dynamic selection solves both problems. You maintain a large example bank (50, 500, even thousands of examples), embed them once, and for each new input, pull only the top-k most similar. The model sees relevant examples every time, and you keep token usage tight.
Building a Cosine Similarity Selector
Cosine similarity measures how closely two vectors point in the same direction. It is the standard choice for comparing embeddings because it is fast to compute and works well out of the box.
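Here is one way to sketch the selector, with plain NumPy for the similarity math. The support-ticket texts are invented for illustration, and the `demo` function (which hits the live API) is this sketch's own scaffolding:

```python
import numpy as np

def cosine_similarity(query_vec: np.ndarray, matrix: np.ndarray) -> np.ndarray:
    """Cosine similarity between one query vector and each row of a matrix."""
    query_norm = query_vec / np.linalg.norm(query_vec)
    matrix_norm = matrix / np.linalg.norm(matrix, axis=1, keepdims=True)
    return matrix_norm @ query_norm

def select_top_k(query_vec, bank_embeddings, examples, k=3):
    """Return the k examples whose embeddings are closest to the query."""
    sims = cosine_similarity(query_vec, bank_embeddings)
    top_idx = np.argsort(sims)[::-1][:k]
    return [examples[i] for i in top_idx]

# Illustrative bank: three hardware tickets and two billing tickets.
EXAMPLES = [
    {"text": "Battery dies after an hour off the charger", "label": "hardware"},
    {"text": "The screen flickers whenever I open the lid", "label": "hardware"},
    {"text": "Bluetooth keeps disconnecting from my headset", "label": "hardware"},
    {"text": "My refund still has not been processed", "label": "billing"},
    {"text": "I want to switch to the annual plan", "label": "billing"},
]

def demo():
    """Embed the bank and a query, then select. Requires an OpenAI key."""
    from openai import OpenAI
    client = OpenAI()

    def embed(texts):
        resp = client.embeddings.create(
            model="text-embedding-3-small", input=texts
        )
        return np.array([item.embedding for item in resp.data])

    bank = embed([ex["text"] for ex in EXAMPLES])
    query = embed(["My screen cracked and the display went dark"])[0]
    for ex in select_top_k(query, bank, EXAMPLES, k=3):
        print(ex["label"], "|", ex["text"])
```

Calling `demo()` makes two embedding requests: one for the bank (which you would cache in practice) and one for the query.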
Run this and you will see it picks the hardware-related examples — battery, screen flicker, Bluetooth — because those are semantically closest to a cracked screen complaint. A billing example about refund processing would never show up here.
A Reusable Prompt Template Class
Wrapping this into a class makes it easy to reuse across different tasks. You swap the example bank and system prompt, and the selection logic stays the same.
The format_prompt method builds the full message list with dynamically selected examples. The classify method sends it to the model and returns the predicted label. You get relevant few-shot context every time without manual curation.
Using LangChain’s SemanticSimilarityExampleSelector
If you are already in the LangChain ecosystem, you do not need to build this from scratch. LangChain provides SemanticSimilarityExampleSelector backed by any vector store.
This gives you the same dynamic selection behavior with less code. The trade-off is you pull in LangChain and Chroma as dependencies. For a production pipeline that already uses LangChain, this is the right call. For a lightweight script, the custom class above keeps your dependency count low.
Tuning k and Similarity Thresholds
Picking the right k matters more than you might think. Too few examples and the model lacks context. Too many and you burn tokens on marginal matches that may confuse more than help.
Start with k=3 for classification tasks. Bump to 5 if your categories are very fine-grained. For generation tasks (rewriting, translation), 2 strong examples often outperform 5 mediocre ones.
You can also add a similarity threshold to avoid pulling in low-quality matches:
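A thresholded variant of the selector might look like this, assuming precomputed embeddings; the function name and the default floor of 0.3 are this sketch's choices:

```python
import numpy as np

def select_with_threshold(query_vec, bank_embeddings, examples,
                          k=3, min_sim=0.3):
    """Top-k selection that drops matches below a similarity floor.

    An empty result signals the caller to fall back to zero-shot or a
    generic prompt instead of padding the context with weak examples.
    """
    sims = (bank_embeddings @ query_vec) / (
        np.linalg.norm(bank_embeddings, axis=1) * np.linalg.norm(query_vec)
    )
    top_idx = np.argsort(sims)[::-1][:k]
    return [examples[i] for i in top_idx if sims[i] >= min_sim]
```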
If the best match has a similarity of 0.15, it is probably not relevant and you are better off falling back to zero-shot or a generic prompt. A threshold of 0.3 is a reasonable starting point for text-embedding-3-small, but you should calibrate this against your own data.
Common Errors and Fixes
openai.RateLimitError when embedding large example banks — If you have thousands of examples, batch your embedding calls. The API accepts up to 2048 inputs per request, but large batches can still hit rate limits. Add a simple retry with exponential backoff, or split into chunks of 500.
Cosine similarity returns all zeros — This usually means your embeddings are all-zero vectors. Check that your API key is valid and that the embedding call returned actual data. Print the first few values of your embedding array to verify.
Selected examples seem irrelevant — Your example bank might not have coverage for that input type. Add more examples in the weak area. You can also try text-embedding-3-large for better separation between categories, though it costs more per token.
LangChain Chroma errors on import — Make sure you install the right packages. The import moved from langchain_community to langchain_chroma in recent versions. Install with pip install langchain-chroma chromadb.
Token count exceeds model context — With large k values and long examples, your prompt can balloon. Count tokens before sending with tiktoken and reduce k or truncate examples if you are close to the limit.
Related Guides
- How to Build LLM Output Validators with Instructor and Pydantic
- How to Build Structured Output Parsers with Pydantic and LLMs
- How to Build Chain-of-Thought Prompts That Actually Work
- How to Build Automatic Prompt Optimization with DSPy
- How to Build Dynamic Prompt Routers with LLM Cascading
- How to Build Prompt Guardrails with Structured Output Schemas
- How to Build Context-Aware Prompt Routing with Embeddings
- How to Build Prompt Chains with Tool Results and Structured Outputs
- How to Use Claude Sonnet 4.6’s 1M Token Context Window for Long-Document Reasoning
- How to Build Multi-Step Prompt Chains with Structured Outputs