Most LLM apps start the same way: a prompt string with some variables jammed in. That works until you have 30 prompts, half of them broken, and nobody knows which version is in production. Prompt templates fix this. They give you reusable, testable, version-controlled prompts that work across providers.
Here’s the simplest version – a plain f-string template:
```python
def summarize_prompt(text: str, max_sentences: int = 3) -> str:
    return f"""Summarize the following text in {max_sentences} sentences or fewer.
Be concise and preserve key facts.

Text:
{text}

Summary:"""

prompt = summarize_prompt("Python 3.12 introduced several performance improvements...", max_sentences=2)
print(prompt)
```
This is fine for simple cases. But f-strings fall apart when you need conditionals, loops, or want to enforce a schema on your template inputs. Let’s fix that.
Type-Safe Templates with Dataclasses
Dataclasses give you validation, defaults, and IDE autocomplete for free. Wrap your prompt logic in a class and you catch bugs before they hit the API.
```python
from dataclasses import dataclass


@dataclass
class ExtractionPrompt:
    document: str
    fields_to_extract: list[str]
    output_format: str = "json"
    language: str = "English"

    def render(self) -> str:
        fields_list = "\n".join(f"- {f}" for f in self.fields_to_extract)
        return f"""Extract the following fields from the document below.
Return the result as {self.output_format}. Respond in {self.language}.

Fields to extract:
{fields_list}

Document:
{self.document}"""


prompt = ExtractionPrompt(
    document="Invoice #4821 from Acme Corp, dated 2026-01-15, total $1,250.00",
    fields_to_extract=["invoice_number", "company", "date", "total"],
    output_format="json",
)
print(prompt.render())
```
You can add a __post_init__ method to validate inputs – reject empty documents, cap field count, enforce allowed output formats. This beats discovering a bad prompt from a 400 error at 3 AM.
Adding Validation
```python
from dataclasses import dataclass

ALLOWED_FORMATS = {"json", "yaml", "markdown", "csv"}


@dataclass
class ValidatedPrompt:
    document: str
    output_format: str = "json"

    def __post_init__(self):
        if not self.document.strip():
            raise ValueError("Document cannot be empty")
        if self.output_format not in ALLOWED_FORMATS:
            raise ValueError(f"output_format must be one of {ALLOWED_FORMATS}, got '{self.output_format}'")

    def render(self) -> str:
        return f"Extract key entities from this document. Return as {self.output_format}.\n\n{self.document}"
```
Now bad inputs fail fast with a clear error instead of producing garbage outputs.
Jinja2 for Complex Conditional Templates
When your prompts need conditionals, loops over dynamic data, or optional sections, Jinja2 is the right tool. It separates template logic from Python code, which makes templates easier to review and version.
Install it first with pip install jinja2.
Here’s a real example – a classification prompt that adapts based on whether you provide examples and custom instructions:
```python
from jinja2 import Template

CLASSIFICATION_TEMPLATE = Template("""Classify the following text into one of these categories: {{ categories | join(', ') }}.
{% if examples %}
Here are some examples:
{% for ex in examples %}
Text: "{{ ex.text }}"
Category: {{ ex.label }}
{% endfor %}
{% endif %}
{% if instructions %}
Additional instructions: {{ instructions }}
{% endif %}
Text to classify: "{{ text }}"
Category:""")

prompt = CLASSIFICATION_TEMPLATE.render(
    categories=["positive", "negative", "neutral"],
    text="The product arrived on time and works great.",
    examples=[
        {"text": "Terrible experience, never again.", "label": "negative"},
        {"text": "It was okay, nothing special.", "label": "neutral"},
    ],
    instructions=None,
)
print(prompt)
```
Jinja2 handles the None check for you – the {% if instructions %} block just disappears when there’s nothing to show. No awkward empty lines or “None” strings leaking into your prompt.
Loading Templates from Files
For production setups, store templates as files and load them at startup:
```python
from jinja2 import Environment, FileSystemLoader

env = Environment(
    loader=FileSystemLoader("prompts/"),
    keep_trailing_newline=True,
    trim_blocks=True,
    lstrip_blocks=True,
)

template = env.get_template("classification.j2")
prompt = template.render(categories=["spam", "ham"], text="Buy now! Limited offer!")
```
This pattern lets non-engineers edit prompt templates without touching Python code. Store the .j2 files in version control alongside your app.
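A prompts/classification.j2 matching the earlier inline template might look like this (a sketch – the file name and variable names are assumptions; adapt them to your own templates):

```jinja
Classify the following text into one of these categories: {{ categories | join(', ') }}.
{% if examples %}
Here are some examples:
{% for ex in examples %}
Text: "{{ ex.text }}"
Category: {{ ex.label }}
{% endfor %}
{% endif %}
Text to classify: "{{ text }}"
Category:
```

Because the file contains no Python, a prompt tweak shows up in code review as a clean one-line diff instead of a change buried inside a string literal.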
Chat Markup – Building Message Arrays
Modern LLM APIs don’t take a single string. They take an array of messages with roles. Building these programmatically is where most template libraries fall short, so you’ll want a thin helper.
```python
from dataclasses import dataclass, field


@dataclass
class ChatTemplate:
    system: str
    user_template: str
    examples: list[dict[str, str]] = field(default_factory=list)

    def render(self, **kwargs) -> list[dict[str, str]]:
        messages = [{"role": "system", "content": self.system}]
        for ex in self.examples:
            messages.append({"role": "user", "content": ex["user"]})
            messages.append({"role": "assistant", "content": ex["assistant"]})
        messages.append({"role": "user", "content": self.user_template.format(**kwargs)})
        return messages


sql_template = ChatTemplate(
    system="You are a SQL expert. Convert natural language queries to PostgreSQL. Return only the SQL query, no explanation.",
    user_template="Convert this to SQL: {query}\n\nTable schema:\n{schema}",
    examples=[
        {
            "user": "Convert this to SQL: How many users signed up last month?\n\nTable schema:\nusers(id, email, created_at)",
            "assistant": "SELECT COUNT(*) FROM users WHERE created_at >= date_trunc('month', CURRENT_DATE - INTERVAL '1 month') AND created_at < date_trunc('month', CURRENT_DATE);",
        }
    ],
)

messages = sql_template.render(
    query="Find the top 5 customers by total order value",
    schema="customers(id, name, email)\norders(id, customer_id, total, created_at)",
)
```
Sending to OpenAI and Anthropic
The same template data works with both providers – you just map the format slightly:
```python
from openai import OpenAI

openai_client = OpenAI()

response = openai_client.chat.completions.create(
    model="gpt-4o",
    messages=messages,
    temperature=0,
)
print(response.choices[0].message.content)
```
```python
from anthropic import Anthropic

anthropic_client = Anthropic()

# Anthropic uses a separate system parameter
system_msg = messages[0]["content"]
non_system = messages[1:]

response = anthropic_client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=1024,
    system=system_msg,
    messages=non_system,
)
print(response.content[0].text)
```
Notice the difference: OpenAI includes the system message in the messages array. Anthropic takes it as a separate system parameter. Your template class should handle both – add a render_for_anthropic() method that splits the system message out if you want to keep things clean.
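A minimal sketch of that split, extending the ChatTemplate class from above (the method name render_for_anthropic is just a suggestion):

```python
from dataclasses import dataclass, field


@dataclass
class ChatTemplate:
    system: str
    user_template: str
    examples: list[dict[str, str]] = field(default_factory=list)

    def render(self, **kwargs) -> list[dict[str, str]]:
        # OpenAI-style: system message rides along in the array
        messages = [{"role": "system", "content": self.system}]
        for ex in self.examples:
            messages.append({"role": "user", "content": ex["user"]})
            messages.append({"role": "assistant", "content": ex["assistant"]})
        messages.append({"role": "user", "content": self.user_template.format(**kwargs)})
        return messages

    def render_for_anthropic(self, **kwargs) -> tuple[str, list[dict[str, str]]]:
        # Anthropic-style: return (system, messages) with the system message split out
        messages = self.render(**kwargs)
        return messages[0]["content"], messages[1:]
```

Call it as `system, msgs = template.render_for_anthropic(...)` and pass the pieces to `system=` and `messages=`; the single source of truth for the prompt stays in one class.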
Prompt Versioning and Testing
Treat prompts like code. Version them, test them, track changes.
Versioning with a Registry
```python
from dataclasses import dataclass
from datetime import date


@dataclass
class PromptVersion:
    name: str
    version: str
    template: str
    created: date
    changelog: str


PROMPT_REGISTRY: dict[str, PromptVersion] = {}


def register_prompt(name: str, version: str, template: str, changelog: str = "") -> None:
    key = f"{name}::{version}"
    PROMPT_REGISTRY[key] = PromptVersion(
        name=name,
        version=version,
        template=template,
        created=date.today(),
        changelog=changelog,
    )


def get_prompt(name: str, version: str = "latest") -> PromptVersion:
    if version == "latest":
        matches = [v for k, v in PROMPT_REGISTRY.items() if k.startswith(f"{name}::")]
        if not matches:
            raise KeyError(f"No prompt registered with name '{name}'")
        # Sort numerically so "1.10" ranks above "1.9" (a plain string sort would not)
        return sorted(matches, key=lambda p: tuple(int(x) for x in p.version.split(".")))[-1]
    key = f"{name}::{version}"
    if key not in PROMPT_REGISTRY:
        raise KeyError(f"Prompt '{key}' not found in registry")
    return PROMPT_REGISTRY[key]


register_prompt(
    name="summarize",
    version="1.0",
    template="Summarize in {n} sentences:\n\n{text}",
    changelog="Initial version",
)
register_prompt(
    name="summarize",
    version="1.1",
    template="Summarize the text below in exactly {n} sentences. Preserve all proper nouns and numbers.\n\n{text}",
    changelog="Added instruction to preserve proper nouns and numbers",
)

current = get_prompt("summarize", "latest")
print(f"Using {current.name} v{current.version}: {current.changelog}")
```
Testing Prompts with Assertions
Run prompt tests as part of your CI pipeline. Check that rendered output contains expected fragments, stays under token limits, and produces valid message arrays:
```python
import unittest


class TestSqlTemplate(unittest.TestCase):
    def setUp(self):
        self.template = ChatTemplate(
            system="You are a SQL expert.",
            user_template="Convert to SQL: {query}\nSchema: {schema}",
        )

    def test_renders_system_message(self):
        messages = self.template.render(query="count users", schema="users(id)")
        assert messages[0]["role"] == "system"
        assert "SQL expert" in messages[0]["content"]

    def test_renders_user_message_with_variables(self):
        messages = self.template.render(query="count users", schema="users(id)")
        user_msg = messages[-1]["content"]
        assert "count users" in user_msg
        assert "users(id)" in user_msg

    def test_missing_variable_raises(self):
        with self.assertRaises(KeyError):
            self.template.render(query="count users")  # missing schema


if __name__ == "__main__":
    unittest.main()
```
This catches regressions when someone edits a template and accidentally removes a variable placeholder.
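The token-limit check mentioned above isn't in the test class. A rough guard can use a character-count heuristic – about 4 characters per token for English prose – where the 4000-token budget below is an arbitrary example; swap in a real tokenizer such as tiktoken when you need accuracy:

```python
def approx_token_count(text: str) -> int:
    # Rough heuristic: ~4 characters per token for English prose.
    # Use a real tokenizer (e.g. tiktoken) if the budget is tight.
    return len(text) // 4


def test_prompt_stays_under_budget():
    messages = [
        {"role": "system", "content": "You are a SQL expert."},
        {"role": "user", "content": "Convert to SQL: count users\nSchema: users(id)"},
    ]
    total = sum(approx_token_count(m["content"]) for m in messages)
    assert total < 4000, f"Prompt too long: ~{total} tokens"
```

This fails CI when someone pastes a few thousand words of instructions into a template that used to fit comfortably in the context window.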
Common Errors and Fixes
KeyError: 'name' when templates contain literal JSON
Both f-strings and str.format() treat curly braces as special syntax. If your prompt contains literal JSON, the braces collide – with str.format() (which ChatTemplate.render uses above), the JSON keys get parsed as placeholder names.
```python
# This breaks -- .format() parses the JSON braces as a placeholder:
template = "Return JSON like {'name': 'value'}. Text: {text}"
prompt = template.format(text="hello")  # KeyError: "'name'"

# Fix: double the braces to escape them
template = "Return JSON like {{'name': 'value'}}. Text: {text}"
prompt = template.format(text="hello")

# Or better: use a Jinja2 template instead -- braces have no special meaning
from jinja2 import Template
prompt = Template('Return JSON like {"name": "value"}. Text: {{ text }}').render(text="hello")
```
TypeError: 'NoneType' object is not iterable in Jinja2 loops
This happens when a list variable is None instead of an empty list.
```python
# Breaks when examples is None:
# {% for ex in examples %}

# Fix: provide a default in the render call
prompt = template.render(examples=examples or [])

# Or handle it in the template -- note that Jinja2's default filter needs
# its second argument set to true to replace None, not just undefined values:
# {% for ex in examples | default([], true) %}
```
anthropic.BadRequestError: messages: first message must use "user" role
You accidentally included the system message in the messages array for Anthropic.
```python
# Wrong: system message included in the messages array
client.messages.create(model="claude-sonnet-4-20250514", max_tokens=1024, messages=all_messages)

# Right: split out the system message
system = all_messages[0]["content"]
messages = all_messages[1:]
client.messages.create(model="claude-sonnet-4-20250514", max_tokens=1024, system=system, messages=messages)
```
openai.BadRequestError: 'messages' must contain at least one message
Your template rendered an empty message array. Add a guard:
```python
messages = template.render(**kwargs)
if not messages:
    raise ValueError("Template rendered an empty message list -- check your template inputs")
```
These patterns scale from weekend projects to production systems serving millions of requests. Start with f-strings, graduate to dataclasses when you need validation, and reach for Jinja2 when your templates get conditional logic. The key is treating prompts as first-class code artifacts – versioned, tested, and reviewed like everything else.