The Problem with Vague Prompts
Most developers throw a one-liner at an LLM and wonder why the output is inconsistent. The system prompt is your control plane — it sets the model’s behavior, tone, constraints, and output format before the user ever says a word.
A well-crafted system prompt is the difference between a chatbot that rambles and one that delivers precise, structured answers every time.
The Core Framework
Every effective system prompt has four parts: Role, Rules, Format, and Examples.
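A minimal sketch of the four-part layout as a Python string (the support-bot role, rules, and example are hypothetical, chosen only to illustrate the structure):

```python
SYSTEM_PROMPT = """\
# Role
You are a customer-support assistant for an online store.

# Rules
- Answer only questions about orders, shipping, and returns.
- If you don't know something, say so; never invent order details.

# Format
Respond in at most 3 short sentences of plain text.

# Example
User: Where is my order?
Assistant: Happy to check. Please share your order number and I'll
look up its shipping status.
"""

# Sanity check before the prompt ships: all four parts are present.
for part in ("# Role", "# Rules", "# Format", "# Example"):
    assert part in SYSTEM_PROMPT
```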
This structure works because it eliminates ambiguity. The model knows what role to play, what rules to follow, what format to use, and has a concrete example to anchor its behavior.
Technique 1: Constrain the Output
The biggest source of LLM inconsistency is unconstrained output. Fix this by specifying the exact format you want.
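One way to phrase such a constraint, sketched here with a hypothetical sentiment-classification schema:

```python
import json

SYSTEM_PROMPT = """\
You are a sentiment classifier.
Respond with ONLY valid JSON matching this schema, no other text:
{"sentiment": "positive" | "negative" | "neutral", "confidence": <float 0-1>}
"""

# A compliant response parses directly -- no regex scraping needed:
raw = '{"sentiment": "positive", "confidence": 0.92}'
result = json.loads(raw)
assert result["sentiment"] in {"positive", "negative", "neutral"}
assert 0.0 <= result["confidence"] <= 1.0
```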
By defining the exact JSON schema and saying “respond with ONLY valid JSON,” you get output that parses reliably instead of prose wrapped loosely around data.
Technique 2: Few-Shot Examples
When the task is nuanced, one example isn’t enough. Give the model 2-3 examples that cover the edge cases.
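A few-shot classification prompt might be assembled like this (the ticket labels and example pairs are invented for illustration):

```python
# Each pair covers a distinct label so the model sees every category.
EXAMPLES = [
    ("The app crashes when I upload a photo.", "bug"),
    ("I was charged twice this month.", "billing"),
    ("Could you add dark mode?", "feature"),
]

SYSTEM_PROMPT = (
    'You classify support tickets as "bug", "billing", or "feature".\n'
    "Respond with the label only.\n\nExamples:\n"
    + "\n".join(f'Ticket: "{text}" -> {label}' for text, label in EXAMPLES)
)
```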
Few-shot examples are particularly effective for classification, formatting, and style-matching tasks. They ground the model’s behavior in concrete patterns rather than abstract instructions.
Technique 3: Chain of Thought in System Prompts
For complex reasoning tasks, tell the model to think step by step — but structure that thinking.
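For instance, a code-review prompt might spell out the reasoning steps explicitly (the five steps below are one possible decomposition, not a canonical one):

```python
SYSTEM_PROMPT = """\
You review pull requests. Work through these steps in order and
show your reasoning for each before giving a verdict:
1. Summarize what the change does in one sentence.
2. Check for correctness bugs.
3. Check for security issues.
4. Check for readability and style problems.
5. Verdict: APPROVE or REQUEST_CHANGES, with a one-line rationale.
"""

# The numbered lines are the reasoning framework the model must follow.
STEPS = [line for line in SYSTEM_PROMPT.splitlines() if line[:1].isdigit()]
assert len(STEPS) == 5
```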
This works because you’re not just saying “think step by step” — you’re defining the exact steps. The model follows your reasoning framework instead of inventing its own.
Common Mistakes
Too long. If your system prompt is 2,000 words, the model will lose focus on the important parts. Keep it under 500 words and use hierarchy (headers, bullets) to structure it.
Too vague. “Be helpful and accurate” tells the model nothing. “Respond in 3 bullet points, each under 20 words, focusing on actionable next steps” tells it everything.
No negative examples. Tell the model what NOT to do. “Never apologize for being an AI” or “Don’t use marketing language like ‘revolutionize’ or ‘game-changing’” prevents the most common failure modes.
Conflicting instructions. “Be concise” and “provide thorough explanations” in the same prompt will confuse the model. Pick one and be specific about what it means (e.g., “limit responses to 150 words maximum”).
Testing Your Prompts
Run the same prompt with 10 different inputs and check for consistency. If the output format varies, your constraints aren’t tight enough.
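A small harness for this check; `classify` is a placeholder for whatever function wraps your model call, and the sample inputs are invented:

```python
def consistency_score(classify, inputs, expected):
    """Fraction of inputs whose (stripped) output exactly matches `expected`."""
    outputs = [classify(text).strip() for text in inputs]
    return sum(out == expected for out in outputs) / len(inputs)

# Demo with a stub standing in for a real API call:
inputs = [
    "The weather report airs at 6pm.",
    "Shipping took 4 days.",
    "The manual has 12 chapters.",
]
score = consistency_score(lambda text: "neutral", inputs, "neutral")
assert score == 1.0  # with a real model, aim for 1.0 across all 10 inputs
```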
If you get “neutral” 10 out of 10 times, your prompt is solid. If it varies between “neutral” and “mixed” or adds explanations, tighten the constraints.
Quick Reference
| Pattern | When to Use | Example |
|---|---|---|
| Role + Rules | Every prompt | “You are an X that does Y” |
| Output Schema | API responses | “Respond with JSON: {schema}” |
| Few-Shot | Nuanced tasks | 2-3 input/output pairs |
| Chain of Thought | Reasoning tasks | “Think through steps 1-5” |
| Negative Examples | Preventing failures | “Never do X, Y, Z” |
Related Guides
- How to Build Retrieval-Augmented Prompts with Contextual Grounding
- How to Build Prompt Versioning and Regression Testing for LLMs
- How to Build Chain-of-Thought Prompts That Actually Work
- How to Route Prompts to the Best LLM with a Semantic Router
- How to Build Multi-Language Prompts with Automatic Translation
- How to Build Automatic Prompt Optimization with DSPy
- How to Use Prompt Caching to Cut LLM API Costs
- How to Build Prompt Templates with Python F-Strings and Chat Markup
- How to Build a Knowledge Graph from Text with LLMs
- How to Build Prompt Fallback Chains with Automatic Model Switching