Most LLMs perform best when prompted in English. But your users speak dozens of languages. The fix: translate incoming prompts to English, run them through the model, then translate the response back. You get the best of both worlds – English-optimized reasoning with native-language output.

Here’s how to build that pipeline with OpenAI and deep-translator.

Install Dependencies

pip install openai deep-translator langdetect

deep-translator wraps Google Translate (free tier, no API key needed for moderate usage). langdetect identifies the source language automatically so you don’t have to ask the user.

Detect Language and Translate the Prompt

The core flow is three steps: detect the user’s language, translate the prompt to English, then call the LLM.

from langdetect import detect
from deep_translator import GoogleTranslator
from openai import OpenAI

client = OpenAI()

def detect_language(text: str) -> str:
    """Detect the language code of the input text."""
    lang = detect(text)
    return lang

def translate_text(text: str, source: str, target: str) -> str:
    """Translate text between any two languages."""
    if source == target:
        return text
    translated = GoogleTranslator(source=source, target=target).translate(text)
    return translated

def multilingual_prompt(user_input: str, system_prompt: str = "You are a helpful assistant.") -> dict:
    """Send a prompt in any language, get a response in the same language."""
    # Step 1: Detect the user's language
    user_lang = detect_language(user_input)
    print(f"Detected language: {user_lang}")

    # Step 2: Translate to English if needed
    english_prompt = translate_text(user_input, source=user_lang, target="en")
    print(f"English prompt: {english_prompt}")

    # Step 3: Call the LLM with the English prompt
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": english_prompt},
        ],
        temperature=0.7,
    )
    english_reply = response.choices[0].message.content

    # Step 4: Translate the response back to the user's language
    translated_reply = translate_text(english_reply, source="en", target=user_lang)

    return {
        "detected_language": user_lang,
        "original_input": user_input,
        "english_prompt": english_prompt,
        "english_reply": english_reply,
        "translated_reply": translated_reply,
    }

# Try it with a Spanish input
result = multilingual_prompt("Explica cómo funciona la memoria en los transformers")
print(result["translated_reply"])

That’s the whole pipeline. The user writes in Spanish, the model thinks in English, and the answer comes back in Spanish.

Prompt Templates for Multiple Languages

Sometimes you want structured prompts – not just raw user input. Build language-aware templates that inject translated content into a fixed English scaffold.

from langdetect import detect
from deep_translator import GoogleTranslator
from openai import OpenAI

client = OpenAI()

PROMPT_TEMPLATES = {
    "explain": "Explain the following concept clearly and concisely: {topic}",
    "compare": "Compare and contrast the following two items: {item_a} vs {item_b}",
    "debug": "The following code has a bug. Identify the issue and provide a fix:\n\n{code}",
}

def build_templated_prompt(
    template_name: str,
    user_inputs: dict,
    source_lang: str = "auto",
) -> str:
    """Build an English prompt from a template, translating user inputs as needed."""
    template = PROMPT_TEMPLATES[template_name]

    translated_inputs = {}
    for key, value in user_inputs.items():
        if source_lang == "auto":
            detected = detect(value)
        else:
            detected = source_lang

        translated_inputs[key] = GoogleTranslator(
            source=detected, target="en"
        ).translate(value)

    return template.format(**translated_inputs)

def ask_with_template(
    template_name: str,
    user_inputs: dict,
    response_lang: str = "en",
) -> str:
    """Run a templated prompt and optionally translate the response."""
    english_prompt = build_templated_prompt(template_name, user_inputs)

    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {"role": "system", "content": "You are a technical expert. Be precise and direct."},
            {"role": "user", "content": english_prompt},
        ],
        temperature=0.5,
    )
    english_reply = response.choices[0].message.content

    if response_lang != "en":
        return GoogleTranslator(source="en", target=response_lang).translate(english_reply)
    return english_reply

# User asks in Japanese, gets answer in Japanese
answer = ask_with_template(
    template_name="explain",
    user_inputs={"topic": "ニューラルネットワークの逆伝播"},
    response_lang="ja",
)
print(answer)

The template stays in English. Only the user-provided values get translated. This keeps your prompt structure consistent regardless of input language.

Handling Language-Specific Edge Cases

A few gotchas you’ll hit in production:

Chinese variants. langdetect returns zh-cn for simplified Chinese and zh-tw for traditional. Google Translate expects zh-CN and zh-TW (note the capitalization). Normalize before passing to the translator.

def normalize_lang_code(lang: str) -> str:
    """Fix language code mismatches between langdetect and Google Translate."""
    mapping = {
        "zh-cn": "zh-CN",
        "zh-tw": "zh-TW",
        "zh": "zh-CN",  # default to simplified
        "iw": "he",     # Hebrew legacy code
        "jw": "jv",     # Javanese legacy code
    }
    return mapping.get(lang, lang)

Short text detection. langdetect struggles with inputs under 20 characters. A single word like “Bonjour” might get misclassified. For short inputs, let the user specify their language or fall back to a default.

from langdetect import detect, LangDetectException

def safe_detect(text: str, fallback: str = "en") -> str:
    """Detect language with a fallback for short or ambiguous text."""
    if len(text.strip()) < 20:
        return fallback
    try:
        return normalize_lang_code(detect(text))
    except LangDetectException:
        return fallback

Translation length blowup. Some languages (like German or Finnish) produce much longer text than English. If you’re running translated prompts through a model with tight token limits, monitor the translated length and truncate or summarize if needed.
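A rough guard for that, using the common ~4 characters-per-token heuristic (an approximation, not a guarantee – check against your model's actual tokenizer if the budget is tight):

```python
def fits_token_budget(text: str, max_tokens: int, chars_per_token: float = 4.0) -> bool:
    """Rough check: estimate token count from character count."""
    return len(text) / chars_per_token <= max_tokens

def truncate_to_budget(text: str, max_tokens: int, chars_per_token: float = 4.0) -> str:
    """Hard-truncate text to an approximate token budget, breaking at the last space."""
    limit = int(max_tokens * chars_per_token)
    if len(text) <= limit:
        return text
    cut = text[:limit]
    # Avoid cutting mid-word when there is a space to break on
    return cut[: cut.rfind(" ")] if " " in cut else cut
```

Run `fits_token_budget` on the translated prompt before the API call; if it fails, truncate or summarize first.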

Full Pipeline with Error Handling

Here’s a hardened version that combines the normalization and safe detection from above with a translation fallback that returns the original text on failure.

from langdetect import detect, LangDetectException
from deep_translator import GoogleTranslator
from openai import OpenAI

client = OpenAI()

def normalize_lang_code(lang: str) -> str:
    mapping = {
        "zh-cn": "zh-CN",
        "zh-tw": "zh-TW",
        "zh": "zh-CN",
        "iw": "he",
        "jw": "jv",
    }
    return mapping.get(lang, lang)

def safe_detect(text: str, fallback: str = "en") -> str:
    if len(text.strip()) < 20:
        return fallback
    try:
        return normalize_lang_code(detect(text))
    except LangDetectException:
        return fallback

def translate_safe(text: str, source: str, target: str) -> str:
    if source == target:
        return text
    try:
        return GoogleTranslator(source=source, target=target).translate(text)
    except Exception as e:
        print(f"Translation failed ({source} -> {target}): {e}")
        return text  # return original on failure

def multilingual_chat(
    user_input: str,
    system_prompt: str = "You are a helpful assistant.",
    model: str = "gpt-4o",
    force_lang: str | None = None,
) -> dict:
    """Full multilingual pipeline with error handling."""
    # Detect or use forced language
    user_lang = force_lang if force_lang else safe_detect(user_input)

    # Translate to English
    english_prompt = translate_safe(user_input, source=user_lang, target="en")

    # Call the LLM
    response = client.chat.completions.create(
        model=model,
        messages=[
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": english_prompt},
        ],
        temperature=0.7,
    )
    english_reply = response.choices[0].message.content

    # Translate back
    final_reply = translate_safe(english_reply, source="en", target=user_lang)

    return {
        "language": user_lang,
        "reply": final_reply,
        "english_reply": english_reply,
    }

# German input
result = multilingual_chat("Wie kann ich ein neuronales Netz in PyTorch trainieren?")
print(result["reply"])

# Force language for short input
result = multilingual_chat("Bonjour", force_lang="fr")
print(result["reply"])

When to Skip Translation Entirely

Not every request needs translation. GPT-4o and similar models handle many languages natively. If your use case is casual conversation, you might get acceptable results by prompting the model directly in the user’s language and adding a system instruction like “Always respond in the same language as the user.”
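A minimal sketch of that direct approach – just the request payload, with the language instruction in the system message (the model name follows the examples above):

```python
def build_direct_messages(user_input: str) -> list:
    """Skip translation: instruct the model to mirror the user's language."""
    return [
        {
            "role": "system",
            "content": "You are a helpful assistant. Always respond in the same language as the user.",
        },
        {"role": "user", "content": user_input},
    ]

# Pass the result straight to:
# client.chat.completions.create(model="gpt-4o", messages=build_direct_messages(text))
```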

Translation pipelines shine when:

  • You need consistent, high-quality English reasoning (math, code, logic tasks)
  • Your prompts are complex templates that were optimized in English
  • You’re using a smaller or fine-tuned model that only works well in English
  • You need deterministic language detection for routing or logging

Common Errors and Fixes

langdetect.lang_detect_exception.LangDetectException: No features in text – Happens with empty strings or whitespace-only input. Always validate input before calling detect().

deep_translator.exceptions.NotValidLength – Google Translate has a 5000 character limit per request. Split long text into chunks and translate each one separately, then join the results.

def translate_long_text(text: str, source: str, target: str, chunk_size: int = 4500) -> str:
    """Translate text that might exceed the 5000 char limit."""
    if len(text) <= chunk_size:
        return GoogleTranslator(source=source, target=target).translate(text)

    chunks = [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]
    translated_chunks = [
        GoogleTranslator(source=source, target=target).translate(chunk)
        for chunk in chunks
    ]
    return " ".join(translated_chunks)

openai.RateLimitError – If you’re translating and then calling the API in a loop, you’ll hit rate limits fast. Add exponential backoff or batch your requests.
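One way to do the backoff, sketched as a generic retry helper (the retry count and delays are placeholders – tune them for your rate limits):

```python
import random
import time

def with_backoff(fn, retries: int = 5, base_delay: float = 1.0, retry_on=(Exception,)):
    """Call fn(), retrying with exponential backoff plus jitter on failure."""
    for attempt in range(retries):
        try:
            return fn()
        except retry_on:
            if attempt == retries - 1:
                raise  # out of retries: surface the error
            # 1s, 2s, 4s, ... plus random jitter scaled to base_delay
            time.sleep(base_delay * (2 ** attempt) + random.uniform(0, base_delay))
```

In the pipeline above you’d wrap both the translation call and the chat completion, passing `retry_on=(openai.RateLimitError,)` so other errors still fail fast.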

Wrong language detected for mixed-language input – If the user writes “I want to learn sobre redes neuronales”, langdetect might pick either English or Spanish. For mixed input, force a language or ask the user to specify.

Google Translate returns None – Happens occasionally with very short inputs or unsupported character sequences. Always check the return value before passing it to the LLM.
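A defensive wrapper for that, written against a generic translate callable so the guard is easy to test (falling back to the original text is a design choice here, not a library feature):

```python
def translate_or_original(translate_fn, text: str) -> str:
    """Run a translation callable, returning the input if it yields None or empty."""
    result = translate_fn(text)
    if not result:  # covers both None and empty string
        return text
    return result
```

In the pipeline, `translate_fn` would be `GoogleTranslator(source=..., target=...).translate`.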