Text style transfer rewrites a sentence so it sounds different while saying the same thing. You might convert formal legal prose into plain English, turn a dry technical paragraph into something a blog reader actually finishes, or flip passive voice to active. The meaning stays; the style changes.
This matters whenever you’re adapting content for a new audience, building accessibility tools, or localizing tone for different markets. The good news: you don’t need a custom-trained model. A well-prompted LLM or a paraphrase model gets you 90% of the way there.
Prompt-Based Style Transfer with OpenAI
The fastest path to high-quality style transfer is a prompted LLM. GPT-4o handles nuance well – it can shift formality, tone, and domain vocabulary without losing the original meaning.
Here’s a reusable function that takes source text, the current style, and the target style:
```python
from openai import OpenAI

client = OpenAI()  # uses OPENAI_API_KEY env var


def transfer_style(text: str, source_style: str, target_style: str) -> str:
    """Rewrite text from one style to another while preserving meaning."""
    system_prompt = (
        f"You are a writing style transfer tool. "
        f"Rewrite the user's text from {source_style} style to {target_style} style. "
        f"Preserve the original meaning exactly. Do not add new information, "
        f"opinions, or commentary. Return only the rewritten text."
    )
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": text},
        ],
        temperature=0.4,  # low temp keeps rewrites faithful
        max_tokens=1024,
    )
    return response.choices[0].message.content.strip()


# Formal -> Casual
formal_text = (
    "We regret to inform you that your application has been declined "
    "due to insufficient documentation provided during the review process."
)
casual = transfer_style(formal_text, "formal business", "casual conversational")
print(f"Casual: {casual}")

# Technical -> Simple
technical_text = (
    "The transformer architecture employs multi-head self-attention mechanisms "
    "to compute contextual representations of input token embeddings in parallel."
)
simple = transfer_style(technical_text, "technical academic", "simple plain English")
print(f"Simple: {simple}")
```
A few notes on the system prompt design. Telling the model “return only the rewritten text” prevents it from adding preambles like “Sure! Here’s the rewritten version:”. Setting temperature=0.4 strikes a balance – enough creativity for natural phrasing, not so much that it drifts from the original meaning.
You can push this further with style pairs like “passive to active”, “verbose to concise”, or “marketing copy to neutral factual”. The prompt structure stays the same; just swap the style labels.
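Since only the style labels change, the system prompt can be factored out into a small builder. This is a sketch of that idea; the helper name `build_style_prompt` is mine, not part of any API:

```python
def build_style_prompt(source_style: str, target_style: str) -> str:
    """Build the style-transfer system prompt for an arbitrary style pair."""
    return (
        f"You are a writing style transfer tool. "
        f"Rewrite the user's text from {source_style} style to {target_style} style. "
        f"Preserve the original meaning exactly. Do not add new information, "
        f"opinions, or commentary. Return only the rewritten text."
    )


# Swap labels freely; the prompt structure never changes
passive_to_active = build_style_prompt("passive voice", "active voice")
verbose_to_concise = build_style_prompt("verbose", "concise")
```

Each returned string drops straight into the `system` message of the earlier `transfer_style` call.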
Local Style Transfer with T5
If you don’t want to send text to an external API – maybe you’re processing sensitive documents or need to run offline – a local paraphrase model works well. Paraphrasing is the core mechanism behind style transfer: generate alternative phrasings, then pick the one closest to your target style.
The humarin/chatgpt_paraphraser_on_T5_base model is a solid choice. It was fine-tuned on paraphrase data and produces diverse rewrites you can steer with generation parameters:
```python
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM
import torch

model_name = "humarin/chatgpt_paraphraser_on_T5_base"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSeq2SeqLM.from_pretrained(model_name)

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = model.to(device)


def paraphrase(
    text: str,
    num_return_sequences: int = 5,
    num_beams: int = 10,
    temperature: float = 1.5,
    max_length: int = 256,
) -> list[str]:
    """Generate diverse paraphrases of the input text."""
    input_text = f"paraphrase: {text}"
    inputs = tokenizer(
        input_text,
        return_tensors="pt",
        max_length=max_length,
        truncation=True,
    ).to(device)
    outputs = model.generate(
        **inputs,
        max_length=max_length,
        num_beams=num_beams,
        num_return_sequences=num_return_sequences,
        temperature=temperature,
        do_sample=True,
        top_k=50,
        top_p=0.95,
    )
    results = tokenizer.batch_decode(outputs, skip_special_tokens=True)
    # Deduplicate while preserving order
    seen = set()
    unique = []
    for r in results:
        if r.lower() not in seen:
            seen.add(r.lower())
            unique.append(r)
    return unique


source = "The committee has decided to postpone the meeting until further notice."
paraphrases = paraphrase(source, num_return_sequences=5)
for i, p in enumerate(paraphrases, 1):
    print(f"{i}. {p}")
```
Higher temperature values (1.2-1.8) produce more stylistically diverse outputs. Lower values stick closer to the original phrasing. Cranking up num_beams gives the model more paths to explore, which generally improves quality at the cost of inference time.
The trick for style transfer specifically: generate several candidates, then pick the one that best matches your target style. You can do this manually for small jobs, or automate it with a classifier (covered next).
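The selection step itself is plain Python. Here is a minimal sketch, assuming you already have a scoring function that maps a sentence to a target-style score (the name `pick_best_candidate` and the toy scorer are illustrative, not from any library):

```python
def pick_best_candidate(candidates: list[str], style_scorer) -> str:
    """Return the candidate with the highest target-style score."""
    scored = [(style_scorer(c), c) for c in candidates]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[0][1]


# Toy scorer: shorter sentences stand in for "more casual" here
candidates = [
    "The committee has elected to defer the meeting indefinitely.",
    "The meeting is on hold for now.",
]
best = pick_best_candidate(candidates, lambda s: 1.0 / len(s))
print(best)  # picks the shorter, plainer candidate
```

In practice you would pass the formality classifier from the next section as `style_scorer` instead of a length heuristic.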
Batch Processing and Evaluation
For production pipelines, you need to process lists of texts and measure whether the style actually changed while the content was preserved. Here’s a batch pipeline that uses a formality classifier to score outputs and BLEU to check content similarity:
```python
from openai import OpenAI
from transformers import pipeline
from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction
import nltk

nltk.download("punkt", quiet=True)
nltk.download("punkt_tab", quiet=True)  # required by word_tokenize on NLTK >= 3.9

client = OpenAI()

# Load a formality classifier to measure style shift
formality_classifier = pipeline(
    "text-classification",
    model="s-nlp/roberta-base-formality-ranker",
)


def transfer_style_batch(
    texts: list[str], source_style: str, target_style: str
) -> list[str]:
    """Rewrite a batch of texts from one style to another."""
    results = []
    system_prompt = (
        f"Rewrite the text from {source_style} to {target_style}. "
        f"Preserve meaning exactly. Return only the rewritten text."
    )
    for text in texts:
        response = client.chat.completions.create(
            model="gpt-4o",
            messages=[
                {"role": "system", "content": system_prompt},
                {"role": "user", "content": text},
            ],
            temperature=0.4,
            max_tokens=512,
        )
        results.append(response.choices[0].message.content.strip())
    return results


def evaluate_transfer(
    originals: list[str], rewritten: list[str]
) -> list[dict]:
    """Score each rewrite for formality and content preservation."""
    smoother = SmoothingFunction().method1
    evaluations = []
    for orig, rewrite in zip(originals, rewritten):
        # Formality score (higher = more formal)
        orig_score = formality_classifier(orig[:512])[0]
        rewrite_score = formality_classifier(rewrite[:512])[0]
        # Content preservation via BLEU
        ref_tokens = nltk.word_tokenize(orig.lower())
        hyp_tokens = nltk.word_tokenize(rewrite.lower())
        bleu = sentence_bleu(
            [ref_tokens], hyp_tokens, smoothing_function=smoother
        )
        evaluations.append({
            "original": orig[:80] + "...",
            "rewritten": rewrite[:80] + "...",
            "orig_formality": f"{orig_score['label']} ({orig_score['score']:.2f})",
            "new_formality": f"{rewrite_score['label']} ({rewrite_score['score']:.2f})",
            "bleu_similarity": round(bleu, 3),
        })
    return evaluations


# Example: formal -> casual batch
formal_texts = [
    "We kindly request that you submit the requisite documentation at your earliest convenience.",
    "The organization has implemented a revised policy effective immediately.",
    "It is imperative that all stakeholders review the amended contractual obligations.",
]
casual_texts = transfer_style_batch(formal_texts, "formal", "casual")
scores = evaluate_transfer(formal_texts, casual_texts)
for s in scores:
    print(f"BLEU: {s['bleu_similarity']} | "
          f"Formality: {s['orig_formality']} -> {s['new_formality']}")
    print(f"  Original:  {s['original']}")
    print(f"  Rewritten: {s['rewritten']}")
    print()
```
A BLEU score above 0.2 typically means the content was reasonably preserved (it won’t be high since the wording changes intentionally). The formality classifier should show a clear shift in label or confidence. If BLEU drops below 0.1, the model is probably hallucinating new content rather than rewriting.
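If pulling in NLTK just for that sanity check feels heavy, a crude unigram-overlap ratio (my own simplification of BLEU, not a standard metric) catches the worst hallucinations with no dependencies:

```python
def content_overlap(original: str, rewritten: str) -> float:
    """Fraction of the original's word types that survive into the rewrite."""
    orig_words = set(original.lower().split())
    new_words = set(rewritten.lower().split())
    if not orig_words:
        return 0.0
    return len(orig_words & new_words) / len(orig_words)


overlap = content_overlap(
    "The committee has decided to postpone the meeting.",
    "The meeting is postponed by the committee for now.",
)
```

The same rule of thumb applies: a rewrite sharing almost no vocabulary with its source deserves a manual look before it ships.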
Common Errors and Fixes
Model adds commentary instead of just rewriting. This happens when your system prompt is too vague. Always include “Return only the rewritten text” and set temperature below 0.6. If the model still adds preambles, strip everything before the first sentence that resembles your input.
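One way to implement that stripping step is a heuristic that drops a leading preamble line: most preambles are a single short line ending in a colon. This is a sketch, and `strip_preamble` is a name I made up, not a library function:

```python
def strip_preamble(text: str) -> str:
    """Remove a leading 'Sure! Here's the rewrite:'-style line, if present."""
    lines = text.strip().split("\n")
    # Preambles are usually one short line that ends with a colon
    if len(lines) > 1 and lines[0].rstrip().endswith(":") and len(lines[0]) < 80:
        return "\n".join(lines[1:]).strip()
    return text.strip()


cleaned = strip_preamble(
    "Sure! Here's the rewritten version:\nHey, your application didn't make it."
)
print(cleaned)  # Hey, your application didn't make it.
```

It is deliberately conservative: a rewrite that legitimately starts with a long line, or with no colon, passes through untouched.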
Token limits on long documents. GPT-4o handles around 128K tokens, but style transfer works best on paragraph-sized chunks. For longer documents, split by paragraph, transfer each one, then reassemble:
```python
def transfer_long_text(text: str, source_style: str, target_style: str) -> str:
    paragraphs = [p.strip() for p in text.split("\n\n") if p.strip()]
    transferred = []
    for para in paragraphs:
        result = transfer_style(para, source_style, target_style)
        transferred.append(result)
    return "\n\n".join(transferred)
```
T5 max_length truncation. If your T5 outputs are getting cut off mid-sentence, increase max_length in both the tokenizer call and model.generate(). The default is often 128 tokens, which is too short for anything beyond a single sentence. Set it to 256 or 512 depending on your input lengths.
OpenAI refusing style transfers. If you ask the model to rewrite text “in the style of [specific person]”, it may refuse due to impersonation policies. Describe the target style by its characteristics instead: “professional but warm”, “concise and direct”, “academic with passive voice”. This sidesteps the issue entirely and usually produces better results anyway, since style characteristics are more precise than celebrity impressions.
Low diversity from T5 paraphrases. If all your generated paraphrases look nearly identical, increase temperature to 1.5+, enable do_sample=True, and raise num_beams. You can also try top_k=120 for a wider sampling pool.