The Core Idea: Bias Token Selection, Then Measure the Bias
Text watermarking works by subtly nudging which tokens the model picks during generation. During detection, you measure whether the output shows that statistical nudge. No retraining, no separate classifier, no model access needed at detection time.
The two dominant approaches right now are KGW (Kirchenbauer et al.) and SynthID Text (Google DeepMind). KGW splits the vocabulary into “green” and “red” lists using a hash of the previous token, then boosts green-list logits. SynthID uses a learned pseudo-random g-function across multiple layers for harder-to-remove watermarks.
Both methods are open source. Here’s how to use each one.
KGW Watermarking: Green List Detection
The KGW algorithm from the University of Maryland is the foundational approach most research builds on. Install the reference implementation:
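A minimal setup sketch, assuming the UMD repo (`jwkirchenbauer/lm-watermarking`) is cloned directly since it is not distributed as a PyPI package; the dependency list here is an assumption based on what the code uses:

```shell
# Clone the UMD reference repo (not published on PyPI at the time of writing)
git clone https://github.com/jwkirchenbauer/lm-watermarking.git
cd lm-watermarking

# Core dependencies the repo's processors rely on (assumed; check the repo's requirements)
pip install torch transformers scipy
```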
Watermark text during generation by injecting a WatermarkLogitsProcessor into the model’s generation pipeline:
The gamma parameter controls what fraction of the vocabulary is in the green list. Lower gamma means a stronger watermark but more impact on text quality. A delta of 2.0 adds that value to the logits of green-list tokens before sampling, so the model strongly prefers them without completely overriding its natural distribution.
Detect the Watermark
Detection counts how many green-list tokens appear in a text sample and runs a one-proportion z-test. If the z-score exceeds a threshold, the text is watermarked:
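The statistic itself is simple. A self-contained sketch of the one-proportion z-test, assuming you have already recovered the green-token count (the repo's detector additionally handles tokenization and repeated-ngram bookkeeping):

```python
import math

def kgw_z_score(green_count: int, total_tokens: int, gamma: float = 0.25) -> float:
    """One-proportion z-test: is the observed green fraction higher than
    chance (gamma) would produce?"""
    expected = gamma * total_tokens
    variance = total_tokens * gamma * (1 - gamma)
    return (green_count - expected) / math.sqrt(variance)

# 200-token sample with 90 green tokens vs. an expected 50 at gamma = 0.25
z = kgw_z_score(green_count=90, total_tokens=200, gamma=0.25)
print(f"z = {z:.2f}, watermarked = {z > 4.0}")  # → z = 6.53, watermarked = True
```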
A z-score above 4.0 gives you a false positive rate below 3 x 10^-5. For human-written text, the z-score typically stays below 2.0 because token selection doesn’t follow the green-list bias.
If detection returns a z-score near the threshold (say 3.5-4.5), that’s the uncertain zone. Either gather more text from the same source, or lower your z_threshold while accepting a higher false positive rate.
SynthID Text: Production-Grade Watermarking
SynthID Text is what Google runs in production across Gemini. It has been integrated directly into Hugging Face Transformers since v4.46.0. Unlike KGW, SynthID uses a tournament sampling mechanism and multiple watermarking keys across layers, making the watermark harder to remove through paraphrasing.
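Setup is a pip install; the `synthid-text` package name for Google's reference repo is an assumption worth verifying against the repo's README:

```shell
pip install "transformers>=4.46.0"

# Optional: Google's reference repo with the Bayesian detector training code
pip install synthid-text
```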
Apply a watermark through the watermarking_config parameter on model.generate():
The keys list should contain unique integers, one per watermarking layer. More keys spread the watermark signal across more dimensions, improving robustness. The ngram_len parameter controls how many preceding tokens form the context window for the pseudo-random function.
Detect SynthID Watermarks
SynthID offers three detection strategies: mean scoring, weighted mean, and a Bayesian detector. The Bayesian detector is the most accurate but needs a small training step:
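The `synthid-text` package implements all three strategies. As a self-contained toy of what mean scoring computes, assume you have already extracted per-token g-values (the processor's pseudo-random bits): under no watermark they average 0.5, and detection measures the upward skew. The helper name below is mine, not the library's:

```python
import math

def mean_score_z(g_values: list) -> float:
    """Toy mean detector: z-statistic for the g-value mean vs. the 0.5
    level expected in unwatermarked text (Bernoulli(0.5) standard error)."""
    n = len(g_values)
    mean = sum(g_values) / n
    return (mean - 0.5) / (0.5 / math.sqrt(n))

# 400 g-values, 58% ones: clearly above the 0.5 chance level
g = [1.0] * 232 + [0.0] * 168
print(round(mean_score_z(g), 2))  # → 3.2
```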
The mean detector works well for quick checks. For production use where you need to set precise false positive rates, train the Bayesian detector on a mix of watermarked and unwatermarked samples. Google provides a training notebook in the synthid-text repository.
Common Pitfalls and How to Handle Them
Short text kills detection accuracy. Both KGW and SynthID need roughly 50-100 tokens minimum for reliable detection. On text under 25 tokens, false negative rates shoot up to 40-60%. If you’re watermarking an API, enforce a minimum response length or aggregate multiple short responses before running detection.
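The aggregation strategy can be a simple gate in front of your detector. A hypothetical helper (the function name and threshold are mine) that pools responses and only signals readiness once the sample is long enough:

```python
def ready_for_detection(texts, tokenizer, min_tokens=100):
    """Pool short responses from the same source; only run watermark
    detection once the combined sample clears the minimum token count."""
    pooled = " ".join(texts)
    n_tokens = len(tokenizer.encode(pooled))
    return pooled, n_tokens >= min_tokens
```

Any tokenizer with an `encode` method works here, so you can reuse the model's own tokenizer to count tokens the same way the detector will.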
Paraphrasing degrades watermarks. Someone running your watermarked text through a second LLM for rewriting will reduce or eliminate the signal. KGW is especially vulnerable here. SynthID’s multi-layer approach survives mild paraphrasing, but heavy rewriting still breaks it. There’s no complete solution to this yet.
Factual text is harder to watermark. When the model’s probability distribution is highly peaked on specific tokens (like dates, names, or code syntax), there’s less room to bias selection without hurting accuracy. Both methods acknowledge this as a fundamental limitation.
Mismatched config between embed and detect fails silently. If your detection uses different gamma, delta, keys, or ngram_len values than what was used during generation, detection will return low scores without any error. Always store and version your watermark configuration.
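One lightweight way to make the config versionable is a serializable record stored alongside every generation batch; the schema below is illustrative, not part of either library:

```python
import json
from dataclasses import dataclass, asdict
from typing import Optional, List

@dataclass(frozen=True)
class WatermarkConfigRecord:
    scheme: str                      # "kgw" or "synthid"
    version: str                     # bump whenever any parameter changes
    gamma: Optional[float] = None    # KGW green-list fraction
    delta: Optional[float] = None    # KGW logit bias
    keys: Optional[List[int]] = None # SynthID layer keys (store a reference, not secrets!)
    ngram_len: Optional[int] = None  # SynthID context window

record = WatermarkConfigRecord(scheme="kgw", version="2024-06-v1", gamma=0.25, delta=2.0)
saved = json.dumps(asdict(record))               # persist with the generated text
loaded = WatermarkConfigRecord(**json.loads(saved))
assert loaded == record                          # detection must use the exact config
```

In a real deployment the `keys` field should hold a key-ID pointing into a secrets manager, never the key material itself.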
MarkLLM: A Unified Toolkit
If you want to experiment with multiple watermarking algorithms under one API, MarkLLM from Tsinghua University wraps KGW, SynthID, and several other methods into a single Python package:
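Installation is a single pip command, assuming the package is published under the `markllm` name (check the project's README if it differs):

```shell
pip install markllm
```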
It provides consistent interfaces for embedding and detection across algorithms, which makes benchmarking straightforward. The tradeoff is that it lags behind the upstream implementations on new features.
When to Use Which Method
Pick KGW when you need a simple, well-understood baseline with easy z-test detection and don’t need to survive paraphrasing attacks. Use SynthID when you’re deploying a production LLM API and need robustness against casual tampering. Both are open source, both work with standard Hugging Face models, and both can be added to any text generation pipeline without retraining.
The real constraint isn’t the watermarking algorithm. It’s governance: you need to keep your watermark keys secret, version your configs, and build monitoring around detection endpoints. The cryptographic strength of the watermark doesn’t matter if someone leaks your keys.
Related Guides
- How to Build Automated Fairness Testing for LLM-Generated Content
- How to Build Watermark Detection for AI-Generated Images
- How to Build Model Cards and Document AI Systems Responsibly
- How to Implement AI Audit Logging and Compliance Tracking
- How to Build Automated Age and Content Gating for AI Applications
- How to Implement Content Filtering for LLM Applications
- How to Detect and Redact PII in Text with Presidio and Python
- How to Build Automated Toxicity Detection for User-Generated Content
- How to Build Automated Stereotype Detection for LLM Outputs
- How to Build Copyright Detection for AI Training Data