The Core Idea: Bias Token Selection, Then Measure the Bias
Text watermarking works by subtly nudging which tokens the model picks during generation. During detection, you measure whether the output shows that statistical nudge. No retraining, no separate classifier, no model access needed at detection time.
The two dominant approaches right now are KGW (Kirchenbauer et al.) and SynthID Text (Google DeepMind). KGW splits the vocabulary into “green” and “red” lists using a hash of the previous token, then boosts green-list logits. SynthID uses a learned pseudo-random g-function across multiple layers for harder-to-remove watermarks.
Both methods are open source. Here’s how to use each one.
KGW Watermarking: Green List Detection
The KGW algorithm from the University of Maryland is the foundational approach most research builds on. Install the reference implementation:
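A minimal setup sketch, assuming the UMD repo (`jwkirchenbauer/lm-watermarking`) is cloned directly since it is not distributed as a PyPI package; the dependency list here is an assumption based on what the code uses:

```shell
# Clone the UMD reference repo (not published on PyPI at the time of writing)
git clone https://github.com/jwkirchenbauer/lm-watermarking.git
cd lm-watermarking

# Core dependencies the repo's processors rely on (assumed; check the repo's requirements)
pip install torch transformers scipy
```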
Watermark text during generation by injecting a WatermarkLogitsProcessor into the model’s generation pipeline:
The gamma parameter controls what fraction of the vocabulary is in the green list. Lower gamma means a stronger watermark but more impact on text quality. A delta of 2.0 adds that value to the logits of green-list tokens before sampling, so the model strongly prefers them without completely overriding its natural distribution.
Detect the Watermark
Detection counts how many green-list tokens appear in a text sample and runs a one-proportion z-test. If the z-score exceeds a threshold, the text is watermarked:
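The statistic itself is simple. A self-contained sketch of the one-proportion z-test, assuming you have already recovered the green-token count (the repo's detector additionally handles tokenization and repeated-ngram bookkeeping):

```python
import math

def kgw_z_score(green_count: int, total_tokens: int, gamma: float = 0.25) -> float:
    """One-proportion z-test: is the observed green fraction higher than
    chance (gamma) would produce?"""
    expected = gamma * total_tokens
    variance = total_tokens * gamma * (1 - gamma)
    return (green_count - expected) / math.sqrt(variance)

# 200-token sample with 90 green tokens vs. an expected 50 at gamma = 0.25
z = kgw_z_score(green_count=90, total_tokens=200, gamma=0.25)
print(f"z = {z:.2f}, watermarked = {z > 4.0}")  # → z = 6.53, watermarked = True
```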
A z-score above 4.0 gives you a false positive rate below 3 x 10^-5. For human-written text, the z-score typically stays below 2.0 because token selection doesn’t follow the green-list bias.
If detection returns a z-score near the threshold (say 3.5-4.5), that’s the uncertain zone. Either gather more text from the same source, or lower your z_threshold while accepting a higher false positive rate.
SynthID Text: Production-Grade Watermarking
SynthID Text is what Google runs in production across Gemini. It has been integrated directly into Hugging Face Transformers since v4.46.0. Unlike KGW, SynthID uses a tournament sampling mechanism and multiple watermarking keys across layers, making the watermark harder to remove through paraphrasing.
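Setup is a pip install; the `synthid-text` package name for Google's reference repo is an assumption worth verifying against the repo's README:

```shell
pip install "transformers>=4.46.0"

# Optional: Google's reference repo with the Bayesian detector training code
pip install synthid-text
```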
Apply a watermark through the watermarking_config parameter on model.generate():
The keys list should contain unique integers, one per watermarking layer. More keys spread the watermark signal across more dimensions, improving robustness. The ngram_len parameter controls how many preceding tokens form the context window for the pseudo-random function.
Detect SynthID Watermarks
SynthID offers three detection strategies: mean scoring, weighted mean, and a Bayesian detector. The Bayesian detector is the most accurate but needs a small training step:
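The `synthid-text` package implements all three strategies. As a self-contained toy of what mean scoring computes, assume you have already extracted per-token g-values (the processor's pseudo-random bits): under no watermark they average 0.5, and detection measures the upward skew. The helper name below is mine, not the library's:

```python
import math

def mean_score_z(g_values: list) -> float:
    """Toy mean detector: z-statistic for the g-value mean vs. the 0.5
    level expected in unwatermarked text (Bernoulli(0.5) standard error)."""
    n = len(g_values)
    mean = sum(g_values) / n
    return (mean - 0.5) / (0.5 / math.sqrt(n))

# 400 g-values, 58% ones: clearly above the 0.5 chance level
g = [1.0] * 232 + [0.0] * 168
print(round(mean_score_z(g), 2))  # → 3.2
```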
The mean detector works well for quick checks. For production use where you need to set precise false positive rates, train the Bayesian detector on a mix of watermarked and unwatermarked samples. Google provides a training notebook in the synthid-text repository.
Common Pitfalls and How to Handle Them
Short text kills detection accuracy. Both KGW and SynthID need roughly 50-100 tokens minimum for reliable detection. On text under 25 tokens, false negative rates shoot up to 40-60%. If you’re watermarking an API, enforce a minimum response length or aggregate multiple short responses before running detection.
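The aggregation strategy can be a simple gate in front of your detector. A hypothetical helper (the function name and threshold are mine) that pools responses and only signals readiness once the sample is long enough:

```python
def ready_for_detection(texts, tokenizer, min_tokens=100):
    """Pool short responses from the same source; only run watermark
    detection once the combined sample clears the minimum token count."""
    pooled = " ".join(texts)
    n_tokens = len(tokenizer.encode(pooled))
    return pooled, n_tokens >= min_tokens
```

Any tokenizer with an `encode` method works here, so you can reuse the model's own tokenizer to count tokens the same way the detector will.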
Paraphrasing degrades watermarks. Someone running your watermarked text through a second LLM for rewriting will reduce or eliminate the signal. KGW is especially vulnerable here. SynthID’s multi-layer approach survives mild paraphrasing, but heavy rewriting still breaks it. There’s no complete solution to this yet.
Factual text is harder to watermark. When the model’s probability distribution is highly peaked on specific tokens (like dates, names, or code syntax), there’s less room to bias selection without hurting accuracy. Both methods acknowledge this as a fundamental limitation.
Mismatched config between embed and detect fails silently. If your detection uses different gamma, delta, keys, or ngram_len values than what was used during generation, detection will return low scores without any error. Always store and version your watermark configuration.
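One lightweight way to make the config versionable is a serializable record stored alongside every generation batch; the schema below is illustrative, not part of either library:

```python
import json
from dataclasses import dataclass, asdict
from typing import Optional, List

@dataclass(frozen=True)
class WatermarkConfigRecord:
    scheme: str                      # "kgw" or "synthid"
    version: str                     # bump whenever any parameter changes
    gamma: Optional[float] = None    # KGW green-list fraction
    delta: Optional[float] = None    # KGW logit bias
    keys: Optional[List[int]] = None # SynthID layer keys (store a reference, not secrets!)
    ngram_len: Optional[int] = None  # SynthID context window

record = WatermarkConfigRecord(scheme="kgw", version="2024-06-v1", gamma=0.25, delta=2.0)
saved = json.dumps(asdict(record))               # persist with the generated text
loaded = WatermarkConfigRecord(**json.loads(saved))
assert loaded == record                          # detection must use the exact config
```

In a real deployment the `keys` field should hold a key-ID pointing into a secrets manager, never the key material itself.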
MarkLLM: A Unified Toolkit
If you want to experiment with multiple watermarking algorithms under one API, MarkLLM from Tsinghua University wraps KGW, SynthID, and several other methods into a single Python package:
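Installation is a single pip command, assuming the package is published under the `markllm` name (check the project's README if it differs):

```shell
pip install markllm
```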
It provides consistent interfaces for embedding and detection across algorithms, which makes benchmarking straightforward. The tradeoff is that it lags behind the upstream implementations on new features.
When to Use Which Method
Pick KGW when you need a simple, well-understood baseline with easy z-test detection and don’t need to survive paraphrasing attacks. Use SynthID when you’re deploying a production LLM API and need robustness against casual tampering. Both are open source, both work with standard Hugging Face models, and both can be added to any text generation pipeline without retraining.
The real constraint isn’t the watermarking algorithm. It’s governance: you need to keep your watermark keys secret, version your configs, and build monitoring around detection endpoints. The cryptographic strength of the watermark doesn’t matter if someone leaks your keys.
Related Guides
- How to Build Automated Fairness Testing for LLM-Generated Content
- How to Build Watermark Detection for AI-Generated Images
- How to Build Model Cards and Document AI Systems Responsibly
- How to Implement AI Audit Logging and Compliance Tracking
- How to Build Automated Age and Content Gating for AI Applications
- How to Implement Content Filtering for LLM Applications
- How to Detect and Redact PII in Text with Presidio and Python
- How to Build Automated Toxicity Detection for User-Generated Content
- How to Build Automated Stereotype Detection for LLM Outputs
- How to Build Copyright Detection for AI Training Data