The Quick Version
FLUX.2 is Black Forest Labs’ latest image generation family, released January 2026. The Klein 4B model runs on a single consumer GPU, generates images in under a second, and ships under the Apache 2.0 license. Here’s the minimum code to get an image out of it.
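A minimal sketch of that flow, assuming the `Flux2Pipeline` class mentioned later in this guide and a `black-forest-labs/FLUX.2-klein-4B` repo id following the naming used below; substitute the actual checkpoint name if yours differs:

```python
import torch
from diffusers import Flux2Pipeline

# Load the distilled Klein 4B checkpoint in bfloat16 (~8GB of VRAM).
pipe = Flux2Pipeline.from_pretrained(
    "black-forest-labs/FLUX.2-klein-4B",  # assumed repo id
    torch_dtype=torch.bfloat16,
)
pipe.to("cuda")

# The distilled Klein model is locked to 4 steps and guidance 1.0,
# so a prompt is all it needs.
image = pipe(prompt="a red fox in a snowy birch forest at golden hour").images[0]
image.save("fox.png")
```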
That’s it. First run downloads about 8GB of weights from Hugging Face, then it loads from cache. On an RTX 4090, Klein 4B produces a 1024x1024 image in roughly 0.5 seconds.
Pick the Right Model
FLUX.2 ships in several variants, and choosing the wrong one wastes time or money.
| Model | Size | License | VRAM | Speed | Best For |
|---|---|---|---|---|---|
| FLUX.2-klein-4B | 4B params | Apache 2.0 | ~8GB | Sub-second | Real-time apps, iteration |
| FLUX.2-klein-9B | 9B params | Non-commercial | ~16GB | Sub-second | Higher quality, personal use |
| FLUX.2-dev | 32B params | Non-commercial | 80GB+ (or ~20GB quantized) | ~10s | Maximum quality, research |
| FLUX.2-pro / FLUX.2-max | API only | Commercial | N/A | API latency | Production, commercial use |
Klein models are distilled – they use fixed inference steps (4) and guidance scale (1.0). You cannot change these parameters. If you need to tune those values, use the base variants: FLUX.2-klein-base-4B or FLUX.2-klein-base-9B, which support adjustable steps (typically 50) and guidance scale (typically 4.0).
Use the Base Model for More Control
The distilled Klein models are fast but rigid. The base variants trade speed for full control over the generation process.
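A sketch of the same call against the base 4B checkpoint with the tunable parameters exposed; the `black-forest-labs/FLUX.2-klein-base-4B` repo id is assumed from the naming above:

```python
import torch
from diffusers import Flux2Pipeline

pipe = Flux2Pipeline.from_pretrained(
    "black-forest-labs/FLUX.2-klein-base-4B",  # assumed repo id
    torch_dtype=torch.bfloat16,
).to("cuda")

image = pipe(
    prompt="studio photo of a ceramic teapot on dark slate, soft window light",
    num_inference_steps=50,  # more denoising passes, finer detail
    guidance_scale=4.0,      # prompt adherence; past 6-7 results tend to look overcooked
).images[0]
image.save("teapot.png")
```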
Higher num_inference_steps means more denoising passes, which generally improves fine detail. guidance_scale controls prompt adherence – higher values follow the text more literally but can look overcooked past 6 or 7.
Run the Dev 32B Model on a Consumer GPU
The FLUX.2-dev model produces the best results, but at 32B parameters it needs over 80GB VRAM in full precision. You can run it on a 24GB card using 4-bit quantization and CPU offloading.
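A sketch using the pre-quantized checkpoint named in the errors section below, with CPU offloading enabled; treat the exact repo id as an assumption to verify:

```python
import torch
from diffusers import Flux2Pipeline

# Pre-quantized 4-bit weights keep the 32B transformer small enough for 24GB cards.
pipe = Flux2Pipeline.from_pretrained(
    "diffusers/FLUX.2-dev-bnb-4bit",  # 4-bit (bitsandbytes) checkpoint
    torch_dtype=torch.bfloat16,
)

# Move each component to the GPU only while it runs, then back to CPU RAM.
# Do not also call pipe.to("cuda") when using this.
pipe.enable_model_cpu_offload()

image = pipe(
    prompt="a lighthouse on a basalt cliff at dusk, dramatic clouds, 35mm film look",
    num_inference_steps=50,
    guidance_scale=4.0,
).images[0]
image.save("lighthouse.png")
```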
enable_model_cpu_offload() moves each pipeline component to the GPU only when it’s needed, then back to CPU. This cuts VRAM usage to roughly 20GB at the cost of slower generation – expect around 30-60 seconds per image instead of 10.
If you have even less VRAM (8GB), use group offloading with the remote text encoder. This moves text encoding to Hugging Face’s servers so your GPU only handles the transformer:
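A sketch of the group-offloading half only: it streams the 32B transformer's layers between system RAM and the GPU using diffusers' group offloading utilities (`enable_group_offload` / `apply_group_offloading`, available in recent releases; verify your version), and it keeps the Mistral text encoder local rather than calling a remote endpoint, since the remote text-encoder setup is not shown here:

```python
import torch
from diffusers import Flux2Pipeline
from diffusers.hooks import apply_group_offloading

pipe = Flux2Pipeline.from_pretrained(
    "black-forest-labs/FLUX.2-dev",  # assumed repo id
    torch_dtype=torch.bfloat16,
)

onload = torch.device("cuda")
offload = torch.device("cpu")

# Stream groups of transformer layers onto the GPU as they are needed, so only
# a slice of the model is resident in VRAM at any moment.
pipe.transformer.enable_group_offload(
    onload_device=onload,
    offload_device=offload,
    offload_type="leaf_level",
    use_stream=True,
)

# Offload the remaining components the same way (local text encoder and VAE).
apply_group_offloading(pipe.text_encoder, onload_device=onload,
                       offload_type="block_level", num_blocks_per_group=2)
apply_group_offloading(pipe.vae, onload_device=onload, offload_type="leaf_level")

image = pipe(
    prompt="a koi pond at night lit by paper lanterns",
    num_inference_steps=50,
    guidance_scale=4.0,
).images[0]
image.save("koi.png")
```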
This approach needs just 8GB of VRAM but requires 32GB of system RAM.
Edit Images with References
One of FLUX.2’s standout features is unified image editing. The same pipeline handles text-to-image, single-reference editing, and multi-reference composition – pass images in and the model incorporates them.
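A sketch of a two-reference composition; it assumes the pipeline accepts a list of PIL images through the `image` argument, as described below:

```python
import torch
from diffusers import Flux2Pipeline
from diffusers.utils import load_image

pipe = Flux2Pipeline.from_pretrained(
    "black-forest-labs/FLUX.2-klein-4B",  # assumed repo id
    torch_dtype=torch.bfloat16,
).to("cuda")

bottle = load_image("bottle.png")      # referred to as "image 1" in the prompt
tabletop = load_image("tabletop.png")  # referred to as "image 2"

result = pipe(
    prompt=(
        "place the bottle from image 1 on the wooden table in image 2, "
        "keep the label readable and match the warm afternoon lighting"
    ),
    image=[bottle, tabletop],  # order defines image 1, image 2, ...
).images[0]
result.save("composite.png")
```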
FLUX.2 supports up to 10 reference images simultaneously. Reference them in the prompt as “image 1”, “image 2”, etc., in the order you pass them to the image parameter.
Reproducible Seeds and Batch Generation
For workflows where consistency matters, lock down the random seed and generate variations systematically.
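A sketch of seeded, repeatable generation plus a small seed sweep; the `generator` argument is the standard diffusers mechanism for pinning randomness:

```python
import torch
from diffusers import Flux2Pipeline

pipe = Flux2Pipeline.from_pretrained(
    "black-forest-labs/FLUX.2-klein-4B",  # assumed repo id
    torch_dtype=torch.bfloat16,
).to("cuda")

prompt = "isometric illustration of a tiny greenhouse, pastel palette"

# Fixed seed: the same prompt, seed, and weights reproduce the same image.
image = pipe(prompt, generator=torch.Generator("cuda").manual_seed(42)).images[0]
image.save("greenhouse_seed42.png")

# Seed sweep: numbered variations you can regenerate individually later.
for seed in range(100, 104):
    img = pipe(prompt, generator=torch.Generator("cuda").manual_seed(seed)).images[0]
    img.save(f"greenhouse_seed{seed}.png")
```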
Same prompt, same seed, same model weights – you get the exact same image every time. This is essential for A/B testing prompts or building repeatable pipelines.
Structured JSON Prompts
FLUX.2 handles structured JSON prompts natively. This gives you precise control over scene composition, lighting, camera settings, and color palettes without cramming everything into a single sentence.
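A sketch that serializes a structured prompt and passes it as the prompt string; the keys used here (`scene`, `lighting`, `camera`, `palette`, `style`) are an illustrative schema, not an official specification:

```python
import json
import torch
from diffusers import Flux2Pipeline

pipe = Flux2Pipeline.from_pretrained(
    "black-forest-labs/FLUX.2-klein-4B",  # assumed repo id
    torch_dtype=torch.bfloat16,
).to("cuda")

prompt_spec = {
    "scene": "matte black wireless headphones on a concrete pedestal",
    "lighting": "single softbox from the left, subtle rim light",
    "camera": {"angle": "three-quarter view", "focal_length": "85mm", "aperture": "f/4"},
    "palette": ["charcoal", "off-white", "muted orange accent"],
    "style": "clean e-commerce product photography",
}

# The JSON is passed as the prompt text itself.
image = pipe(prompt=json.dumps(prompt_spec)).images[0]
image.save("headphones.png")
```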
This is especially useful for product photography workflows where you need consistent framing across a catalog.
Common Errors and Fixes
torch.cuda.OutOfMemoryError: CUDA out of memory
This is the most common issue. The fix depends on which model you’re running:
- Klein 4B (needs ~8GB): Switch to `torch.bfloat16`, or try the FP8 variant at `black-forest-labs/FLUX.2-klein-4b-fp8`
- Klein 9B (needs ~16GB): Add `pipe.enable_model_cpu_offload()` instead of calling `pipe.to("cuda")`
- Dev 32B: Use the quantized `diffusers/FLUX.2-dev-bnb-4bit` model with CPU offloading
ImportError: cannot import name 'Flux2Pipeline' from 'diffusers'
Your diffusers version is too old. FLUX.2 support landed after the initial 0.32 release. Install from main:
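```bash
pip install -U git+https://github.com/huggingface/diffusers
```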
OSError: You are trying to access a gated repo
The Klein 9B and Dev models are gated on Hugging Face. You need to accept the license on the model page, then authenticate:
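```bash
huggingface-cli login
```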
ValueError: Could not instantiate the tokenizer
Missing the sentencepiece library. FLUX.2 uses Mistral Small 3.1 as its text encoder, which needs sentencepiece:
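```bash
pip install sentencepiece
```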
Distilled Klein model ignores your num_inference_steps and guidance_scale values
This is expected behavior, not a bug. The distilled Klein variants (FLUX.2-klein-4B, FLUX.2-klein-9B) are locked to 4 steps and guidance 1.0. If you need adjustable parameters, switch to the base variants: FLUX.2-klein-base-4B or FLUX.2-klein-base-9B.
What Hardware Do You Actually Need
Here’s a realistic breakdown based on the model you want to run:
- FLUX.2-klein-4B: RTX 3060 12GB or better. Runs at sub-second speed on an RTX 4090. This is the sweet spot for most people.
- FLUX.2-klein-9B: RTX 4080 16GB or better. Or use CPU offloading on a 12GB card.
- FLUX.2-dev (quantized): RTX 4090 24GB with 4-bit quantization and CPU offloading. Expect 30-60 second generation times.
- FLUX.2-dev (full precision): H100 80GB or equivalent. Fastest at about 10 seconds per image.
If you don’t have the hardware, the BFL API starts at $0.014 per image for Klein and scales by resolution. Third-party providers like fal.ai and Replicate also host FLUX.2 models with pay-per-image pricing.