The Pipeline at a Glance
Coloring book pages need clean, bold outlines with no shading or color fills. Standard text-to-image models won’t give you that out of the box. They’ll generate fully rendered images with lighting, textures, and gradients – the opposite of what a kid (or adult) wants to color in.
The trick is chaining three steps: generate a source image with Stable Diffusion, extract structural lines with ControlNet’s lineart model, then post-process with OpenCV to get crisp black-and-white output. Each step is simple. Together they produce print-ready coloring pages.
You need a GPU with at least 8GB VRAM. Install everything first:
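A typical install covering all three steps might look like the following (exact package names and versions are assumptions; adjust for your environment):

```shell
pip install torch diffusers transformers accelerate controlnet-aux opencv-python
```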
Generate the Source Image
Start by creating a base image with Stable Diffusion. The prompt matters here – you want subjects with clear outlines and distinct shapes. Animals, flowers, castles, and vehicles work great. Avoid prompts that produce busy backgrounds or heavy overlapping textures.
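A minimal sketch of the generation step. The model ID and prompts here are illustrative assumptions; any SD 1.5 checkpoint works:

```python
import torch
from diffusers import StableDiffusionPipeline

# Load a Stable Diffusion 1.5 checkpoint (model ID is an assumption)
pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
)
pipe.enable_model_cpu_offload()  # handles device placement automatically

prompt = (
    "a friendly cartoon elephant, simple background, "
    "clear outlines, children's illustration style"
)
# Steer away from rendering styles that are hard to convert to line art
negative_prompt = "photorealistic, shading, gradients, detailed textures, busy background"

image = pipe(prompt, negative_prompt=negative_prompt, num_inference_steps=30).images[0]
image.save("source.png")
```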
The negative prompt is doing the heavy lifting here. Telling the model to avoid photorealism, shading, and gradients pushes it toward flat, cartoon-style output – much easier to convert into line art later.
Extract Line Art with ControlNet
Now feed that source image through the lineart ControlNet. The lllyasviel/control_v11p_sd15_lineart model is specifically trained to understand and reproduce line structure. Combined with the lineart preprocessor from controlnet_aux, it extracts clean edges that respect the subject’s form.
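A sketch of the extraction step, assuming the source image from the previous step was saved as source.png (prompt wording is illustrative):

```python
import torch
from controlnet_aux import LineartDetector
from diffusers import ControlNetModel, StableDiffusionControlNetPipeline
from PIL import Image

# Preprocessor: extracts a line-art map from the source image
processor = LineartDetector.from_pretrained("lllyasviel/Annotators")
lineart_map = processor(Image.open("source.png"))

# ControlNet trained to follow line structure
controlnet = ControlNetModel.from_pretrained(
    "lllyasviel/control_v11p_sd15_lineart", torch_dtype=torch.float16
)
pipe = StableDiffusionControlNetPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", controlnet=controlnet, torch_dtype=torch.float16
)
pipe.enable_model_cpu_offload()

result = pipe(
    "black and white line art, coloring book page, clean bold outlines",
    image=lineart_map,
    negative_prompt="color, shading, gradients, gray fill",
    controlnet_conditioning_scale=1.0,
    num_inference_steps=30,
).images[0]
result.save("lineart_page.png")
```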
The controlnet_conditioning_scale at 1.0 tells the model to follow the lineart structure closely. Drop it to 0.6-0.8 if you want more creative freedom in the output, but for coloring books you typically want tight adherence to the original structure.
Post-Process with OpenCV
The raw ControlNet output often has light gray areas, faint lines, or slight color casts. Coloring book pages need pure black lines on a pure white background. OpenCV handles this cleanup fast.
The threshold value controls how aggressive the cleanup is. Lower values (150-170) keep more detail but might leave gray artifacts. Higher values (200-220) produce cleaner output but can erase thin lines. Start at 180 and adjust per image.
Batch Generate Multiple Coloring Pages
A single coloring page isn’t a coloring book. Here’s how to generate a full set of pages from a list of prompts, running the entire pipeline end-to-end.
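A sketch of the batch loop under the same assumptions as the earlier steps (model IDs and prompts are illustrative); the key point is that every model loads once, outside the loop:

```python
import cv2
import numpy as np
import torch
from controlnet_aux import LineartDetector
from diffusers import (ControlNetModel, StableDiffusionControlNetPipeline,
                       StableDiffusionPipeline)

subjects = ["a friendly cartoon elephant", "a butterfly with symmetrical wings",
            "a fairy tale castle"]
style = ", simple background, clear outlines, children's illustration style"
negative = "photorealistic, shading, gradients, busy background"

# Load every model once, up front
sd_pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
)
sd_pipe.enable_model_cpu_offload()
processor = LineartDetector.from_pretrained("lllyasviel/Annotators")
controlnet = ControlNetModel.from_pretrained(
    "lllyasviel/control_v11p_sd15_lineart", torch_dtype=torch.float16
)
cn_pipe = StableDiffusionControlNetPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", controlnet=controlnet, torch_dtype=torch.float16
)
cn_pipe.enable_model_cpu_offload()

for i, subject in enumerate(subjects):
    source = sd_pipe(subject + style, negative_prompt=negative).images[0]
    lineart = cn_pipe(
        "black and white line art, coloring book page, clean bold outlines",
        image=processor(source),
        negative_prompt="color, shading, gradients, gray fill",
        controlnet_conditioning_scale=1.0,
    ).images[0]
    # Threshold to pure black-and-white, then save as a numbered page
    gray = cv2.cvtColor(np.array(lineart), cv2.COLOR_RGB2GRAY)
    _, page = cv2.threshold(gray, 180, 255, cv2.THRESH_BINARY)
    cv2.imwrite(f"page_{i:02d}.png", page)
```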
This loads the models once and reuses them across all pages. Each page takes roughly 15-30 seconds on a modern GPU. For a 20-page coloring book, expect about 5-10 minutes total.
Tips for Better Coloring Pages
Prompt engineering matters more than model tuning. Add “simple background”, “clear outlines”, and “children’s illustration style” to your source prompts. These push the base model toward flat, well-separated shapes that convert cleanly to line art.
Symmetrical subjects work best. Butterflies, mandalas, flowers, and front-facing animals produce cleaner line art than complex action scenes or perspective shots.
Adjust conditioning scale per subject. Detailed subjects like castles benefit from controlnet_conditioning_scale=1.0. Simpler subjects like a single flower might look better at 0.7-0.8 to avoid over-constraining the generation.
Use Canny as an alternative preprocessor. If the lineart detector produces too many fine details, swap it for Canny edge detection with a high threshold. This gives you thicker, bolder edges:
You would then use the Canny ControlNet model (lllyasviel/sd-controlnet-canny) instead of the lineart one.
Common Errors and Fixes
RuntimeError: Expected all tensors to be on the same device – This happens when the ControlNet and base model end up on different devices. Use pipe.enable_model_cpu_offload() instead of manually calling .to("cuda"). The offload method handles device placement automatically.
ValueError: Cannot handle this data type – OpenCV and PIL use different color channel orders (BGR vs RGB). When passing a PIL image to OpenCV, convert it first: cv2.cvtColor(np.array(pil_image), cv2.COLOR_RGB2BGR). When going the other direction, use Image.fromarray(cv2.cvtColor(cv_image, cv2.COLOR_BGR2RGB)).
OutOfMemoryError: CUDA out of memory – Loading two pipelines (SD + ControlNet) eats VRAM fast. Three fixes in order of preference:
- Use enable_model_cpu_offload() on both pipelines
- Delete the first pipeline before creating the second: del sd_pipe; torch.cuda.empty_cache()
- Add enable_attention_slicing() to reduce peak memory
Lines are too faint after thresholding – Lower your threshold value. If you’re using 200, try 150 or 160. You can also increase line_thickness in the post-processing step to bulk up thin lines.
ControlNet output ignores the conditioning image – Check that controlnet_conditioning_scale isn’t set too low. At 0.0, the model ignores the conditioning entirely. Set it to 0.8-1.0 for coloring book work. Also verify that the conditioning image is the same resolution as the output (default 512x512 for SD 1.5).
FileNotFoundError when loading lllyasviel/Annotators – This model downloads preprocessor weights from Hugging Face. If you’re behind a firewall or have no internet, download the weights manually and pass the local path: LineartDetector.from_pretrained("/path/to/local/annotators").
Related Guides
- How to Build AI Pixel Art Generation with Stable Diffusion
- How to Build AI Sprite Sheet Generation with Stable Diffusion
- How to Build AI Logo Generation with Stable Diffusion and SDXL
- How to Build Real-Time Image Generation with StreamDiffusion
- How to Build AI Texture Generation for Game Assets with Stable Diffusion
- How to Build AI Sticker and Emoji Generation with Stable Diffusion
- How to Build AI Interior Design Rendering with ControlNet and Stable Diffusion
- How to Build AI Wallpaper Generation with Stable Diffusion and Tiling
- How to Build AI Architectural Rendering with ControlNet and Stable Diffusion
- How to Build AI Sketch-to-Image Generation with ControlNet Scribble