The Pipeline at a Glance
Coloring book pages need clean, bold outlines with no shading or color fills. Standard text-to-image models won’t give you that out of the box. They’ll generate fully rendered images with lighting, textures, and gradients – the opposite of what a kid (or adult) wants to color in.
The trick is chaining three steps: generate a source image with Stable Diffusion, extract structural lines with ControlNet’s lineart model, then post-process with OpenCV to get crisp black-and-white output. Each step is simple. Together they produce print-ready coloring pages.
You need a GPU with at least 8GB VRAM. Install everything first:
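A typical install covering all three steps might look like the following (exact package names and versions are assumptions; adjust for your environment):

```shell
pip install torch diffusers transformers accelerate controlnet-aux opencv-python
```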
Generate the Source Image
Start by creating a base image with Stable Diffusion. The prompt matters here – you want subjects with clear outlines and distinct shapes. Animals, flowers, castles, and vehicles work great. Avoid prompts that produce busy backgrounds or heavy overlapping textures.
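A minimal sketch of the generation step. The model ID and prompts here are illustrative assumptions; any SD 1.5 checkpoint works:

```python
import torch
from diffusers import StableDiffusionPipeline

# Load a Stable Diffusion 1.5 checkpoint (model ID is an assumption)
pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
)
pipe.enable_model_cpu_offload()  # handles device placement automatically

prompt = (
    "a friendly cartoon elephant, simple background, "
    "clear outlines, children's illustration style"
)
# Steer away from rendering styles that are hard to convert to line art
negative_prompt = "photorealistic, shading, gradients, detailed textures, busy background"

image = pipe(prompt, negative_prompt=negative_prompt, num_inference_steps=30).images[0]
image.save("source.png")
```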
The negative prompt is doing the heavy lifting here. Telling the model to avoid photorealism, shading, and gradients pushes it toward flat, cartoon-style output – much easier to convert into line art later.
Extract Line Art with ControlNet
Now feed that source image through the lineart ControlNet. The lllyasviel/control_v11p_sd15_lineart model is specifically trained to understand and reproduce line structure. Combined with the lineart preprocessor from controlnet_aux, it extracts clean edges that respect the subject’s form.
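A sketch of the extraction step, assuming the source image from the previous step was saved as source.png (prompt wording is illustrative):

```python
import torch
from controlnet_aux import LineartDetector
from diffusers import ControlNetModel, StableDiffusionControlNetPipeline
from PIL import Image

# Preprocessor: extracts a line-art map from the source image
processor = LineartDetector.from_pretrained("lllyasviel/Annotators")
lineart_map = processor(Image.open("source.png"))

# ControlNet trained to follow line structure
controlnet = ControlNetModel.from_pretrained(
    "lllyasviel/control_v11p_sd15_lineart", torch_dtype=torch.float16
)
pipe = StableDiffusionControlNetPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", controlnet=controlnet, torch_dtype=torch.float16
)
pipe.enable_model_cpu_offload()

result = pipe(
    "black and white line art, coloring book page, clean bold outlines",
    image=lineart_map,
    negative_prompt="color, shading, gradients, gray fill",
    controlnet_conditioning_scale=1.0,
    num_inference_steps=30,
).images[0]
result.save("lineart_page.png")
```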
The controlnet_conditioning_scale at 1.0 tells the model to follow the lineart structure closely. Drop it to 0.6-0.8 if you want more creative freedom in the output, but for coloring books you typically want tight adherence to the original structure.
Post-Process with OpenCV
The raw ControlNet output often has light gray areas, faint lines, or slight color casts. Coloring book pages need pure black lines on a pure white background. OpenCV handles this cleanup fast.
The threshold value controls how aggressive the cleanup is. Lower values (150-170) keep more detail but might leave gray artifacts. Higher values (200-220) produce cleaner output but can erase thin lines. Start at 180 and adjust per image.
Batch Generate Multiple Coloring Pages
A single coloring page isn’t a coloring book. Here’s how to generate a full set of pages from a list of prompts, running the entire pipeline end-to-end.
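A sketch of the batch loop under the same assumptions as the earlier steps (model IDs and prompts are illustrative); the key point is that every model loads once, outside the loop:

```python
import cv2
import numpy as np
import torch
from controlnet_aux import LineartDetector
from diffusers import (ControlNetModel, StableDiffusionControlNetPipeline,
                       StableDiffusionPipeline)

subjects = ["a friendly cartoon elephant", "a butterfly with symmetrical wings",
            "a fairy tale castle"]
style = ", simple background, clear outlines, children's illustration style"
negative = "photorealistic, shading, gradients, busy background"

# Load every model once, up front
sd_pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
)
sd_pipe.enable_model_cpu_offload()
processor = LineartDetector.from_pretrained("lllyasviel/Annotators")
controlnet = ControlNetModel.from_pretrained(
    "lllyasviel/control_v11p_sd15_lineart", torch_dtype=torch.float16
)
cn_pipe = StableDiffusionControlNetPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", controlnet=controlnet, torch_dtype=torch.float16
)
cn_pipe.enable_model_cpu_offload()

for i, subject in enumerate(subjects):
    source = sd_pipe(subject + style, negative_prompt=negative).images[0]
    lineart = cn_pipe(
        "black and white line art, coloring book page, clean bold outlines",
        image=processor(source),
        negative_prompt="color, shading, gradients, gray fill",
        controlnet_conditioning_scale=1.0,
    ).images[0]
    # Threshold to pure black-and-white, then save as a numbered page
    gray = cv2.cvtColor(np.array(lineart), cv2.COLOR_RGB2GRAY)
    _, page = cv2.threshold(gray, 180, 255, cv2.THRESH_BINARY)
    cv2.imwrite(f"page_{i:02d}.png", page)
```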
This loads the models once and reuses them across all pages. Each page takes roughly 15-30 seconds on a modern GPU. For a 20-page coloring book, expect about 5-10 minutes total.
Tips for Better Coloring Pages
Prompt engineering matters more than model tuning. Add “simple background”, “clear outlines”, and “children’s illustration style” to your source prompts. These push the base model toward flat, well-separated shapes that convert cleanly to line art.
Symmetrical subjects work best. Butterflies, mandalas, flowers, and front-facing animals produce cleaner line art than complex action scenes or perspective shots.
Adjust conditioning scale per subject. Detailed subjects like castles benefit from controlnet_conditioning_scale=1.0. Simpler subjects like a single flower might look better at 0.7-0.8 to avoid over-constraining the generation.
Use Canny as an alternative preprocessor. If the lineart detector produces too many fine details, swap it for Canny edge detection with a high threshold. This gives you thicker, bolder edges:
You would then use the Canny ControlNet model (lllyasviel/sd-controlnet-canny) instead of the lineart one.
Common Errors and Fixes
RuntimeError: Expected all tensors to be on the same device – This happens when the ControlNet and base model end up on different devices. Use pipe.enable_model_cpu_offload() instead of manually calling .to("cuda"). The offload method handles device placement automatically.
ValueError: Cannot handle this data type – OpenCV and PIL use different color channel orders (BGR vs RGB). When passing a PIL image to OpenCV, convert it first: cv2.cvtColor(np.array(pil_image), cv2.COLOR_RGB2BGR). When going the other direction, use Image.fromarray(cv2.cvtColor(cv_image, cv2.COLOR_BGR2RGB)).
OutOfMemoryError: CUDA out of memory – Loading two pipelines (SD + ControlNet) eats VRAM fast. Three fixes in order of preference:
- Use enable_model_cpu_offload() on both pipelines
- Delete the first pipeline before creating the second: del sd_pipe; torch.cuda.empty_cache()
- Add enable_attention_slicing() to reduce peak memory
Lines are too faint after thresholding – Lower your threshold value. If you’re using 200, try 150 or 160. You can also increase line_thickness in the post-processing step to bulk up thin lines.
ControlNet output ignores the conditioning image – Check that controlnet_conditioning_scale isn’t set too low. At 0.0, the model ignores the conditioning entirely. Set it to 0.8-1.0 for coloring book work. Also verify that the conditioning image is the same resolution as the output (default 512x512 for SD 1.5).
FileNotFoundError when loading lllyasviel/Annotators – This model downloads preprocessor weights from Hugging Face. If you’re behind a firewall or have no internet, download the weights manually and pass the local path: LineartDetector.from_pretrained("/path/to/local/annotators").
Related Guides
- How to Build AI Pixel Art Generation with Stable Diffusion
- How to Build AI Sprite Sheet Generation with Stable Diffusion
- How to Build AI Logo Generation with Stable Diffusion and SDXL
- How to Build Real-Time Image Generation with StreamDiffusion
- How to Build AI Texture Generation for Game Assets with Stable Diffusion
- How to Build AI Sticker and Emoji Generation with Stable Diffusion
- How to Build AI Interior Design Rendering with ControlNet and Stable Diffusion
- How to Build AI Wallpaper Generation with Stable Diffusion and Tiling
- How to Build AI Architectural Rendering with ControlNet and Stable Diffusion
- How to Build AI Sketch-to-Image Generation with ControlNet Scribble