ControlNet Scribble takes a rough hand-drawn sketch and uses it as structural guidance for Stable Diffusion, producing a fully rendered image that follows your sketch’s composition. You draw a stick figure cat, add a prompt like “a fluffy orange cat sitting on a windowsill,” and the model fills in the rest. Here’s the minimal setup to get it running:

import torch
from diffusers import StableDiffusionControlNetPipeline, ControlNetModel
from diffusers.utils import load_image

controlnet = ControlNetModel.from_pretrained(
    "lllyasviel/sd-controlnet-scribble",
    torch_dtype=torch.float16,
)

pipe = StableDiffusionControlNetPipeline.from_pretrained(
    "stable-diffusion-v1-5/stable-diffusion-v1-5",
    controlnet=controlnet,
    torch_dtype=torch.float16,
)
pipe.enable_model_cpu_offload()

sketch = load_image("https://huggingface.co/lllyasviel/sd-controlnet-scribble/resolve/main/images/bag.png")

result = pipe(
    prompt="a luxury leather handbag, product photography, studio lighting",
    image=sketch,
    num_inference_steps=30,
).images[0]

result.save("output.png")

That’s the core loop. Load the ControlNet scribble checkpoint, wire it into a Stable Diffusion pipeline, pass your sketch as the conditioning image, and you get a detailed render guided by the sketch’s structure.

Preprocessing Your Sketches

Real-world sketches rarely arrive in the format ControlNet expects: the scribble model works best with white lines on a black background. The controlnet_aux package includes an HEDdetector that can extract scribble-like edges from any photo, but if you’re starting from an actual hand-drawn sketch, you usually just need to clean it up and invert it.

pip install controlnet_aux diffusers transformers accelerate

Here’s how to preprocess a sketch from a photo or scan:

from controlnet_aux import HEDdetector
from diffusers.utils import load_image
from PIL import Image, ImageOps

hed = HEDdetector.from_pretrained("lllyasviel/Annotators")

# Option 1: Extract scribble edges from a photo
input_image = load_image("my_photo.png")
scribble = hed(input_image, scribble=True)
scribble.save("scribble_preprocessed.png")

# Option 2: Clean up a hand-drawn sketch on white paper
sketch = Image.open("hand_drawn.png").convert("L")
# Threshold to pure black and white
sketch = sketch.point(lambda x: 0 if x < 128 else 255, "1")
# ControlNet scribble expects white strokes on black background
sketch = ImageOps.invert(sketch.convert("L")).convert("RGB")
sketch = sketch.resize((512, 512))
sketch.save("cleaned_sketch.png")

The key detail: if your sketch is black lines on white paper (the normal way people draw), you need to invert it. ControlNet scribble was trained on white-on-black scribble maps. Getting this wrong produces washed-out results where the model ignores your sketch entirely.
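
If you’re handling sketches from mixed sources and aren’t sure of the polarity, a quick mean-brightness check catches the common black-on-white case. The helper below is only an illustrative sketch; the function name and the 127 threshold are assumptions, not anything ControlNet or diffusers provides:

import numpy as np
from PIL import Image, ImageOps

def ensure_white_on_black(img):
    """Invert a sketch that looks like dark lines on a light background."""
    gray = img.convert("L")
    # A mostly bright image almost certainly means black strokes on white paper,
    # the opposite of what ControlNet scribble was trained on.
    if np.asarray(gray).mean() > 127:
        gray = ImageOps.invert(gray)
    return gray.convert("RGB")

sketch = ensure_white_on_black(Image.open("hand_drawn.png")).resize((512, 512))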

Controlling Sketch Fidelity vs. Creative Freedom

The controlnet_conditioning_scale parameter is the most important knob you have. Useful values run from 0.0 to about 2.0, and it controls how strictly the output follows your sketch.

  • 0.3-0.5: Loose guidance. The model takes your sketch as a rough suggestion and fills in details freely. Good for messy doodles where you want the AI to interpret liberally.
  • 0.7-1.0: Balanced. The output follows the sketch structure but still has room to add detail and texture. This is the sweet spot for most use cases.
  • 1.2-1.5: Strict adherence. The output closely mirrors every line in your sketch. Useful for architectural plans or precise compositions.
  • Above 1.5: Artifacts start appearing. The model over-constrains and you get distorted textures.

The script below renders the same sketch at three conditioning scales so you can compare them directly:
import torch
from diffusers import StableDiffusionControlNetPipeline, ControlNetModel, UniPCMultistepScheduler
from diffusers.utils import load_image

controlnet = ControlNetModel.from_pretrained(
    "lllyasviel/sd-controlnet-scribble",
    torch_dtype=torch.float16,
)

pipe = StableDiffusionControlNetPipeline.from_pretrained(
    "stable-diffusion-v1-5/stable-diffusion-v1-5",
    controlnet=controlnet,
    torch_dtype=torch.float16,
)
pipe.scheduler = UniPCMultistepScheduler.from_config(pipe.scheduler.config)
pipe.enable_model_cpu_offload()

sketch = load_image("cleaned_sketch.png").resize((512, 512))

# Generate at three fidelity levels for comparison
for scale in [0.4, 0.8, 1.3]:
    generator = torch.Generator(device="cpu").manual_seed(42)
    result = pipe(
        prompt="a cozy cottage in a forest clearing, watercolor style, soft lighting",
        negative_prompt="blurry, low quality, distorted",
        image=sketch,
        num_inference_steps=30,
        controlnet_conditioning_scale=scale,
        generator=generator,
    ).images[0]
    result.save(f"output_scale_{scale}.png")
    print(f"Saved output at conditioning scale {scale}")

Notice the fixed seed with torch.Generator so the only variable across outputs is the conditioning scale. This makes it easy to compare how tightly the model follows your sketch at each level.

Swapping in UniPCMultistepScheduler cuts inference steps without sacrificing quality. You can drop to 20 steps with this scheduler and still get clean results, which matters when you’re iterating on sketches.
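
For a quick draft pass, the same pipeline and sketch from the script above can be reused with the step count dropped to 20; the prompt and output filename here are placeholders:

# Draft-quality pass: UniPC holds up well at 20 steps, so iteration stays fast.
draft = pipe(
    prompt="a cozy cottage in a forest clearing, watercolor style, soft lighting",
    negative_prompt="blurry, low quality, distorted",
    image=sketch,
    num_inference_steps=20,
    controlnet_conditioning_scale=0.8,
).images[0]
draft.save("draft_20_steps.png")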

Batch Processing Multiple Sketches with Different Styles

When you have a set of sketches and want to render each one in multiple styles, structure it as a batch job. The pipeline supports batch inference, but the simpler approach for different prompts per sketch is to loop and reuse the loaded model.

import torch
from pathlib import Path
from diffusers import StableDiffusionControlNetPipeline, ControlNetModel, UniPCMultistepScheduler
from PIL import Image, ImageOps

controlnet = ControlNetModel.from_pretrained(
    "lllyasviel/sd-controlnet-scribble",
    torch_dtype=torch.float16,
)

pipe = StableDiffusionControlNetPipeline.from_pretrained(
    "stable-diffusion-v1-5/stable-diffusion-v1-5",
    controlnet=controlnet,
    torch_dtype=torch.float16,
)
pipe.scheduler = UniPCMultistepScheduler.from_config(pipe.scheduler.config)
pipe.enable_model_cpu_offload()

sketch_dir = Path("sketches")
output_dir = Path("renders")
output_dir.mkdir(exist_ok=True)

styles = {
    "photorealistic": "photorealistic, 8k, detailed textures, natural lighting",
    "anime": "anime style, vibrant colors, cel shading, studio ghibli",
    "oil_painting": "oil painting, thick brushstrokes, impressionist, gallery quality",
}

for sketch_path in sorted(sketch_dir.glob("*.png")):
    # Load and preprocess
    sketch = Image.open(sketch_path).convert("L")
    sketch = sketch.point(lambda x: 0 if x < 128 else 255, "1")
    sketch = ImageOps.invert(sketch.convert("L")).convert("RGB")
    sketch = sketch.resize((512, 512))

    for style_name, style_prompt in styles.items():
        generator = torch.Generator(device="cpu").manual_seed(42)
        result = pipe(
            prompt=f"a detailed scene, {style_prompt}",
            negative_prompt="blurry, low quality, deformed, ugly",
            image=sketch,
            num_inference_steps=25,
            controlnet_conditioning_scale=0.8,
            generator=generator,
        ).images[0]

        out_name = f"{sketch_path.stem}_{style_name}.png"
        result.save(output_dir / out_name)
        print(f"Rendered {out_name}")

print(f"All renders saved to {output_dir}/")

This processes every PNG in a sketches/ folder and renders it in three different artistic styles. The fixed seed ensures consistency across styles for the same sketch, so you can directly compare how each style interprets the same composition.

For true batch inference (multiple images in one forward pass), you can pass lists to the pipeline, but all images in the batch need the same dimensions and you’re limited by VRAM. For most sketch-to-image workflows, sequential processing with enable_model_cpu_offload() is more practical.
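
If you do want to try a single batched call, the general shape looks like the sketch below. It assumes two preprocessed 512x512 white-on-black sketches already loaded as sketch_a and sketch_b, and enough VRAM for a batch of two:

# One forward pass over two sketches: prompt and image are equal-length lists,
# and every conditioning image must share the same dimensions.
prompts = [
    "a cozy cottage in a forest clearing, watercolor style",
    "a lighthouse on a rocky cliff at sunset, watercolor style",
]
conditioning = [sketch_a, sketch_b]

batch = pipe(
    prompt=prompts,
    image=conditioning,
    num_inference_steps=25,
    controlnet_conditioning_scale=0.8,
).images

for i, img in enumerate(batch):
    img.save(f"batch_output_{i}.png")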

Common Errors and Fixes

RuntimeError: Expected all tensors to be on the same device – This happens when the ControlNet model and the main pipeline end up on different devices. Use pipe.enable_model_cpu_offload() instead of manually calling .to("cuda") on individual components. CPU offloading handles device placement automatically.
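
As a minimal sketch of the difference, the commented-out line shows the pattern to avoid:

# Don't place individual components by hand when offloading is in play:
# pipe.controlnet.to("cuda")

# Let the pipeline manage device placement for every component instead:
pipe.enable_model_cpu_offload()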

Output ignores the sketch completely – Almost always a preprocessing issue. Check that your sketch is white lines on a black background. Open your preprocessed image and verify visually. If it’s black-on-white, add ImageOps.invert() before passing it to the pipeline.

torch.cuda.OutOfMemoryError – ControlNet adds significant VRAM overhead on top of Stable Diffusion. On an 8GB GPU, enable attention slicing and CPU offload:

pipe.enable_model_cpu_offload()
pipe.enable_attention_slicing()
# For extreme memory constraints, also enable VAE slicing
pipe.enable_vae_slicing()

This drops peak VRAM usage from around 10GB to under 5GB, at the cost of slower inference.

Artifacts at high conditioning scales – If you see grid-like patterns or color banding, lower controlnet_conditioning_scale below 1.2. Values above 1.5 almost always produce artifacts. Also try adding detail-oriented terms to your negative prompt: "artifacts, grid pattern, color banding, oversaturated".

Blurry or low-detail outputs – Increase num_inference_steps to 40-50, and check that you’re not accidentally resizing your sketch to a very small resolution before upscaling. The pipeline works best with 512x512 input for SD 1.5-based models.

ValueError: Expected image to have 3 channels – Your sketch image is grayscale or RGBA. Convert it before passing to the pipeline:

sketch = Image.open("sketch.png").convert("RGB").resize((512, 512))