Designers sketch wireframes on whiteboards and napkins. Turning those rough layouts into polished UI mockups normally means hours in Figma. ControlNet with Stable Diffusion short-circuits that workflow – feed it a wireframe image, and it generates a styled UI design that respects the original layout. The structural lines stay where you drew them, but the model fills in colors, typography styling, gradients, and component details.

Here’s what you need installed:

pip install diffusers transformers accelerate torch controlnet-aux pillow

And here’s the minimal pipeline that converts a wireframe sketch into a styled UI screen:

import torch
from diffusers import StableDiffusionControlNetPipeline, ControlNetModel
from diffusers.utils import load_image
from controlnet_aux import CannyDetector

controlnet = ControlNetModel.from_pretrained(
    "lllyasviel/control_v11p_sd15_canny",
    torch_dtype=torch.float16,
)

pipe = StableDiffusionControlNetPipeline.from_pretrained(
    "stable-diffusion-v1-5/stable-diffusion-v1-5",
    controlnet=controlnet,
    torch_dtype=torch.float16,
)
pipe.enable_model_cpu_offload()

wireframe = load_image("wireframe_dashboard.png")
canny = CannyDetector()
edges = canny(wireframe, low_threshold=30, high_threshold=100)

result = pipe(
    prompt="modern web dashboard UI design, clean layout, card components, "
           "sidebar navigation, data visualization charts, light theme, "
           "professional SaaS interface, Dribbble quality, sharp detail",
    negative_prompt="blurry, sketch, hand-drawn, low quality, watermark, text artifacts",
    image=edges,
    num_inference_steps=30,
    controlnet_conditioning_scale=0.9,
).images[0]

result.save("generated_dashboard_ui.png")

The Canny edge detector pulls the structural lines out of your wireframe – boxes become cards, lines become dividers, rectangles become navigation bars. The diffusion model then renders a polished UI that follows those exact boundaries. Lower the Canny thresholds compared to photo use cases, because wireframes have thinner, lighter lines than natural images.

Choosing the Right Conditioning Mode

Canny edges are the default, but wireframes have specific characteristics that make other preprocessors worth considering.

Canny works best for wireframes drawn with thick markers or digital tools that produce clean, high-contrast lines. Set low_threshold between 20 and 50 and high_threshold between 80 and 150. Too aggressive and you lose thin UI lines; too loose and you pick up paper-texture noise.

Lineart is better for pencil sketches and hand-drawn wireframes. It’s trained on artistic line work, so it handles varying stroke widths and light pencil marks that Canny would miss entirely.

import torch
from diffusers import StableDiffusionControlNetPipeline, ControlNetModel
from diffusers.utils import load_image
from controlnet_aux import LineartDetector

wireframe = load_image("wireframe_login.png")  # the hand-drawn sketch
lineart = LineartDetector.from_pretrained("lllyasviel/Annotators")
lineart_image = lineart(wireframe)

controlnet_lineart = ControlNetModel.from_pretrained(
    "lllyasviel/control_v11p_sd15_lineart",
    torch_dtype=torch.float16,
)

pipe_lineart = StableDiffusionControlNetPipeline.from_pretrained(
    "stable-diffusion-v1-5/stable-diffusion-v1-5",
    controlnet=controlnet_lineart,
    torch_dtype=torch.float16,
)
pipe_lineart.enable_model_cpu_offload()

result = pipe_lineart(
    prompt="mobile app login screen, modern UI, rounded input fields, "
           "gradient button, social login icons, clean white background, "
           "iOS style interface, professional app design",
    negative_prompt="blurry, sketch, wireframe, low quality, deformed, watermark",
    image=lineart_image,
    num_inference_steps=30,
    controlnet_conditioning_scale=0.8,
).images[0]

result.save("login_screen_ui.png")

My recommendation: start with Canny for digital wireframes and lineart for anything hand-drawn. Preview the preprocessed image before feeding it to the pipeline – if you can’t recognize your wireframe layout in the edge map, the model won’t either.
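A quick way to do that eyeball check is to paste the wireframe and its edge map side by side. The sketch below is self-contained, so it draws a tiny synthetic wireframe and uses PIL's FIND_EDGES filter as a stand-in for the detector output; in your workflow, substitute the real wireframe and the CannyDetector or LineartDetector result.

```python
from PIL import Image, ImageDraw, ImageFilter

# Synthetic stand-in wireframe so the snippet runs standalone
wireframe = Image.new("RGB", (256, 256), "white")
draw = ImageDraw.Draw(wireframe)
draw.rectangle((16, 16, 240, 56), outline="black", width=3)   # header bar
draw.rectangle((16, 72, 96, 240), outline="black", width=3)   # sidebar

# Stand-in for the Canny/lineart output (use the real detector in practice)
edges = wireframe.filter(ImageFilter.FIND_EDGES).convert("RGB")

# Paste wireframe and edge map side by side for a quick visual check
preview = Image.new("RGB", (wireframe.width * 2, wireframe.height))
preview.paste(wireframe, (0, 0))
preview.paste(edges, (wireframe.width, 0))
preview.save("edge_map_preview.png")
```

If the right half of the preview doesn't clearly show your boxes and dividers, fix the thresholds before touching the pipeline.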

Tuning Conditioning Scale and Guidance

Two parameters control the balance between faithfulness to your wireframe and the model’s creative freedom.

controlnet_conditioning_scale (0.0 to 1.5) controls how strictly the output follows your wireframe structure. For UI generation, you want this high – between 0.8 and 1.0. Below 0.7, the model starts ignoring your layout. Above 1.1, it over-constrains and produces artifacts around edges.

guidance_scale (1.0 to 20.0) controls how closely the model follows your text prompt. For UI designs, 7.5 to 10.0 works well. Higher values produce sharper, more literal interpretations of your prompt but can introduce color banding and over-saturation.

import torch
from diffusers import StableDiffusionControlNetPipeline, ControlNetModel
from diffusers.utils import load_image
from controlnet_aux import CannyDetector
from PIL import Image

controlnet = ControlNetModel.from_pretrained(
    "lllyasviel/control_v11p_sd15_canny",
    torch_dtype=torch.float16,
)
pipe = StableDiffusionControlNetPipeline.from_pretrained(
    "stable-diffusion-v1-5/stable-diffusion-v1-5",
    controlnet=controlnet,
    torch_dtype=torch.float16,
)
pipe.enable_model_cpu_offload()

wireframe = load_image("wireframe_settings.png")
canny = CannyDetector()
edges = canny(wireframe, low_threshold=30, high_threshold=100)

prompt = (
    "settings page UI design, toggle switches, dropdown menus, "
    "section headers, clean typography, modern flat design, "
    "subtle shadows, light gray background, professional interface"
)
negative_prompt = (
    "blurry, sketch, hand-drawn, low quality, watermark, "
    "photorealistic, photograph, 3d render"
)

# Generate variations with different conditioning scales
scales = [0.7, 0.85, 1.0]
images = []

for scale in scales:
    generator = torch.Generator(device="cpu").manual_seed(42)
    result = pipe(
        prompt=prompt,
        negative_prompt=negative_prompt,
        image=edges,
        num_inference_steps=30,
        controlnet_conditioning_scale=scale,
        guidance_scale=9.0,
        generator=generator,
    ).images[0]
    images.append(result)
    result.save(f"settings_ui_scale_{scale}.png")

# Create side-by-side comparison
total_width = sum(img.width for img in images)
comparison = Image.new("RGB", (total_width, images[0].height))
x_offset = 0
for img in images:
    comparison.paste(img, (x_offset, 0))
    x_offset += img.width

comparison.save("conditioning_scale_comparison.png")

Start at 0.85 and adjust from there. If the output ignores your wireframe layout, go up. If it looks too mechanical with hard edges following every pencil stroke, go down.

Upgrading to SDXL for Higher Quality

SD 1.5 works for quick iterations, but SDXL produces significantly sharper UI renders at its native 1024x1024 resolution. Its dual text encoders handle complex UI-related prompts more accurately, and the higher resolution means UI text placeholders and small icons actually look like real interface elements instead of blurred blobs.

import torch
from diffusers import StableDiffusionXLControlNetPipeline, ControlNetModel
from diffusers.utils import load_image
from controlnet_aux import CannyDetector

controlnet = ControlNetModel.from_pretrained(
    "diffusers/controlnet-canny-sdxl-1.0",
    torch_dtype=torch.float16,
)

pipe = StableDiffusionXLControlNetPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",
    controlnet=controlnet,
    torch_dtype=torch.float16,
    variant="fp16",
    use_safetensors=True,
)
pipe.enable_model_cpu_offload()
pipe.enable_vae_slicing()

wireframe = load_image("wireframe_ecommerce.png").resize(
    (1024, 1024)
)
canny = CannyDetector()
edges = canny(wireframe, low_threshold=30, high_threshold=100)

result = pipe(
    prompt="e-commerce product page UI, product image hero, price tag, "
           "add to cart button, star rating, product description, "
           "clean modern design, Shopify quality, white background, "
           "professional web interface, high resolution",
    negative_prompt="blurry, sketch, wireframe, low quality, deformed, "
                    "watermark, photorealistic, photograph",
    image=edges,
    num_inference_steps=35,
    controlnet_conditioning_scale=0.85,
    guidance_scale=8.5,
    generator=torch.Generator(device="cpu").manual_seed(55),
).images[0]

result.save("ecommerce_ui_sdxl.png")

SDXL needs at least 12GB VRAM with fp16 and enable_model_cpu_offload(). If you’re on a smaller GPU, stick with SD 1.5 – the quality difference isn’t worth fighting OOM errors. Resize your wireframe to 1024x1024 before preprocessing. Extreme aspect ratio mismatches between input and output produce layout distortions.
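A naive resize((1024, 1024)) stretches non-square wireframes, which distorts the layout the model is supposed to follow. A letterbox resize avoids that: scale the longer edge to 1024, then pad the rest with white. This is a minimal PIL sketch; the function name and the white fill color are my own choices, not part of the pipeline.

```python
from PIL import Image

def fit_to_square(img, size=1024, fill=(255, 255, 255)):
    # Scale the longer edge to `size`, then pad the shorter edge with
    # a white border so the wireframe layout is not stretched.
    scale = size / max(img.size)
    resized = img.resize(
        (round(img.width * scale), round(img.height * scale)), Image.LANCZOS
    )
    canvas = Image.new("RGB", (size, size), fill)
    canvas.paste(resized, ((size - resized.width) // 2, (size - resized.height) // 2))
    return canvas

# Stand-in for a real wireframe scan (1440x900 desktop sketch)
wireframe = Image.new("RGB", (1440, 900), "white")
square = fit_to_square(wireframe)
print(square.size)  # (1024, 1024)
```

Run the Canny detector on the padded image; the blank borders produce no edges, so the model simply fills them with background.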

Post-Processing for Clean Results

Raw diffusion output needs cleanup before it’s useful in a design workflow. The model sometimes adds noise artifacts, slightly misaligned elements, and unwanted texture in what should be flat color areas.

from PIL import Image, ImageFilter, ImageEnhance

raw_output = Image.open("generated_dashboard_ui.png")

# Sharpen to crisp up UI edges and text-like elements
sharpened = raw_output.filter(ImageFilter.SHARPEN)

# Boost contrast slightly to make UI elements pop
enhancer = ImageEnhance.Contrast(sharpened)
enhanced = enhancer.enhance(1.15)

# Increase color saturation for vibrant UI colors
color_enhancer = ImageEnhance.Color(enhanced)
final = color_enhancer.enhance(1.1)

final.save("dashboard_ui_polished.png")

# Resize to standard screen dimensions (note: this stretches
# the image if the aspect ratios differ)
resized = final.resize((1440, 900), Image.LANCZOS)
resized.save("dashboard_ui_1440x900.png")

For production use, export at common screen dimensions (1440x900 for desktop, 375x812 for mobile) so the mockups drop straight into your design tool. If you need the output as a Figma-ready asset, consider running it through rembg to isolate individual UI components on transparent backgrounds.
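Since the generated image is usually square, a straight resize to 1440x900 or 375x812 would stretch it. One way to hit those dimensions without distortion is to center-crop to the target aspect ratio first, then resample. This helper is my own sketch, not part of any library:

```python
from PIL import Image

def export_for_screen(img, target=(1440, 900)):
    # Center-crop to the target aspect ratio, then resample,
    # so UI elements are not stretched.
    tw, th = target
    target_ratio = tw / th
    src_ratio = img.width / img.height
    if src_ratio > target_ratio:          # too wide: trim the sides
        new_w = round(img.height * target_ratio)
        left = (img.width - new_w) // 2
        img = img.crop((left, 0, left + new_w, img.height))
    else:                                 # too tall: trim top and bottom
        new_h = round(img.width / target_ratio)
        top = (img.height - new_h) // 2
        img = img.crop((0, top, img.width, top + new_h))
    return img.resize(target, Image.LANCZOS)

mockup = Image.new("RGB", (1024, 1024), "gray")  # stand-in for generated output
desktop = export_for_screen(mockup, (1440, 900))
mobile = export_for_screen(mockup, (375, 812))
```

Cropping does discard the edges of the render, so keep important UI elements away from the borders of your wireframe if you plan to export this way.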

Common Errors and Fixes

RuntimeError: Expected all tensors to be on the same device – This shows up when manual device placement gets mixed with automatic offloading – for example, calling .to("cuda") on a pipeline that also uses enable_model_cpu_offload(), or creating a generator or tensor on a different device than the model. Pick one strategy: either move everything to the GPU yourself, or call pipe.enable_model_cpu_offload() and let it handle device placement automatically.

Output looks nothing like the wireframe – Your controlnet_conditioning_scale is too low, or your Canny thresholds are too aggressive. Save and inspect the edge map before passing it to the pipeline. If the edge map is mostly black with barely visible lines, lower low_threshold to 20 and high_threshold to 80. If you see too much noise, raise them.
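Beyond eyeballing the edge map, you can check it quantitatively: count the fraction of pixels that register as edges. The function and the synthetic test images below are my own sketch; the cutoffs you'd act on (roughly, well under 1% edge pixels means the thresholds ate your wireframe, a large fraction means noise) are rough heuristics, not documented values.

```python
from PIL import Image

def edge_density(edge_map, threshold=32):
    # Fraction of pixels brighter than `threshold` in the edge map --
    # a cheap sanity check before spending 30 inference steps on it.
    gray = edge_map.convert("L")
    pixels = list(gray.getdata())
    return sum(p > threshold for p in pixels) / len(pixels)

# Synthetic stand-ins: an all-black map vs. one with a visible line
blank = Image.new("L", (64, 64), 0)
lined = blank.copy()
for x in range(64):
    lined.putpixel((x, 32), 255)

print(edge_density(blank))  # 0.0
print(edge_density(lined))  # 0.015625 -- one 64-pixel row out of 4096 pixels
```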

torch.cuda.OutOfMemoryError: CUDA out of memory – SDXL ControlNet at 1024x1024 needs ~14GB VRAM. Add these memory optimizations:

pipe.enable_model_cpu_offload()
pipe.enable_vae_slicing()
pipe.enable_vae_tiling()

If that’s still not enough, drop to SD 1.5 at 512x512 or use torch_dtype=torch.float16 (never run in float32 for inference).

Generated UI has garbled or unreadable text – Diffusion models can’t reliably render actual text. Don’t expect readable labels or button text. Treat the output as a visual mockup for layout and color. Add real text in Figma or Photoshop afterward.
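If you just need quick placeholder labels and don't want to open a design tool, you can stamp them over the garbled regions with PIL's ImageDraw. The coordinates and label strings below are illustrative only; match them to your actual layout.

```python
from PIL import Image, ImageDraw

mockup = Image.new("RGB", (512, 512), "white")  # stand-in for a generated UI
draw = ImageDraw.Draw(mockup)

# Cover each garbled text artifact with a background-colored patch,
# then draw a real label on top (coordinates are illustrative)
labels = [((40, 24), "Dashboard"), ((40, 440), "Sign in")]
for (x, y), text in labels:
    draw.rectangle((x - 8, y - 6, x + 140, y + 22), fill="white")
    draw.text((x, y), text, fill="black")

mockup.save("dashboard_with_labels.png")
```

This is only good enough for throwaway previews; for anything client-facing, real typography in Figma or Photoshop still wins.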

ValueError: The size of tensor a (X) must match the size of tensor b (Y) – Your control image dimensions don’t match the pipeline’s expected output size. The pipeline auto-resizes, but extreme mismatches cause this error. Resize your wireframe to the target output dimensions before running the Canny detector:

from PIL import Image

wireframe = Image.open("tiny_sketch.png")
wireframe = wireframe.resize((512, 512), Image.LANCZOS)

Colors look washed out or over-saturated – Adjust guidance_scale. Values above 12.0 tend to produce banding and over-saturated gradients. Stay between 7.5 and 10.0 for natural-looking UI color palettes. Adding “vibrant colors” or “muted palette” to your prompt gives you more direct control than cranking guidance.