Architectural floor plan generation used to require expensive CAD software and hours of drafting. Now you can go from a rough sketch or a text description to a clean floor plan in seconds using Stable Diffusion XL and ControlNet. The results aren’t production-ready blueprints, but they’re excellent for rapid prototyping, client presentations, and exploring layout ideas before committing to detailed design work.

Text-to-Floor-Plan with Stable Diffusion XL

The fastest way to generate a floor plan is a well-crafted prompt fed into SDXL. Architecture-specific language matters here – generic prompts produce blurry results. You want to describe the plan as a technical drawing, not a photo.

import torch
from diffusers import StableDiffusionXLPipeline

pipe = StableDiffusionXLPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",
    torch_dtype=torch.float16,
    variant="fp16",
)
pipe.to("cuda")

prompt = (
    "architectural floor plan, top-down view, 2-bedroom apartment, "
    "living room, kitchen, bathroom, clean black lines on white background, "
    "technical drawing style, labeled rooms, precise geometry, no furniture, "
    "blueprint aesthetic, vector line art"
)
negative_prompt = (
    "3d rendering, perspective view, furniture, people, realistic photo, "
    "blurry, low quality, watermark, text overlay"
)

image = pipe(
    prompt=prompt,
    negative_prompt=negative_prompt,
    num_inference_steps=40,
    guidance_scale=7.5,
    width=1024,
    height=1024,
).images[0]

image.save("floor_plan_sdxl.png")
print(f"Generated floor plan: {image.size}")

The key to good results is the negative prompt. Without it, SDXL tends to generate 3D renders or perspective views instead of clean top-down plans. The guidance_scale of 7.5 keeps the model close to your prompt without over-saturating the output.

Conditioning on Sketches with ControlNet

Text prompts alone give you limited control over room placement. If you have a rough sketch – even a hand-drawn one on paper – ControlNet with Canny edge detection turns it into a structured floor plan while preserving your layout.

import torch
import cv2
import numpy as np
from PIL import Image
from diffusers import StableDiffusionXLControlNetPipeline, ControlNetModel

# Load the sketch and extract edges
sketch = cv2.imread("rough_sketch.png", cv2.IMREAD_GRAYSCALE)
edges = cv2.Canny(sketch, threshold1=50, threshold2=150)

# Stack the single-channel edge map to 3 channels: the ControlNet
# preprocessing expects an RGB control image
control_image = Image.fromarray(np.stack([edges] * 3, axis=-1)).resize((1024, 1024))

# Load ControlNet for SDXL with Canny conditioning
controlnet = ControlNetModel.from_pretrained(
    "diffusers/controlnet-canny-sdxl-1.0",
    torch_dtype=torch.float16,
    variant="fp16",
)

pipe = StableDiffusionXLControlNetPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",
    controlnet=controlnet,
    torch_dtype=torch.float16,
    variant="fp16",
)
pipe.to("cuda")

prompt = (
    "clean architectural floor plan, top-down view, precise room boundaries, "
    "black lines on white background, labeled rooms, technical drawing, "
    "professional blueprint style"
)
negative_prompt = "3d, perspective, furniture, photo, blurry, watermark"

image = pipe(
    prompt=prompt,
    negative_prompt=negative_prompt,
    image=control_image,
    num_inference_steps=40,
    guidance_scale=7.5,
    controlnet_conditioning_scale=0.8,
).images[0]

image.save("floor_plan_controlnet.png")

The controlnet_conditioning_scale parameter controls how strictly the model follows your sketch. At 0.8, the model respects the overall layout but smooths out rough edges. Drop it to 0.5 if you want more creative freedom, or push it to 1.0 for near-exact reproduction of your lines.

Post-Processing with OpenCV

Raw generated floor plans usually need cleanup. Wall lines might be uneven, rooms might lack labels, and there’s often noise in the background. OpenCV handles all of this.

import cv2
import numpy as np

# Load the generated floor plan
img = cv2.imread("floor_plan_controlnet.png")
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)

# Threshold to get clean black-and-white lines
_, binary = cv2.threshold(gray, 200, 255, cv2.THRESH_BINARY)

# Remove small noise with morphological operations
kernel = np.ones((3, 3), np.uint8)
cleaned = cv2.morphologyEx(binary, cv2.MORPH_CLOSE, kernel, iterations=2)
cleaned = cv2.morphologyEx(cleaned, cv2.MORPH_OPEN, kernel, iterations=1)

# Find room contours for labeling
inverted = cv2.bitwise_not(cleaned)
contours, hierarchy = cv2.findContours(
    inverted, cv2.RETR_TREE, cv2.CHAIN_APPROX_SIMPLE
)

# Label detected rooms by area
output = cv2.cvtColor(cleaned, cv2.COLOR_GRAY2BGR)
room_labels = ["Living Room", "Bedroom", "Kitchen", "Bathroom", "Hallway"]
room_idx = 0

for contour in contours:
    area = cv2.contourArea(contour)
    if 5000 < area < 200000:  # Filter by reasonable room sizes
        M = cv2.moments(contour)
        if M["m00"] == 0:
            continue
        cx = int(M["m10"] / M["m00"])
        cy = int(M["m01"] / M["m00"])

        label = room_labels[room_idx % len(room_labels)]
        cv2.putText(
            output, label, (cx - 40, cy),
            cv2.FONT_HERSHEY_SIMPLEX, 0.5, (0, 0, 200), 1
        )
        cv2.drawContours(output, [contour], -1, (0, 180, 0), 2)
        room_idx += 1

cv2.imwrite("floor_plan_labeled.png", output)
print(f"Detected and labeled {room_idx} rooms")

The contour area filter (5000 < area < 200000) is critical. Without it, you’ll label every tiny artifact as a room. Adjust these thresholds based on your image resolution – larger images need larger minimum areas.
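If you generate at different resolutions, hard-coded bounds stop matching. One way to keep them in sync is to express the bounds as fractions of total image area. This is a small helper sketch; the fraction values are illustrative assumptions, chosen to roughly reproduce the 5000/200000 defaults at 1024x1024:

```python
def room_area_bounds(width, height, min_frac=0.005, max_frac=0.20):
    """Scale the contour-area filter with image resolution.

    min_frac/max_frac are illustrative guesses: a room is assumed to
    cover between 0.5% and 20% of the image. Tune them for your plans.
    """
    total = width * height
    return int(total * min_frac), int(total * max_frac)

# At 1024x1024 this lands close to the hard-coded 5000/200000 filter
min_area, max_area = room_area_bounds(1024, 1024)
print(min_area, max_area)
```

Swap the literal bounds in the contour loop for min_area and max_area, and the filter survives a move to 768x768 or 2048x2048 without retuning.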

Generating Variations and Batch Configurations

Once you have a base prompt that works, you’ll want to explore variations: different room counts, apartment sizes, and architectural styles. Batch generation with a seed strategy gives you reproducible results you can compare side by side.

import torch
from diffusers import StableDiffusionXLPipeline

pipe = StableDiffusionXLPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",
    torch_dtype=torch.float16,
    variant="fp16",
)
pipe.to("cuda")

configurations = [
    {
        "name": "studio",
        "prompt": "architectural floor plan, top-down, studio apartment, "
                  "open kitchen, one bathroom, minimalist, black lines, white background",
    },
    {
        "name": "2bed",
        "prompt": "architectural floor plan, top-down, 2-bedroom apartment, "
                  "separate kitchen, living room, 2 bathrooms, black lines, white background",
    },
    {
        "name": "3bed_family",
        "prompt": "architectural floor plan, top-down, 3-bedroom family home, "
                  "large kitchen, dining room, 2 bathrooms, garage, black lines, white background",
    },
    {
        "name": "office",
        "prompt": "architectural floor plan, top-down, open office space, "
                  "4 meeting rooms, reception area, break room, black lines, white background",
    },
]

negative_prompt = "3d, perspective, furniture, photo, blurry, watermark, low quality"

for config in configurations:
    generator = torch.Generator(device="cuda").manual_seed(42)

    images = pipe(
        prompt=config["prompt"],
        negative_prompt=negative_prompt,
        num_inference_steps=35,
        guidance_scale=7.5,
        width=1024,
        height=1024,
        num_images_per_prompt=3,
        generator=generator,
    ).images

    for i, img in enumerate(images):
        filename = f"floor_plan_{config['name']}_v{i}.png"
        img.save(filename)
        print(f"Saved {filename}")

Using the same seed (42) across configurations keeps the overall structure consistent while the prompt drives the layout differences. If you want completely different layouts for the same room count, change the seed between runs. Setting num_images_per_prompt=3 gives you three variations per configuration to pick from.
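If you go the different-seed route, one option is to derive each configuration’s seed from its name, so every config stays reproducible on its own. This is a sketch, not part of the diffusers API; seed_for is a hypothetical helper:

```python
import hashlib

def seed_for(name: str, base: int = 42) -> int:
    """Derive a stable 32-bit seed from a configuration name.

    Hashing means adding or reordering configurations never changes
    the seeds of the others, so old runs stay reproducible.
    """
    digest = hashlib.sha256(f"{base}:{name}".encode()).digest()
    return int.from_bytes(digest[:4], "big")

for name in ["studio", "2bed", "3bed_family", "office"]:
    print(name, seed_for(name))
```

Inside the loop, seed the generator with torch.Generator(device="cuda").manual_seed(seed_for(config["name"])) instead of the fixed 42.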

For memory-constrained setups, add pipe.enable_model_cpu_offload() right after loading the pipeline. It moves model components between GPU and CPU as needed, cutting VRAM usage roughly in half at the cost of slower inference.

Common Errors and Fixes

RuntimeError: Expected all tensors to be on the same device – This happens when the control image and model are on different devices. Make sure your control image is a PIL Image (not a tensor) and let the pipeline handle the conversion. Don’t manually move it to CUDA.

OutOfMemoryError on 8GB GPUs – SDXL is hungry. Add pipe.enable_model_cpu_offload() after loading the pipeline. If that’s still not enough, use pipe.enable_sequential_cpu_offload() which is slower but uses minimal VRAM. You can also drop resolution to 768x768, though floor plan quality suffers.

Blurry or incoherent floor plans – Your prompt likely isn’t specific enough. Always include “top-down view”, “black lines on white background”, and “architectural floor plan” together. Adding “technical drawing” and “blueprint” pushes the model toward clean line art instead of artistic interpretations.

ControlNet ignoring your sketch – If the generated plan doesn’t match your input sketch at all, check two things. First, make sure your Canny edge thresholds actually produce visible edges – try cv2.imwrite("edges_debug.png", edges) to inspect. Second, increase controlnet_conditioning_scale toward 1.0. Below 0.5, the model mostly ignores the conditioning.

Rooms not detected in post-processing – The contour detection depends on closed boundaries. If AI-generated walls have gaps, the contour finder won’t see separate rooms. Keep in mind that OpenCV morphology treats white as foreground, so closing gaps in black wall lines means running MORPH_CLOSE on the inverted image (where walls are white): apply it with 3 or 4 iterations after cv2.bitwise_not and before findContours.

ValueError: Input image must be 1024x1024 – SDXL models expect specific resolutions. Resize your control image to match the pipeline’s output size. Use control_image = control_image.resize((1024, 1024)) before passing it to the pipeline.