The Quick Version

Inpainting lets you select a region of an image with a mask and replace it with something new, guided by a text prompt. The masked area (white pixels) gets regenerated while everything else stays untouched.

pip install diffusers transformers accelerate torch pillow
import torch
from diffusers import AutoPipelineForInpainting
from diffusers.utils import load_image

pipe = AutoPipelineForInpainting.from_pretrained(
    "stable-diffusion-v1-5/stable-diffusion-inpainting",
    torch_dtype=torch.float16,
    variant="fp16",
)
pipe.enable_model_cpu_offload()

# Load your image and a mask (white = area to replace, black = keep)
init_image = load_image("https://raw.githubusercontent.com/CompVis/latent-diffusion/main/data/inpainting_examples/overture-creations-5sI6fQgYIuo.png").resize((512, 512))
mask_image = load_image("https://raw.githubusercontent.com/CompVis/latent-diffusion/main/data/inpainting_examples/overture-creations-5sI6fQgYIuo_mask.png").resize((512, 512))

prompt = "a golden retriever sitting on a park bench"
negative_prompt = "bad anatomy, deformed, ugly, blurry"

result = pipe(
    prompt=prompt,
    negative_prompt=negative_prompt,
    image=init_image,
    mask_image=mask_image,
    num_inference_steps=30,
    guidance_scale=7.5,
).images[0]

result.save("inpainted.png")

That replaces the masked region with a golden retriever while keeping the rest of the photo intact. The AutoPipelineForInpainting class auto-detects the right pipeline for whichever checkpoint you load, so it works across SD 1.5, SDXL, and Kandinsky models.

Creating Masks Programmatically

You will not always have a premade mask. For automation or batch workflows, build masks in code with Pillow’s ImageDraw:

from PIL import Image, ImageDraw

# Create a blank mask (same size as your source image)
width, height = 512, 512
mask = Image.new("L", (width, height), 0)  # all black = keep everything
draw = ImageDraw.Draw(mask)

# Draw a white rectangle over the area you want to replace
# (left, top, right, bottom)
draw.rectangle((150, 100, 350, 300), fill=255)

# Or use an ellipse for organic shapes
# draw.ellipse((150, 100, 350, 300), fill=255)

mask.save("mask.png")

White pixels (255) mark the region the model will regenerate. Black pixels (0) stay as-is. Grayscale values between 0 and 255 blend between keeping and replacing, but in practice most workflows use pure black and white.

For real-world use, you will often generate masks from a segmentation model (like SAM) or a simple threshold on color differences, then pass that mask to the inpainting pipeline.

SDXL Inpainting for Higher Resolution

SD 1.5 inpainting works at 512x512. If you need 1024x1024 output, use the SDXL inpainting checkpoint instead:

import torch
from diffusers import AutoPipelineForInpainting
from diffusers.utils import load_image

pipe = AutoPipelineForInpainting.from_pretrained(
    "diffusers/stable-diffusion-xl-1.0-inpainting-0.1",
    torch_dtype=torch.float16,
    variant="fp16",
).to("cuda")

image = load_image("your_photo.png").resize((1024, 1024))
mask = load_image("your_mask.png").resize((1024, 1024))

generator = torch.Generator(device="cuda").manual_seed(42)

result = pipe(
    prompt="a Japanese zen garden with raked sand and moss-covered stones",
    image=image,
    mask_image=mask,
    guidance_scale=8.0,
    num_inference_steps=20,
    strength=0.99,
    generator=generator,
).images[0]

result.save("inpainted_xl.png")

Two things to watch with SDXL inpainting. First, keep strength below 1.0: setting it to exactly 1.0 introduces visible noise artifacts because of how the noise scheduling behaves at that boundary, while values like 0.99 or 0.95 avoid the issue. Second, stick to 15-30 inference steps; going higher wastes time without meaningful quality gains.

Key Parameters That Actually Matter

strength controls how much the masked region changes. At 1.0 (or 0.99 for SDXL), you get maximum transformation – the model ignores what was there before. At 0.5, it blends heavily with the original content. Use high strength when replacing objects entirely, low strength when doing subtle edits like changing colors or textures.

guidance_scale determines how literally the model follows your prompt. A value of 7.5 is a solid default. Below 5.0, outputs get creative but may ignore your prompt. Above 12.0, you get over-saturated images that look artificial.

negative_prompt steers the model away from unwanted artifacts. Always include basics like "deformed, ugly, blurry, bad anatomy" to catch common failure modes.

padding_mask_crop is an underused parameter that significantly improves quality on small masked regions. It crops around the mask, runs inpainting at full resolution on just that crop, then composites back:

result = pipe(
    prompt="a red rose",
    image=init_image,
    mask_image=mask_image,
    strength=0.75,
    padding_mask_crop=32,  # pixels of padding around the mask
).images[0]

This avoids the quality loss you get when the mask covers a tiny portion of a large image. The model dedicates all its resolution to the area that matters.

Blurring Mask Edges

Hard mask edges produce visible seams in the output. The diffusers VaeImageProcessor has a built-in blur method that feathers mask boundaries:

from diffusers import AutoPipelineForInpainting
from diffusers.utils import load_image

pipe = AutoPipelineForInpainting.from_pretrained(
    "stable-diffusion-v1-5/stable-diffusion-inpainting",
    torch_dtype=torch.float16,
).to("cuda")

mask = load_image("sharp_mask.png")
blurred_mask = pipe.mask_processor.blur(mask, blur_factor=33)

result = pipe(
    prompt="lush green grass",
    image=load_image("photo.png"),
    mask_image=blurred_mask,
).images[0]

Higher blur_factor values create softer transitions. Start around 20-40 and adjust based on how visible the boundary is in your output.
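If you are already building masks with Pillow, you can feather them without loading a pipeline at all. This sketch uses Pillow's ImageFilter.GaussianBlur, where the radius plays a role similar to blur_factor (larger radius, softer edge); the specific radius value is just a starting point:

```python
from PIL import Image, ImageDraw, ImageFilter

# Draw a hard-edged rectangular mask, then feather it with a Gaussian blur
mask = Image.new("L", (512, 512), 0)
draw = ImageDraw.Draw(mask)
draw.rectangle((150, 100, 350, 300), fill=255)

# Larger radius = softer transition between keep and replace regions
blurred_mask = mask.filter(ImageFilter.GaussianBlur(radius=16))
blurred_mask.save("feathered_mask.png")
```

The blurred mask contains intermediate gray values along the old hard edge, which the pipeline interprets as a gradual blend between the original and generated content.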

Inpaint-Specific vs Regular Checkpoints

You can technically run inpainting with any Stable Diffusion checkpoint, not just the dedicated inpainting models. The AutoPipelineForInpainting handles both. But the results differ:

  • Inpaint-specific checkpoints (stable-diffusion-inpainting, stable-diffusion-xl-1.0-inpainting-0.1) produce cleaner transitions between masked and unmasked areas. They were fine-tuned specifically for this task.
  • Regular checkpoints (stable-diffusion-v1-5, stable-diffusion-xl-base-1.0) preserve the unmasked area more faithfully but often leave visible outlines where the mask boundary was.

Use inpaint-specific checkpoints when you want seamless blending. Use regular checkpoints when preserving the exact original pixels outside the mask matters more than a clean transition.
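You can also get the best of both: run an inpaint-specific checkpoint for the clean transition, then paste the exact original pixels back everywhere outside the mask. A minimal sketch using Pillow's Image.composite (the function name and variables here are illustrative):

```python
from PIL import Image

def restore_unmasked(original, inpainted, mask):
    """Keep the inpainted result inside the mask, exact original pixels outside.

    `mask` is an "L"-mode image: white (255) = inpainted region, black (0) = keep.
    Image.composite takes pixels from its first argument where the mask is white.
    Gray values in a feathered mask blend the two images proportionally.
    """
    mask = mask.convert("L").resize(original.size)
    return Image.composite(inpainted.resize(original.size), original, mask)

# result = restore_unmasked(init_image, result, mask_image)
```

Pairing this with a feathered mask gives a seamless transition at the boundary while guaranteeing the rest of the photo is bit-for-bit identical to the input.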

Common Errors and Fixes

ValueError: images do not match – Your image and mask have different dimensions. Both must be the same width and height. Resize them to match before passing to the pipeline:

init_image = init_image.resize((512, 512))
mask_image = mask_image.resize((512, 512))

Dimensions not divisible by 8 – Stable Diffusion’s VAE requires image dimensions divisible by 8. A 623x400 image will fail. Round to the nearest multiple of 8:

w, h = init_image.size
w = (w // 8) * 8
h = (h // 8) * 8
init_image = init_image.resize((w, h))
mask_image = mask_image.resize((w, h))

RuntimeError: CUDA out of memory – SDXL at 1024x1024 needs roughly 10-12GB VRAM. If you are running out, switch from .to("cuda") to enable_model_cpu_offload() which moves model components to CPU when not in use:

pipe = AutoPipelineForInpainting.from_pretrained(
    "diffusers/stable-diffusion-xl-1.0-inpainting-0.1",
    torch_dtype=torch.float16,
    variant="fp16",
)
pipe.enable_model_cpu_offload()  # instead of .to("cuda")

This trades some speed for significantly lower VRAM usage. On PyTorch 2.0+, scaled dot-product attention is enabled automatically, which helps further. On older versions, call pipe.enable_xformers_memory_efficient_attention() if you have xFormers installed.

Strength=1.0 produces noisy output with SDXL – This is a known issue with the SDXL inpainting checkpoint. Use strength=0.99 instead.