What You Need

Stable Diffusion runs locally on your GPU. You need:

  • An NVIDIA GPU with at least 8GB of VRAM (RTX 3060 or better); the examples here use CUDA
  • Python 3.10+
  • About 5GB of disk space for the model weights

No API keys, no cloud costs, no usage limits. You own the hardware, you own the outputs.

Install and Generate Your First Image

pip install diffusers transformers accelerate torch
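
Before the first run pulls ~5GB of weights, it's worth a quick sanity check that PyTorch can actually see your GPU:

import torch

print(torch.cuda.is_available())      # should print True
print(torch.cuda.get_device_name(0))  # e.g. "NVIDIA GeForce RTX 3060"

If that prints True and a device name, the full script below will run on the GPU:
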
import torch
from diffusers import StableDiffusionPipeline

# Load the model (downloads ~5GB on first run)
pipe = StableDiffusionPipeline.from_pretrained(
    "stabilityai/stable-diffusion-2-1",
    torch_dtype=torch.float16,  # Half precision — uses less VRAM
)
pipe = pipe.to("cuda")

# Generate an image
prompt = "a futuristic city at sunset, cyberpunk style, neon lights, detailed architecture"
image = pipe(prompt).images[0]

# Save it
image.save("output.png")
print("Image saved to output.png")

The first run downloads the model weights from Hugging Face; every run after that loads them from the local cache in seconds.
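
By default the weights land in the Hugging Face cache (~/.cache/huggingface on Linux). If you'd rather keep them on a specific drive, from_pretrained accepts a cache_dir argument; the path below is just an example:

import torch
from diffusers import StableDiffusionPipeline

# Keep the ~5GB of weights on a drive of your choosing
pipe = StableDiffusionPipeline.from_pretrained(
    "stabilityai/stable-diffusion-2-1",
    torch_dtype=torch.float16,
    cache_dir="/data/sd-models",  # example path, not a requirement
)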

Control the Output

Stable Diffusion exposes several knobs that dramatically affect output quality and style.

import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "stabilityai/stable-diffusion-2-1",
    torch_dtype=torch.float16,
)
pipe = pipe.to("cuda")

image = pipe(
    prompt="a cat wearing a spacesuit on the moon, photorealistic, 8k, detailed",
    negative_prompt="blurry, low quality, distorted, deformed, ugly, bad anatomy",
    num_inference_steps=50,     # More steps = higher quality (default 50)
    guidance_scale=7.5,         # How closely to follow the prompt (7-12 is good)
    width=768,
    height=768,
    generator=torch.Generator("cuda").manual_seed(42),  # Reproducible output
).images[0]

image.save("cat_astronaut.png")

Parameter              What It Does            Good Range
num_inference_steps    Denoising iterations    30-50 (more = slower but better)
guidance_scale         Prompt adherence        7-12 (higher = more literal)
negative_prompt        What to avoid           Always include quality negatives
generator              Random seed             Set for reproducibility
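
To build intuition for guidance_scale, sweep it with a fixed seed so it's the only thing changing (this reuses the pipe loaded above):

# Same prompt and seed, varying only guidance_scale
for scale in [4.0, 7.5, 12.0]:
    image = pipe(
        prompt="a cat wearing a spacesuit on the moon, photorealistic",
        guidance_scale=scale,
        num_inference_steps=40,
        generator=torch.Generator("cuda").manual_seed(42),
    ).images[0]
    image.save(f"guidance_{scale}.png")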

Negative Prompts

Negative prompts are just as important as positive prompts. They tell the model what artifacts to avoid.

# A solid default negative prompt for photorealistic images
negative = (
    "blurry, low quality, distorted, deformed, ugly, bad anatomy, "
    "bad proportions, extra limbs, duplicate, watermark, signature, "
    "text, logo, cropped, out of frame, worst quality, low resolution"
)

# For artistic/stylized images
negative_art = (
    "photorealistic, photo, 3d render, blurry, low quality, "
    "watermark, signature, text, ugly, deformed"
)

Generate Multiple Images

Generate a batch and pick the best one.

import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "stabilityai/stable-diffusion-2-1",
    torch_dtype=torch.float16,
)
pipe = pipe.to("cuda")

prompt = "a cozy cabin in the mountains during winter, warm light, snow"
negative = "blurry, low quality, distorted, deformed"

# Generate 4 variations with different seeds
images = []
for seed in [42, 123, 456, 789]:
    generator = torch.Generator("cuda").manual_seed(seed)
    image = pipe(
        prompt=prompt,
        negative_prompt=negative,
        num_inference_steps=40,
        guidance_scale=8.0,
        generator=generator,
    ).images[0]
    images.append(image)
    image.save(f"cabin_seed_{seed}.png")
    print(f"Saved cabin_seed_{seed}.png")

Image-to-Image (Modify Existing Images)

Start from an existing image and modify it with a text prompt.

import torch
from diffusers import StableDiffusionImg2ImgPipeline
from PIL import Image

pipe = StableDiffusionImg2ImgPipeline.from_pretrained(
    "stabilityai/stable-diffusion-2-1",
    torch_dtype=torch.float16,
)
pipe = pipe.to("cuda")

# Load your source image
init_image = Image.open("photo.jpg").convert("RGB")
init_image = init_image.resize((768, 768))

# Transform it
image = pipe(
    prompt="same scene but in watercolor painting style, artistic, vibrant colors",
    image=init_image,
    strength=0.7,       # 0.0 = no change, 1.0 = completely new image
    guidance_scale=8.0,
    num_inference_steps=40,
).images[0]

image.save("watercolor_version.png")

The strength parameter controls how much the output deviates from the input. At 0.3, you get subtle style changes. At 0.8, only the composition is preserved.
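
The easiest way to find the right value for your image is a quick sweep (reusing the img2img pipe and init_image from above):

# Render the same edit at several strengths and compare
for strength in [0.3, 0.5, 0.7, 0.9]:
    image = pipe(
        prompt="same scene but in watercolor painting style, artistic, vibrant colors",
        image=init_image,
        strength=strength,
        guidance_scale=8.0,
        num_inference_steps=40,
    ).images[0]
    image.save(f"watercolor_strength_{strength}.png")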

Memory Optimization

Running out of VRAM? These optimizations can cut memory usage significantly.

import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "stabilityai/stable-diffusion-2-1",
    torch_dtype=torch.float16,
)
pipe = pipe.to("cuda")

# Enable attention slicing (trades a little speed for memory)
pipe.enable_attention_slicing()

# Enable VAE slicing for large images
pipe.enable_vae_slicing()

# Alternative: sequential CPU offloading (slowest, but uses the least VRAM).
# Use it INSTEAD of .to("cuda"); don't combine the two.
# pipe.enable_sequential_cpu_offload()

Optimization                       VRAM Savings    Speed Impact
float16                            ~50%            Faster on modern GPUs
enable_attention_slicing()         ~30%            ~10% slower
enable_vae_slicing()               ~20%            Minimal
enable_sequential_cpu_offload()    ~70%            2-3x slower
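
The savings above are rough figures. To measure what an optimization actually buys you on your card, PyTorch's memory counters work well; run a generation and read the peak:

import torch

# Measure peak VRAM for one generation
torch.cuda.reset_peak_memory_stats()
image = pipe("a lighthouse at dawn").images[0]
peak_gb = torch.cuda.max_memory_allocated() / 1024**3
print(f"Peak VRAM: {peak_gb:.2f} GB")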

Common Issues

CUDA out of memory. Reduce image size to 512x512, enable attention slicing, or use float16. If you have less than 8GB VRAM, use enable_sequential_cpu_offload().

Black or noisy images. Noisy output usually means num_inference_steps is too low (try 40+); extreme guidance_scale values also produce artifacts (try 7.5). All-black output is a known symptom of float16 on GPUs with weak half-precision support (GTX 16xx cards, for example); try torch_dtype=torch.float32. Also check that the model downloaded completely; a corrupted download produces garbage.

Faces look wrong. Base Stable Diffusion struggles with faces. Use a model specifically trained for portraits, or add “detailed face, sharp features” to your prompt and “deformed face, bad anatomy” to your negative prompt.

Slow generation. Install xformers for a 20-30% speedup:

pip install xformers

pipe.enable_xformers_memory_efficient_attention()
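
Note that recent diffusers releases use PyTorch 2's built-in scaled-dot-product attention by default, so on PyTorch 2.x the xformers gain may be small or zero. Another option on PyTorch 2.x is compiling the UNet; the first generation is slow while it compiles, and the payoff varies by GPU and version:

pipe.unet = torch.compile(pipe.unet)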