Stable Diffusion generates photorealistic images by default, but with the right LoRA and some post-processing, it produces crisp pixel art that rivals hand-drawn sprites. The trick is pairing a pixel art LoRA like nerijs/pixel-art-xl with proper downscaling to get that authentic retro look.
Here’s the fastest path to generating pixel art with SDXL and the diffusers library:
```python
import torch
from diffusers import DiffusionPipeline, LCMScheduler

model_id = "stabilityai/stable-diffusion-xl-base-1.0"
lcm_lora_id = "latent-consistency/lcm-lora-sdxl"
pixel_lora_id = "nerijs/pixel-art-xl"

pipe = DiffusionPipeline.from_pretrained(model_id, variant="fp16", torch_dtype=torch.float16)
pipe.scheduler = LCMScheduler.from_config(pipe.scheduler.config)

pipe.load_lora_weights(lcm_lora_id, adapter_name="lcm")
pipe.load_lora_weights(pixel_lora_id, adapter_name="pixel")
pipe.set_adapters(["lcm", "pixel"], adapter_weights=[1.0, 1.2])
pipe.to("cuda")

image = pipe(
    prompt="pixel, a warrior character with sword and shield, side view, transparent background",
    negative_prompt="3d render, realistic, blurry, photograph",
    num_inference_steps=8,
    guidance_scale=1.5,
).images[0]
image.save("pixel_warrior.png")
```
That generates a 1024x1024 image with a strong pixel art style in about 2 seconds on an RTX 4090. The LCM LoRA cuts generation down to 8 steps. Without it, you’d need 25-30 steps with a standard scheduler.
## Loading Pixel Art LoRA Weights
The nerijs/pixel-art-xl LoRA is the best open-source option for SDXL pixel art. Its trigger word is simply “pixel” – include it in your prompt for stronger style adherence. A few things to know about loading it:
The LoRA stacks cleanly with other adapters through set_adapters. The adapter weight of 1.2 pushes the pixel style harder than the default 1.0. Go higher than 1.4 and you’ll start getting artifacts.
If you don’t need the speed boost from LCM, skip that LoRA and keep the pipeline’s default scheduler:
```python
import torch
from diffusers import DiffusionPipeline

pipe = DiffusionPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",
    variant="fp16",
    torch_dtype=torch.float16,
).to("cuda")
pipe.load_lora_weights("nerijs/pixel-art-xl", adapter_name="pixel")
pipe.set_adapters(["pixel"], adapter_weights=[1.2])

image = pipe(
    prompt="pixel, a treasure chest open with gold coins, game item sprite",
    negative_prompt="3d render, realistic, photograph, blurry",
    num_inference_steps=30,
    guidance_scale=7.5,
).images[0]
image.save("treasure_chest.png")
```
This takes longer but gives you finer control over the output through guidance scale. At 30 steps with guidance 7.5, the details are sharper – useful when you need clean edges on small sprites.
## Prompt Engineering for Pixel Art
Prompting for pixel art differs from standard image generation. You want to be explicit about the visual style and suppress photorealism hard.
Positive prompt structure:
```
pixel, [subject description], [view angle], [background], [color palette hint]
```
Strong negative prompts matter. Without them, the model drifts toward semi-realistic rendering even with the LoRA active:
```
3d render, realistic, photograph, blurry, soft, smooth, anti-aliased, gradient
```
Here are prompts that work well for common game asset types:
| Asset Type | Prompt |
|---|---|
| Character sprite | pixel, a knight character idle pose, side view, transparent background, 32x32 style |
| Environment tile | pixel, grass tile with small flowers, top-down view, seamless, 16-color palette |
| Item icon | pixel, red health potion bottle, centered, dark background, retro game style |
| Enemy sprite | pixel, slime monster bouncing, front view, green translucent, simple shape |
| UI element | pixel, wooden treasure chest, isometric view, warm colors, clean edges |
Including resolution hints like “32x32 style” or “16x16 style” pushes the model toward chunkier, more traditional pixel art. “Isometric view” works well for tiles that need depth without perspective distortion.
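If you’re generating many assets, the template is easy to wrap in a small helper. This is a convenience sketch – `build_pixel_prompt` and its default arguments are my own names, not part of diffusers:

```python
def build_pixel_prompt(subject, view="side view", background="transparent background",
                       palette_hint="16-color palette", size_hint="32x32 style"):
    """Assemble a prompt following: pixel, subject, view, background, hints."""
    return ", ".join(["pixel", subject, view, background, palette_hint, size_hint])

# Shared negative prompt for all pixel art generations
NEGATIVE = "3d render, realistic, photograph, blurry, soft, smooth, anti-aliased, gradient"

prompt = build_pixel_prompt("a knight character idle pose")
# "pixel, a knight character idle pose, side view, transparent background, 16-color palette, 32x32 style"
```

Pass `prompt` and `NEGATIVE` straight into the pipeline call; overriding `view` or `size_hint` per asset type keeps the table above reproducible in code.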
## Generating Sprite Sheets and Tilesets
Generating a full sprite sheet in one pass is unreliable – the model doesn’t understand grid layouts consistently. The better approach is generating individual frames and stitching them with PIL.
```python
import torch
from diffusers import DiffusionPipeline, LCMScheduler
from PIL import Image

model_id = "stabilityai/stable-diffusion-xl-base-1.0"
pipe = DiffusionPipeline.from_pretrained(model_id, variant="fp16", torch_dtype=torch.float16)
pipe.scheduler = LCMScheduler.from_config(pipe.scheduler.config)
pipe.load_lora_weights("latent-consistency/lcm-lora-sdxl", adapter_name="lcm")
pipe.load_lora_weights("nerijs/pixel-art-xl", adapter_name="pixel")
pipe.set_adapters(["lcm", "pixel"], adapter_weights=[1.0, 1.2])
pipe.to("cuda")

# Generate 4 animation frames with a shared seed base for consistency
base_seed = 42
actions = [
    "facing right idle",
    "facing right walking step 1",
    "facing right walking step 2",
    "facing right attacking",
]

frames = []
for i, action in enumerate(actions):
    generator = torch.Generator("cuda").manual_seed(base_seed + i)
    img = pipe(
        prompt=f"pixel, a warrior character {action}, side view, transparent background, 32x32 style, retro game sprite",
        negative_prompt="3d render, realistic, blurry, photograph, multiple characters",
        num_inference_steps=8,
        guidance_scale=1.5,
        generator=generator,
    ).images[0]
    frames.append(img)

# Stitch into a horizontal sprite sheet
sheet_width = frames[0].width * len(frames)
sheet_height = frames[0].height
sprite_sheet = Image.new("RGBA", (sheet_width, sheet_height))
for i, frame in enumerate(frames):
    sprite_sheet.paste(frame, (i * frame.width, 0))
sprite_sheet.save("warrior_spritesheet.png")
```
For tilesets, generate each tile type separately and assemble them in a grid. Use torch.Generator with sequential seeds to maintain some visual consistency across tiles, though you’ll still need to manually check that they tile properly.
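The grid assembly itself is plain PIL. Here is a sketch with solid-color squares standing in for generated tiles (the helper name `assemble_tileset` is mine):

```python
from PIL import Image

def assemble_tileset(tiles, columns):
    """Paste equally sized tiles into a grid, left to right, top to bottom."""
    tile_w, tile_h = tiles[0].size
    rows = (len(tiles) + columns - 1) // columns  # ceiling division
    sheet = Image.new("RGBA", (tile_w * columns, tile_h * rows))
    for i, tile in enumerate(tiles):
        sheet.paste(tile, ((i % columns) * tile_w, (i // columns) * tile_h))
    return sheet

# Placeholder tiles stand in for generated grass/dirt/water/stone images
colors = [(34, 139, 34), (139, 90, 43), (30, 100, 200), (128, 128, 128)]
tiles = [Image.new("RGBA", (64, 64), c + (255,)) for c in colors]
tileset = assemble_tileset(tiles, columns=2)
tileset.save("tileset_2x2.png")  # 128x128 sheet
```

Swap the placeholder list for the images returned by the pipeline; the function only assumes all tiles share one size.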
## Post-Processing: Downscaling and Palette Reduction
Raw SDXL output is 1024x1024 with millions of colors. Real pixel art uses tiny resolutions and limited palettes. Post-processing bridges that gap.
The key is nearest-neighbor downscaling. Bilinear or bicubic interpolation blurs the pixel edges and ruins the look.
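You can see the difference with a quick self-contained check – an upscaled checkerboard stands in for generated output here:

```python
from PIL import Image

# An 8x8 checkerboard upscaled to 512x512, mimicking large "pixel-styled" output
board = Image.new("L", (8, 8))
for y in range(8):
    for x in range(8):
        board.putpixel((x, y), 255 if (x + y) % 2 else 0)
big = board.resize((512, 512), Image.NEAREST)

nearest = big.resize((8, 8), Image.NEAREST)    # stays pure black and white
bilinear = big.resize((8, 8), Image.BILINEAR)  # blends cells into gray midtones

print(sorted(set(nearest.getdata())))  # [0, 255]
```

Nearest-neighbor recovers the exact two-color grid; bilinear averages across cell boundaries and smears in intermediate grays, which is exactly what kills the pixel art look.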
```python
from PIL import Image

def postprocess_pixel_art(image_path, target_size=64, num_colors=16):
    """Downscale and reduce palette for authentic pixel art."""
    img = Image.open(image_path).convert("RGBA")

    # Step 1: Downscale with nearest-neighbor (preserves hard edges)
    img_small = img.resize((target_size, target_size), Image.NEAREST)

    # Step 2: Quantize to a limited color palette
    # Convert to RGB for quantization, preserve alpha separately
    alpha = img_small.split()[3]
    img_rgb = img_small.convert("RGB")
    img_quantized = img_rgb.quantize(colors=num_colors, method=Image.MEDIANCUT)

    # Step 3: Recombine with the alpha channel
    img_result = img_quantized.convert("RGBA")
    img_result.putalpha(alpha)
    return img_result

# Process a generated image
result = postprocess_pixel_art("pixel_warrior.png", target_size=32, num_colors=8)
result.save("pixel_warrior_32x32.png")

# For a tile that should be larger
tile = postprocess_pixel_art("grass_tile.png", target_size=64, num_colors=16)
tile.save("grass_tile_64x64.png")
```
Common target sizes for game assets:
- 16x16 – Classic NES-era icons and small tiles
- 32x32 – Standard character sprites and items (most versatile)
- 64x64 – Detailed characters or larger environment tiles
- 128x128 – Portraits, large UI elements
For palette reduction, 8 colors gives you a strict retro feel, 16 is the sweet spot for most game art, and above 32 you lose the pixel art aesthetic.
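Going the other direction matters too: when you need the tiny sprite at screen size, integer-scale it back up with nearest-neighbor so every pixel stays a crisp square. A small sketch (the function name is mine; the stand-in sprite replaces a loaded PNG):

```python
from PIL import Image

def upscale_for_display(img, factor=8):
    """Integer-scale pixel art with nearest-neighbor, keeping hard pixel edges."""
    return img.resize((img.width * factor, img.height * factor), Image.NEAREST)

# A stand-in 32x32 sprite; in practice load the post-processed PNG instead
sprite = Image.new("RGBA", (32, 32), (200, 30, 30, 255))
display = upscale_for_display(sprite, factor=8)  # 32x32 -> 256x256
display.save("sprite_display.png")
```

Stick to integer factors – fractional scaling forces resampling to split pixels unevenly, which reintroduces the blur you just removed.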
If you want to enforce a specific palette (like the classic PICO-8 palette), map each pixel to its nearest color:
```python
import numpy as np
from PIL import Image

# PICO-8 16-color palette
PICO8_PALETTE = np.array([
    [0, 0, 0], [29, 43, 83], [126, 37, 83], [0, 135, 81],
    [171, 82, 54], [95, 87, 79], [194, 195, 199], [255, 241, 232],
    [255, 0, 77], [255, 163, 0], [255, 236, 39], [0, 228, 54],
    [41, 173, 255], [131, 118, 156], [255, 119, 168], [255, 204, 170],
], dtype=np.uint8)

def apply_palette(image, palette):
    """Map every pixel to the closest color in a fixed palette."""
    arr = np.array(image.convert("RGB"), dtype=np.float32)
    h, w, _ = arr.shape
    flat = arr.reshape(-1, 3)

    # Find nearest palette color for each pixel (Euclidean distance)
    distances = np.linalg.norm(flat[:, None, :] - palette[None, :, :], axis=2)
    indices = np.argmin(distances, axis=1)
    result = palette[indices].reshape(h, w, 3)
    return Image.fromarray(result.astype(np.uint8))

img = Image.open("pixel_warrior_32x32.png")
pico8_art = apply_palette(img, PICO8_PALETTE)
pico8_art.save("warrior_pico8.png")
```
This gives you game-ready sprites that match classic console color constraints.
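A quick sanity check that the mapping really held – `uses_only_palette` is a hypothetical helper of mine, shown with a toy 4-color palette for brevity:

```python
import numpy as np
from PIL import Image

def uses_only_palette(image, palette):
    """Return True if every pixel's RGB value appears in the palette."""
    arr = np.array(image.convert("RGB")).reshape(-1, 3)
    allowed = {tuple(c) for c in palette.tolist()}
    return all(tuple(px) in allowed for px in arr.tolist())

# Toy 4-color palette (a subset of PICO-8 colors)
toy = np.array([[0, 0, 0], [255, 241, 232], [255, 0, 77], [41, 173, 255]], dtype=np.uint8)

img = Image.new("RGB", (4, 4), (255, 0, 77))
print(uses_only_palette(img, toy))  # True
img.putpixel((0, 0), (10, 10, 10))
print(uses_only_palette(img, toy))  # False
```

Running this over a processed sprite with the full PICO-8 array catches silent failures like an alpha-channel merge reintroducing off-palette colors.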
## Common Errors and Fixes
“RuntimeError: Expected all tensors to be on the same device”
This happens when you call pipe.to("cuda") before loading LoRA weights. Load all LoRAs first, then move the pipeline to GPU:
```python
# Wrong order
pipe.to("cuda")
pipe.load_lora_weights(...)  # Fails

# Correct order
pipe.load_lora_weights(...)
pipe.to("cuda")
```
Black or completely noisy output images
Usually a VAE issue with SDXL. Use the fixed fp16 VAE:
```python
import torch
from diffusers import AutoencoderKL, DiffusionPipeline

vae = AutoencoderKL.from_pretrained(
    "madebyollin/sdxl-vae-fp16-fix",
    torch_dtype=torch.float16,
)
pipe = DiffusionPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",
    vae=vae,
    variant="fp16",
    torch_dtype=torch.float16,
)
```
LoRA has no visible effect on the output
Check that you called set_adapters after loading. Just calling load_lora_weights isn’t always enough – explicitly activate the adapter:
```python
pipe.load_lora_weights("nerijs/pixel-art-xl", adapter_name="pixel")
pipe.set_adapters(["pixel"], adapter_weights=[1.2])  # Don't skip this
```
Out of memory on GPUs with less than 12GB VRAM
Enable attention slicing and sequential CPU offload:
```python
pipe.enable_attention_slicing()
pipe.enable_sequential_cpu_offload()
```
This trades speed for memory. Generation takes roughly 3x longer but runs on 6GB cards. Don’t call pipe.to("cuda") when using sequential CPU offload – the offload hooks manage device placement themselves.
Blurry output that doesn’t look like pixel art
Your negative prompt is probably too weak. Make sure you’re blocking anti-aliasing and soft rendering:
```python
negative_prompt="3d render, realistic, blurry, soft, smooth, anti-aliased, gradient, photograph, photorealistic"
```
Also check your LoRA weight – below 0.8 and the pixel style barely registers.