Sprite sheets are tedious to make by hand. A single character with 8 walk-cycle frames, 4 directions, idle, attack, and death animations can take an artist days. Stable Diffusion can generate individual sprites in seconds, and with the right prompts and seed control, they stay visually consistent enough to pack into a usable sheet.

Here’s the fast version – generate base sprites with a locked seed and style prompt, use img2img for pose variations, strip backgrounds with rembg, and stitch everything into a grid with PIL.

pip install diffusers transformers accelerate torch pillow rembg

You need a CUDA GPU with at least 8GB VRAM for SD 1.5 or 12GB for SDXL. An RTX 3060 handles SD 1.5 fine.

Generate Base Sprites with a Consistent Style

The biggest problem with AI sprite generation is style drift. Generate the same character twice and you get two completely different art styles. The fix is a combination of fixed seeds, detailed style prompts, and a reusable prompt template.

import torch
from diffusers import StableDiffusionPipeline
from PIL import Image

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",
    torch_dtype=torch.float16,
    safety_checker=None,
)
pipe = pipe.to("cuda")
pipe.enable_vae_slicing()

# Style template that locks the art direction
STYLE_SUFFIX = (
    "pixel art style, 16-bit retro game sprite, "
    "clean edges, solid colors, no antialiasing, "
    "transparent background, centered, full body"
)

NEGATIVE_PROMPT = (
    "blurry, photorealistic, 3d render, photograph, "
    "watermark, text, signature, frame, border, "
    "multiple characters, cropped, partial body"
)

BASE_SEED = 42

def generate_sprite(subject: str, seed_offset: int = 0) -> Image.Image:
    """Generate a single sprite with locked style."""
    prompt = f"{subject}, {STYLE_SUFFIX}"
    generator = torch.Generator("cuda").manual_seed(BASE_SEED + seed_offset)

    image = pipe(
        prompt=prompt,
        negative_prompt=NEGATIVE_PROMPT,
        num_inference_steps=30,
        guidance_scale=7.5,
        generator=generator,
        width=512,
        height=512,
    ).images[0]

    return image


# Generate a set of character sprites
subjects = [
    "a knight character in silver armor holding a sword, facing right",
    "a knight character in silver armor holding a shield, facing right",
    "a knight character in silver armor swinging a sword, facing right",
    "a knight character in silver armor walking, facing right",
]

sprites = []
for i, subject in enumerate(subjects):
    sprite = generate_sprite(subject, seed_offset=i)
    sprite.save(f"sprite_{i}.png")
    sprites.append(sprite)
    print(f"Generated sprite {i}: {subject[:50]}...")

The STYLE_SUFFIX is doing the heavy lifting here. Every sprite gets the same art direction baked into the prompt. The seed_offset parameter gives you variation between poses while keeping the base seed close enough that the model’s latent space produces visually related outputs.

A few things matter for consistency:

  • Same model, same checkpoint – never mix models across a sprite set
  • Fixed guidance scale – changing this shifts the style noticeably
  • Identical negative prompt – removing this from even one generation introduces drift
  • Describe the character identically every time, only varying the pose/action
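
The last rule is easiest to enforce mechanically. A minimal sketch of a prompt-template builder that bakes in the consistency rules above (the `CHARACTER` constant and `build_prompt` helper are illustrative names, not from any library):

```python
# Sketch: keep the character description and style suffix fixed,
# vary only the pose. Names here are illustrative.
CHARACTER = "a knight character in silver armor"

STYLE_SUFFIX = (
    "pixel art style, 16-bit retro game sprite, "
    "clean edges, solid colors, no antialiasing, "
    "transparent background, centered, full body"
)

def build_prompt(pose: str) -> str:
    """Combine the fixed character description, a pose, and the style suffix."""
    return f"{CHARACTER}, {pose}, {STYLE_SUFFIX}"

print(build_prompt("walking, facing right"))
```

Routing every generation through one function like this makes it impossible to accidentally retype the character description differently between frames.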

Create Pose Variations with img2img

Text-to-image gives you a base sprite. For variations – different poses, attack frames, walk cycles – img2img is more reliable. You feed it your base sprite and a new pose description. The model preserves the character’s look while changing the pose.

from diffusers import StableDiffusionImg2ImgPipeline

img2img_pipe = StableDiffusionImg2ImgPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",
    torch_dtype=torch.float16,
    safety_checker=None,
)
img2img_pipe = img2img_pipe.to("cuda")
img2img_pipe.enable_vae_slicing()

# Load your best base sprite as the reference
base_sprite = Image.open("sprite_0.png").convert("RGB").resize((512, 512))

pose_prompts = [
    "a knight character in silver armor, idle stance, facing right",
    "a knight character in silver armor, walking pose mid-step, facing right",
    "a knight character in silver armor, sword raised overhead, facing right",
    "a knight character in silver armor, crouching with shield up, facing right",
    "a knight character in silver armor, jumping in the air, facing right",
    "a knight character in silver armor, fallen on ground, facing right",
]

pose_sprites = []
for i, pose_prompt in enumerate(pose_prompts):
    full_prompt = f"{pose_prompt}, {STYLE_SUFFIX}"
    generator = torch.Generator("cuda").manual_seed(BASE_SEED)

    result = img2img_pipe(
        prompt=full_prompt,
        negative_prompt=NEGATIVE_PROMPT,
        image=base_sprite,
        strength=0.55,
        num_inference_steps=30,
        guidance_scale=7.5,
        generator=generator,
    ).images[0]

    result.save(f"pose_{i}.png")
    pose_sprites.append(result)
    print(f"Pose {i}: {pose_prompt[:50]}...")

The strength parameter is critical. Too low (below 0.3) and the pose barely changes. Too high (above 0.75) and you lose the character’s visual identity. Start at 0.55 and adjust. For subtle changes like idle variations, drop to 0.35. For dramatic changes like a death animation, push to 0.65.
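
Those ranges can be captured in a small lookup so a batch script applies them consistently; the category names and helper below are illustrative, not part of the diffusers API:

```python
# Sketch: encode the strength guidance above as a lookup.
# Category names are illustrative.
STRENGTH_BY_CHANGE = {
    "subtle": 0.35,    # idle variations, small adjustments
    "normal": 0.55,    # typical pose changes (the starting point)
    "dramatic": 0.65,  # death animations, big pose shifts
}

def pick_strength(change: str) -> float:
    """Return an img2img strength for a change category, defaulting to 0.55."""
    return STRENGTH_BY_CHANGE.get(change, 0.55)
```

You would then pass `strength=pick_strength("dramatic")` into the img2img call for the fallen-knight frame, for example.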

Remove Backgrounds and Assemble the Sprite Sheet

Game engines expect sprites on transparent backgrounds. The rembg library handles background removal reliably for pixel art and illustrated styles.

from rembg import remove
from io import BytesIO

def remove_bg(image: Image.Image) -> Image.Image:
    """Remove background and return RGBA image."""
    img_bytes = BytesIO()
    image.save(img_bytes, format="PNG")
    img_bytes.seek(0)
    result_bytes = remove(img_bytes.read())
    return Image.open(BytesIO(result_bytes)).convert("RGBA")


def build_sprite_sheet(
    sprites: list[Image.Image],
    columns: int = 4,
    sprite_size: tuple[int, int] = (128, 128),
    padding: int = 2,
) -> Image.Image:
    """Pack sprites into a grid sprite sheet."""
    rows = (len(sprites) + columns - 1) // columns

    sheet_width = columns * (sprite_size[0] + padding) + padding
    sheet_height = rows * (sprite_size[1] + padding) + padding

    sheet = Image.new("RGBA", (sheet_width, sheet_height), (0, 0, 0, 0))

    for idx, sprite in enumerate(sprites):
        # Remove background
        clean_sprite = remove_bg(sprite)
        # Resize to target sprite size
        clean_sprite = clean_sprite.resize(sprite_size, Image.LANCZOS)

        col = idx % columns
        row = idx // columns
        x = padding + col * (sprite_size[0] + padding)
        y = padding + row * (sprite_size[1] + padding)

        sheet.paste(clean_sprite, (x, y), clean_sprite)

    return sheet


# Combine all sprites into one sheet
all_sprites = sprites + pose_sprites
sprite_sheet = build_sprite_sheet(all_sprites, columns=4, sprite_size=(128, 128))
sprite_sheet.save("knight_sprite_sheet.png")
print(f"Sprite sheet saved: {sprite_sheet.size[0]}x{sprite_sheet.size[1]}")

The output is a transparent PNG with all sprites packed into a 4-column grid. Most game engines (Unity, Godot, Phaser) can slice this automatically if you use consistent cell sizes.

Sizing considerations

  • 64x64 – good for retro-style top-down games
  • 128x128 – solid default for 2D platformers and RPGs
  • 256x256 – use when you need detail, but file sizes grow fast
  • Padding of 2px between cells prevents texture bleeding in game engines
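
If you need to slice the sheet manually, or verify an engine's auto-slice settings, the cell rectangles follow directly from the same packing math used in build_sprite_sheet. A sketch (the `frame_rects` name is my own):

```python
def frame_rects(n_frames: int, columns: int = 4,
                cell: tuple[int, int] = (128, 128),
                padding: int = 2) -> list[tuple[int, int, int, int]]:
    """Compute (x, y, width, height) rects for each cell in a padded grid,
    mirroring the layout build_sprite_sheet produces."""
    rects = []
    for idx in range(n_frames):
        col, row = idx % columns, idx // columns
        x = padding + col * (cell[0] + padding)
        y = padding + row * (cell[1] + padding)
        rects.append((x, y, cell[0], cell[1]))
    return rects

print(frame_rects(6))
```

These rects map one-to-one onto the frames you'd define in, say, a Phaser spritesheet config or a Godot AtlasTexture region.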

Tips for Consistent Art Style

Seed control and prompt templates only get you 80% of the way. Here’s what closes the gap:

  • Use a LoRA fine-tuned on your target art style. Even a small LoRA (4-8 rank) trained on 20-30 reference sprites dramatically improves consistency. Train it with kohya_ss or use a community pixel art LoRA from Civitai.
  • Generate at 512x512, downscale to target size. SD 1.5 produces sharper sprites at its native resolution. Downscaling also smooths out minor inconsistencies between frames.
  • Batch-generate and cherry-pick. Generate 4 images per pose with consecutive seeds and pick the best. It’s faster than re-rolling one at a time.
  • Post-process with a palette. Force all sprites to the same color palette in PIL. This alone makes inconsistent sprites look like they belong together:
def quantize_to_palette(image: Image.Image, n_colors: int = 16) -> Image.Image:
    """Reduce an image to a fixed number of colors for consistent pixel art."""
    # Preserve alpha channel
    if image.mode == "RGBA":
        alpha = image.split()[3]
        rgb = image.convert("RGB")
        quantized = rgb.quantize(colors=n_colors, method=Image.MEDIANCUT)
        quantized = quantized.convert("RGB")
        result = quantized.convert("RGBA")
        result.putalpha(alpha)
        return result
    return image.quantize(colors=n_colors, method=Image.MEDIANCUT).convert("RGB")


# Apply to all sprites before building the sheet
palette_sprites = [quantize_to_palette(s, n_colors=16) for s in all_sprites]
final_sheet = build_sprite_sheet(palette_sprites, columns=4, sprite_size=(128, 128))
final_sheet.save("knight_sheet_palette.png")

Sixteen colors is the sweet spot for retro pixel art. Bump to 32 or 64 for higher-fidelity styles.

Common Errors and Fixes

RuntimeError: CUDA out of memory – SD 1.5 at 512x512 needs about 6GB VRAM. If you’re tight on memory, add pipe.enable_attention_slicing() or generate at 384x384 and upscale after.

Sprites have different proportions – This happens when your prompt doesn’t anchor the character’s scale. Always include “full body, centered” in the style suffix. If the character still drifts in size, crop and re-center each sprite in PIL before assembling the sheet.
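
A minimal sketch of that crop-and-re-center step, assuming the background has already been removed so the alpha channel defines the character's extent (`recenter_sprite` is an illustrative name):

```python
from PIL import Image

def recenter_sprite(sprite: Image.Image, canvas_size: tuple[int, int] = (512, 512)) -> Image.Image:
    """Crop an RGBA sprite to its non-transparent bounding box and paste it
    centered on a fresh transparent canvas. Assumes background is removed."""
    bbox = sprite.getbbox()  # bounding box of non-empty pixels
    if bbox is None:  # fully transparent image, nothing to center
        return sprite
    cropped = sprite.crop(bbox)
    canvas = Image.new("RGBA", canvas_size, (0, 0, 0, 0))
    x = (canvas_size[0] - cropped.width) // 2
    y = (canvas_size[1] - cropped.height) // 2
    canvas.paste(cropped, (x, y), cropped)
    return canvas
```

Run this on every frame before build_sprite_sheet and characters that drifted off-center or varied in margin will line up across the grid.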

Background removal leaves artifacts – rembg sometimes struggles with sprites that have colors similar to the background. Generate on a solid green or magenta background by adding “on solid green background” to your prompt, then use chroma key removal instead:

def chroma_key_remove(image: Image.Image, key_color: tuple = (0, 255, 0), threshold: int = 80) -> Image.Image:
    """Remove a specific background color with tolerance."""
    image = image.convert("RGBA")
    pixels = image.load()
    width, height = image.size

    for y in range(height):
        for x in range(width):
            r, g, b, a = pixels[x, y]
            distance = ((r - key_color[0]) ** 2 + (g - key_color[1]) ** 2 + (b - key_color[2]) ** 2) ** 0.5
            if distance < threshold:
                pixels[x, y] = (0, 0, 0, 0)

    return image

Style inconsistency between frames – If you’re getting wildly different styles even with seed control, your prompt may be too vague. Be more specific about the art style: instead of “pixel art”, write “16-bit SNES-style pixel art, 4-color shading, black outline”. The more constrained the style description, the less room the model has to drift.

rembg is slow – First run downloads a model (~170MB). After that, batch processing is the bottleneck. Pass session=new_session("u2net") to reuse the model session across calls instead of reloading it per image.