AI-generated comic strips sound fun until you realize every panel gives your hero a different face, hairstyle, and sometimes an extra finger. The fix is IP-Adapter – it injects a reference image’s identity into each generation so your character stays recognizable across panels. Here’s a minimal pipeline that generates multi-panel comic strips with SDXL, IP-Adapter, and PIL.

pip install diffusers transformers accelerate safetensors pillow

That gets you the core stack. You need a GPU with at least 10 GB VRAM for SDXL. If you’re on an A100 or 4090, you’re set. On a 3060 or similar, enable attention slicing (shown below).

Set Up the SDXL Pipeline with IP-Adapter

Load SDXL base and attach IP-Adapter in one shot. The h94/IP-Adapter repo hosts weights compatible with SDXL.

import torch
from diffusers import StableDiffusionXLPipeline
from transformers import CLIPVisionModelWithProjection

# The "plus" checkpoint below uses the ViT-H image encoder, which diffusers
# does not load automatically -- attach it explicitly or load_ip_adapter
# will fail with a missing-image-encoder error
image_encoder = CLIPVisionModelWithProjection.from_pretrained(
    "h94/IP-Adapter",
    subfolder="models/image_encoder",
    torch_dtype=torch.float16,
)

pipe = StableDiffusionXLPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",
    torch_dtype=torch.float16,
    variant="fp16",
    image_encoder=image_encoder,
)
pipe.to("cuda")

# Load IP-Adapter for character consistency
pipe.load_ip_adapter(
    "h94/IP-Adapter",
    subfolder="sdxl_models",
    weight_name="ip-adapter-plus_sdxl_vit-h.safetensors",
)

# Set IP-Adapter scale -- 0.6 balances identity preservation with prompt adherence
pipe.set_ip_adapter_scale(0.6)

# Save VRAM on smaller GPUs
pipe.enable_attention_slicing()

The ip-adapter-plus variant uses a stronger CLIP vision encoder (ViT-H) that captures more facial detail than the base version. The scale parameter at 0.6 is a sweet spot: higher values lock onto the reference face but ignore your scene prompts, while lower values drift too far from the character.
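The easiest way to find your own sweet spot is a quick scale sweep: render the same prompt at several scales and compare the results side by side. A minimal sketch -- `pipe` and `ref_image` come from the surrounding sections, and the helper names (`sweep_values`, `sweep_ip_adapter_scale`) are illustrative, not part of any API:

```python
def sweep_values(start=0.4, stop=0.8, step=0.1):
    """Inclusive list of IP-Adapter scales to try."""
    scales = []
    s = start
    while s <= stop + 1e-9:  # tolerance guards against float drift
        scales.append(round(s, 2))
        s += step
    return scales


def sweep_ip_adapter_scale(pipe, ref_image, prompt, scales):
    """Generate one image per scale so they can be compared side by side."""
    images = []
    for s in scales:
        pipe.set_ip_adapter_scale(s)
        result = pipe(
            prompt=prompt,
            ip_adapter_image=ref_image,
            num_inference_steps=30,
        ).images[0]
        result.save(f"scale_{s:.2f}.png")
        images.append(result)
    return images
```

Running the sweep over `sweep_values()` gives you five images from 0.4 to 0.8; pick the lowest scale at which the face stays stable.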

Generate a Reference Character

Before building the strip, you need one clean reference image of your character. Generate it with a detailed prompt and no IP-Adapter conditioning.

from PIL import Image

# Generate the reference character. IP-Adapter is already loaded, and the
# pipeline requires an adapter image once it is -- so zero the scale and
# pass a blank placeholder, which makes this a plain SDXL generation
pipe.set_ip_adapter_scale(0.0)
placeholder = Image.new("RGB", (224, 224), "white")

ref_prompt = (
    "portrait of a young woman with short blue hair and green eyes, "
    "comic book art style, cel shading, white background, "
    "sharp lines, high detail, upper body shot"
)

ref_image = pipe(
    prompt=ref_prompt,
    negative_prompt="blurry, photo-realistic, 3d render, deformed",
    num_inference_steps=30,
    guidance_scale=7.5,
    ip_adapter_image=placeholder,
    height=1024,
    width=1024,
).images[0]

ref_image.save("reference_character.png")

# Restore the scale before generating panels
pipe.set_ip_adapter_scale(0.6)

This gives you a single clean reference portrait of your character. Save it – you’ll feed it back into every panel generation as the IP-Adapter image.

Generate Comic Panels with Consistent Characters

Define your story as a list of scene prompts. Each panel gets the same reference image through IP-Adapter so the character stays on-model.

scene_prompts = [
    "a young woman with short blue hair standing at a bus stop in the rain, comic book style, cel shading, speech bubble",
    "a young woman with short blue hair sitting on a bus looking out the window, comic book style, cel shading, dramatic lighting",
    "a young woman with short blue hair stepping off a bus into sunshine, comic book style, cel shading, vibrant colors",
    "a young woman with short blue hair walking into a coffee shop smiling, comic book style, cel shading, warm tones",
]

negative_prompt = "blurry, photo-realistic, 3d render, deformed, extra fingers, bad anatomy"

panels = []
for i, prompt in enumerate(scene_prompts):
    image = pipe(
        prompt=prompt,
        negative_prompt=negative_prompt,
        num_inference_steps=30,
        guidance_scale=7.5,
        ip_adapter_image=ref_image,
        height=768,
        width=768,
    ).images[0]
    image.save(f"panel_{i}.png")
    panels.append(image)
    print(f"Panel {i} generated")

Each panel takes about 8-12 seconds on an A100. The character’s blue hair and green eyes should persist across all four frames because IP-Adapter is conditioning every generation on the same reference embedding.

Assemble Panels into a Comic Strip Grid

Now stitch the individual panels into a proper comic grid. PIL handles this without any extra dependencies.

from PIL import Image, ImageDraw, ImageFont

def create_comic_grid(panels, cols=2, padding=20, border_width=4, bg_color=(30, 30, 30)):
    """Arrange panels into a grid with borders and padding."""
    if not panels:
        raise ValueError("No panels provided")

    panel_w, panel_h = panels[0].size
    rows = (len(panels) + cols - 1) // cols  # ceiling division

    grid_w = cols * panel_w + (cols + 1) * padding
    grid_h = rows * panel_h + (rows + 1) * padding

    grid = Image.new("RGB", (grid_w, grid_h), bg_color)
    draw = ImageDraw.Draw(grid)

    for idx, panel in enumerate(panels):
        row = idx // cols
        col = idx % cols
        x = padding + col * (panel_w + padding)
        y = padding + row * (panel_h + padding)

        # Draw white border around each panel
        border_rect = [
            x - border_width,
            y - border_width,
            x + panel_w + border_width,
            y + panel_h + border_width,
        ]
        draw.rectangle(border_rect, outline="white", width=border_width)

        grid.paste(panel, (x, y))

    return grid


comic = create_comic_grid(panels, cols=2, padding=30, border_width=3)
comic.save("comic_strip.png")
print(f"Comic strip saved: {comic.size[0]}x{comic.size[1]} pixels")

This produces a 2x2 grid with dark background and white panel borders – the classic comic layout. Change cols=4 for a horizontal strip, or cols=1 for a vertical webtoon format.
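Before committing to a layout, you can predict the output canvas size by mirroring the grid math from create_comic_grid in a small helper (the helper name is illustrative):

```python
def grid_size(n_panels, panel_w, panel_h, cols=2, padding=20):
    """Expected output size of the comic grid: panels plus padding gutters."""
    rows = (n_panels + cols - 1) // cols  # ceiling division, same as above
    width = cols * panel_w + (cols + 1) * padding
    height = rows * panel_h + (rows + 1) * padding
    return width, height

# Four 768x768 panels with 30px padding:
print(grid_size(4, 768, 768, cols=2, padding=30))  # 2x2 grid -> (1626, 1626)
print(grid_size(4, 768, 768, cols=4, padding=30))  # strip    -> (3222, 828)
print(grid_size(4, 768, 768, cols=1, padding=30))  # webtoon  -> (828, 3222)
```

This is handy for checking whether a horizontal strip will exceed the size limits of wherever you plan to post it.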

You can also overlay a short caption near the bottom of each panel:

def add_captions(grid, captions, cols=2, panel_h=768, padding=30, font_size=28):
    """Overlay captions below each panel."""
    draw = ImageDraw.Draw(grid)
    try:
        font = ImageFont.truetype("/usr/share/fonts/truetype/dejavu/DejaVuSans-Bold.ttf", font_size)
    except OSError:
        font = ImageFont.load_default()

    panel_w = (grid.size[0] - (cols + 1) * padding) // cols

    for idx, caption in enumerate(captions):
        row = idx // cols
        col = idx % cols
        x = padding + col * (panel_w + padding) + panel_w // 2
        y = padding + row * (panel_h + padding) + panel_h - 40

        draw.text((x, y), caption, fill="white", font=font, anchor="mm")

    return grid


captions = ["Waiting...", "The long ride.", "Finally!", "Coffee time."]
comic = add_captions(comic, captions, cols=2, panel_h=768, padding=30)
comic.save("comic_strip_captioned.png")

Tuning Character Consistency

If your character drifts between panels, adjust these knobs:

  • IP-Adapter scale: Bump from 0.6 to 0.7 or 0.8. Above 0.85 the model starts ignoring your scene prompt entirely.
  • Prompt reinforcement: Repeat the character’s key features (“short blue hair, green eyes”) in every scene prompt. IP-Adapter handles the face, but hair color and accessories benefit from text reinforcement.
  • Seed locking: Fix the seed for more deterministic outputs, though this can make panels look too similar in composition.
# Fix seed per panel for reproducibility
generator = torch.Generator(device="cuda").manual_seed(42)

image = pipe(
    prompt=scene_prompts[0],
    negative_prompt=negative_prompt,
    num_inference_steps=30,
    guidance_scale=7.5,
    ip_adapter_image=ref_image,
    generator=generator,
    height=768,
    width=768,
).images[0]
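The prompt-reinforcement tip above is easy to mechanize: build every scene prompt from one character anchor and one style suffix instead of retyping them. A sketch, with illustrative constant and helper names:

```python
CHARACTER = "a young woman with short blue hair and green eyes"
STYLE = "comic book art style, cel shading, flat colors, black outlines"

def build_prompt(action):
    """Sandwich the scene action between the character and style anchors."""
    return f"{CHARACTER} {action}, {STYLE}"

scene_prompts = [
    build_prompt("standing at a bus stop in the rain"),
    build_prompt("sitting on a bus looking out the window"),
    build_prompt("stepping off a bus into sunshine"),
    build_prompt("walking into a coffee shop smiling"),
]
```

Now a hair-color change is a one-line edit that propagates to every panel, and the style suffix can never drift between prompts.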

For multi-character strips, generate a separate reference image for each character and run the pipeline per-character, compositing them afterward with PIL or using regional prompting techniques.
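A sketch of that compositing step, assuming each character has been rendered (or cut out) as an RGBA image with a transparent background; `composite_characters` is a hypothetical helper, not a library function:

```python
from PIL import Image

def composite_characters(background, cutouts, positions):
    """Paste RGBA character cutouts onto a background panel.

    Each cutout's alpha channel is used as the paste mask, so transparent
    pixels leave the background visible.
    """
    panel = background.convert("RGB").copy()
    for cutout, pos in zip(cutouts, positions):
        rgba = cutout.convert("RGBA")
        panel.paste(rgba, pos, mask=rgba)  # third arg = alpha mask
    return panel
```

Paste order matters: later cutouts land on top, so list background characters first.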

Common Errors and Fixes

RuntimeError: CUDA out of memory – SDXL is heavy at 1024x1024. Drop panel resolution to 768x768, enable pipe.enable_attention_slicing(), or add pipe.enable_model_cpu_offload() which moves layers to CPU when not in use. This trades speed for VRAM.
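Those options can be stacked. A sketch of a helper that applies the standard diffusers memory savers in rough order of VRAM saved (the helper name is illustrative; note that enable_model_cpu_offload is meant to be called instead of pipe.to("cuda")):

```python
def reduce_vram(pipe):
    """Apply diffusers memory savers, roughly in order of VRAM saved."""
    pipe.enable_attention_slicing()    # small savings, small slowdown
    pipe.enable_vae_slicing()          # cheaper VAE decode at high resolution
    pipe.enable_model_cpu_offload()    # biggest savings; use this in place
                                       # of pipe.to("cuda")
    return pipe
```

With all three enabled, SDXL typically fits in well under 10 GB at the cost of noticeably slower panels.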

ValueError: IP-Adapter image must be a PIL image – The ip_adapter_image parameter expects a PIL.Image.Image object. If you loaded from disk, make sure you used Image.open() and not just the file path string:

from PIL import Image
ref_image = Image.open("reference_character.png").convert("RGB")

Characters look nothing alike across panels – Your IP-Adapter scale is probably too low. Start at 0.6 and increment by 0.05 until faces stabilize. Also confirm you loaded ip-adapter-plus (not the base ip-adapter) since the plus variant has much stronger identity preservation.

OSError: h94/IP-Adapter does not appear to have a file named sdxl_models/ip-adapter-plus_sdxl_vit-h.safetensors – The model repo structure has changed a few times. Check the actual file listing on the Hugging Face repo page. You may need to update weight_name to match the current filename. Run huggingface-cli scan-cache to verify what’s downloaded locally.
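You can also inspect the repo programmatically with huggingface_hub's list_repo_files; the filtering helper below is a hypothetical convenience, not part of the library:

```python
def pick_sdxl_weights(files):
    """Filter a repo file listing down to SDXL safetensors weight files."""
    return [
        f for f in files
        if f.startswith("sdxl_models/") and f.endswith(".safetensors")
    ]

# Usage (needs huggingface_hub installed and network access):
# from huggingface_hub import list_repo_files
# print(pick_sdxl_weights(list_repo_files("h94/IP-Adapter")))
```

Whatever filename the filter returns is what belongs in the weight_name argument to load_ip_adapter.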

Panels have inconsistent art styles – Add a style anchor to every prompt. Append the exact same suffix to each scene prompt, something like "comic book art style, cel shading, flat colors, black outlines". Without this, SDXL can drift between semi-realistic and cartoon styles depending on the scene description.

Grid layout has black gaps or misaligned panels – All panels must be the same resolution. If you mixed 1024x1024 and 768x768 panels, resize them before calling create_comic_grid:

target_size = (768, 768)
panels = [p.resize(target_size, Image.LANCZOS) for p in panels]