The Deforum Approach to AI Animation

Deforum generates video by running Stable Diffusion frame-by-frame, applying geometric transformations (pan, zoom, rotate) between frames, and feeding each output back as the input for the next. The result is smooth, coherent animation that can shift between scenes based on prompt scheduling – different text prompts at different frame numbers.

Most tutorials show the Automatic1111 web UI extension. We’re skipping that entirely and building the pipeline in pure Python with diffusers, torch, PIL, NumPy, OpenCV, and subprocess for FFmpeg. This gives you full programmatic control over every parameter.

Install the Python dependencies (opencv-python provides the cv2 module used later for frame warping):

pip install diffusers transformers accelerate torch torchvision pillow numpy opencv-python

You also need FFmpeg installed on your system for the final video export:

# Ubuntu/Debian
sudo apt install ffmpeg

# macOS
brew install ffmpeg

Set Up the Stable Diffusion Pipeline

We use StableDiffusionImg2ImgPipeline because Deforum-style animation is fundamentally img2img – each frame is generated by conditioning on the previous frame.

import torch
from diffusers import StableDiffusionImg2ImgPipeline

pipe = StableDiffusionImg2ImgPipeline.from_pretrained(
    "stable-diffusion-v1-5/stable-diffusion-v1-5",
    torch_dtype=torch.float16,
    safety_checker=None,
)
pipe.enable_model_cpu_offload()

# Optional: compile the UNet for 20-30% speedup on PyTorch 2.x
# pipe.unet = torch.compile(pipe.unet, mode="reduce-overhead", fullgraph=True)

CPU offloading keeps VRAM usage around 4-5GB. If you have a 24GB card, skip offloading and call pipe.to("cuda") for faster generation.
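You can make that choice programmatically. The helper below is a hypothetical sketch (not part of diffusers); the 16 GB threshold is an assumption -- tune it for your card and resolution:

```python
import torch

def configure_pipeline(pipe, min_full_gpu_vram_gb: float = 16.0) -> str:
    """Place the pipeline fully on the GPU when VRAM allows, else offload.

    Hypothetical helper: the threshold is an assumption, not a diffusers default.
    """
    if torch.cuda.is_available():
        vram_gb = torch.cuda.get_device_properties(0).total_memory / 1024**3
        if vram_gb >= min_full_gpu_vram_gb:
            pipe.to("cuda")  # all weights stay resident on the GPU
            return "cuda"
    pipe.enable_model_cpu_offload()  # modules move to the GPU only when used
    return "offload"
```

Call `configure_pipeline(pipe)` right after `from_pretrained` instead of hard-coding either placement.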

Define Animation Parameters and Keyframes

Deforum’s power comes from keyframe-driven animation. You define how the camera moves at specific frame numbers, and values get interpolated between them.

import numpy as np

# Animation settings
TOTAL_FRAMES = 120
WIDTH, HEIGHT = 512, 512
FPS = 15

# Keyframes: frame_number -> value
# Translation (pixels per frame)
translation_x_keys = {0: 0, 30: 2, 60: -1, 90: 3, 120: 0}
translation_y_keys = {0: 0, 30: -1, 60: 0, 90: -2, 120: 0}

# Rotation (degrees per frame)
rotation_keys = {0: 0, 30: 0.5, 60: -0.3, 90: 0.8, 120: 0}

# Zoom (1.0 = no zoom, >1 = zoom in, <1 = zoom out)
zoom_keys = {0: 1.0, 30: 1.02, 60: 1.0, 90: 1.01, 120: 1.0}

# Strength controls how much Stable Diffusion changes each frame
# Lower = more coherent (less change), higher = more creative (more change)
strength_keys = {0: 0.55, 60: 0.6, 120: 0.55}

def interpolate_keyframes(keys: dict, frame: int) -> float:
    """Linearly interpolate between keyframe values."""
    sorted_frames = sorted(keys.keys())

    if frame <= sorted_frames[0]:
        return keys[sorted_frames[0]]
    if frame >= sorted_frames[-1]:
        return keys[sorted_frames[-1]]

    for i in range(len(sorted_frames) - 1):
        f_start = sorted_frames[i]
        f_end = sorted_frames[i + 1]
        if f_start <= frame <= f_end:
            t = (frame - f_start) / (f_end - f_start)
            return keys[f_start] + t * (keys[f_end] - keys[f_start])

    return keys[sorted_frames[-1]]

The interpolation function handles smooth transitions between keyframes. You set values at specific frames, and everything between gets linearly blended. For more organic motion, swap linear interpolation for a cubic spline – scipy.interpolate.CubicSpline works well here.
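A drop-in spline variant might look like this -- a sketch assuming scipy is installed (pip install scipy), clamping outside the keyframe range just like the linear version:

```python
import numpy as np
from scipy.interpolate import CubicSpline

def interpolate_keyframes_smooth(keys: dict, frame: int) -> float:
    """Cubic-spline variant of interpolate_keyframes for smoother motion."""
    frames = sorted(keys)
    values = [keys[f] for f in frames]
    # Clamp outside the keyframe range, matching the linear version
    if frame <= frames[0]:
        return float(values[0])
    if frame >= frames[-1]:
        return float(values[-1])
    spline = CubicSpline(frames, values)
    return float(spline(frame))
```

The spline still passes through every keyframe exactly, but it eases in and out of each one, so the camera has no sudden velocity changes at keyframe boundaries.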

Build Geometric Transformation Functions

Between each frame, we apply 2D transformations to simulate camera movement. This warped image becomes the init image for the next Stable Diffusion pass.

import cv2
import numpy as np
from PIL import Image

def apply_2d_transforms(
    image: Image.Image,
    translate_x: float,
    translate_y: float,
    rotation_deg: float,
    zoom: float,
) -> Image.Image:
    """Apply translation, rotation, and zoom to a PIL image."""
    img_array = np.array(image, dtype=np.float32)
    h, w = img_array.shape[:2]
    cx, cy = w / 2.0, h / 2.0

    # Build the affine transformation matrix: rotate and zoom about the
    # image center, then pan. The third column keeps (cx, cy) fixed under
    # the rotation/zoom and adds the translation on top.
    cos_a = np.cos(np.radians(rotation_deg))
    sin_a = np.sin(np.radians(rotation_deg))

    # Combined rotation + zoom matrix
    M = np.array([
        [zoom * cos_a, -zoom * sin_a, (1 - zoom * cos_a) * cx + zoom * sin_a * cy + translate_x],
        [zoom * sin_a,  zoom * cos_a, (1 - zoom * cos_a) * cy - zoom * sin_a * cx + translate_y],
    ], dtype=np.float32)

    warped = cv2.warpAffine(
        img_array,
        M,
        (w, h),
        flags=cv2.INTER_CUBIC,
        borderMode=cv2.BORDER_REFLECT_101,
    )
    # Cubic interpolation can overshoot [0, 255]; clip before the uint8
    # cast, which would otherwise wrap out-of-range values.
    return Image.fromarray(np.clip(warped, 0, 255).astype(np.uint8))

BORDER_REFLECT_101 mirrors pixels at the edges instead of filling with black. This prevents the harsh black borders you’d get with zero padding, and keeps the image looking natural as the camera pans.

Set Up Prompt Scheduling

Prompt scheduling lets you change the text prompt at specific frame numbers. This is how you create scene transitions – morph from a forest into a city, or shift from day to night.

# Prompt schedule: frame_number -> prompt
prompt_schedule = {
    0: "a mystical forest with glowing mushrooms, digital art, highly detailed, volumetric lighting",
    40: "an ancient temple overgrown with vines, digital art, highly detailed, volumetric lighting",
    80: "a futuristic neon city at night, cyberpunk, digital art, highly detailed, volumetric lighting",
}

negative_prompt = "blurry, low quality, watermark, text, deformed, disfigured, jpeg artifacts"

def get_prompt_for_frame(schedule: dict, frame: int) -> str:
    """Return the active prompt for a given frame number."""
    active_prompt = schedule[min(schedule)]  # fall back to the earliest keyframe
    for f in sorted(schedule.keys()):
        if frame >= f:
            active_prompt = schedule[f]
        else:
            break
    return active_prompt

Keeping a consistent style suffix across all prompts (“digital art, highly detailed, volumetric lighting”) helps maintain visual coherence as the content changes. The negative prompt stays the same throughout – it prevents common artifacts across all frames.
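One way to keep that suffix consistent is to store only the scene descriptions and append the shared style programmatically. A small sketch (the `STYLE_SUFFIX` and `scene_schedule` names are ours, not Deforum's):

```python
STYLE_SUFFIX = "digital art, highly detailed, volumetric lighting"

# Scene content only -- no style keywords here
scene_schedule = {
    0: "a mystical forest with glowing mushrooms",
    40: "an ancient temple overgrown with vines",
    80: "a futuristic neon city at night, cyberpunk",
}

# Append the shared style suffix to every scene description
prompt_schedule = {
    frame: f"{scene}, {STYLE_SUFFIX}" for frame, scene in scene_schedule.items()
}
```

Editing STYLE_SUFFIX in one place then restyles the entire animation.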

Generate the Animation Frame by Frame

This is the core loop. Each iteration: get the current prompt, apply geometric transforms to the previous frame, run img2img, and save the result.

import os
import torch
from PIL import Image

output_dir = "frames"
os.makedirs(output_dir, exist_ok=True)

# Generate the first frame from text (using a blank init image with high strength)
generator = torch.Generator(device="cpu").manual_seed(42)
init_image = Image.new("RGB", (WIDTH, HEIGHT), color=(20, 20, 40))

first_frame = pipe(
    prompt=get_prompt_for_frame(prompt_schedule, 0),
    negative_prompt=negative_prompt,
    image=init_image,
    strength=0.99,  # Near-full generation for the first frame
    num_inference_steps=30,
    guidance_scale=7.5,
    generator=generator,
).images[0]

first_frame.save(os.path.join(output_dir, "frame_0000.png"))
prev_frame = first_frame
print("Frame 0000 saved")

# Generate subsequent frames
for frame_idx in range(1, TOTAL_FRAMES):
    # Get interpolated animation values
    tx = interpolate_keyframes(translation_x_keys, frame_idx)
    ty = interpolate_keyframes(translation_y_keys, frame_idx)
    rot = interpolate_keyframes(rotation_keys, frame_idx)
    zm = interpolate_keyframes(zoom_keys, frame_idx)
    strength = interpolate_keyframes(strength_keys, frame_idx)

    # Apply geometric transforms to previous frame
    warped = apply_2d_transforms(prev_frame, tx, ty, rot, zm)

    # Get the current prompt
    prompt = get_prompt_for_frame(prompt_schedule, frame_idx)

    # Run img2img on the warped frame
    generator = torch.Generator(device="cpu").manual_seed(42 + frame_idx)
    result = pipe(
        prompt=prompt,
        negative_prompt=negative_prompt,
        image=warped,
        strength=strength,
        num_inference_steps=20,
        guidance_scale=7.5,
        generator=generator,
    ).images[0]

    filename = f"frame_{frame_idx:04d}.png"
    result.save(os.path.join(output_dir, filename))
    prev_frame = result

    if frame_idx % 10 == 0:
        print(f"Frame {frame_idx:04d}/{TOTAL_FRAMES} saved (prompt: {prompt[:40]}...)")

print(f"All {TOTAL_FRAMES} frames generated in {output_dir}/")

A few things to note about the strength parameter: at 0.55, you get strong temporal coherence – each frame looks like its predecessor with subtle changes. Push it to 0.7+ and Stable Diffusion takes more creative liberty, which makes prompt transitions more dramatic but can cause flickering. Start at 0.55 and adjust based on your results.

Each frame gets a unique seed derived from the frame index. This keeps generation deterministic and reproducible while giving each frame variation.

Smooth Scene Transitions with Prompt Blending

Hard prompt switches at keyframes can cause jarring visual jumps. To smooth them out, blend prompts during transition windows using the prompt_embeds approach.

import torch
from diffusers import StableDiffusionImg2ImgPipeline

def get_blended_prompt_embeds(
    pipe: StableDiffusionImg2ImgPipeline,
    prompt_a: str,
    prompt_b: str,
    blend_factor: float,
) -> torch.Tensor:
    """Blend two prompt embeddings. blend_factor=0 is all A, 1 is all B."""
    inputs_a = pipe.tokenizer(
        prompt_a, padding="max_length",
        max_length=pipe.tokenizer.model_max_length,
        truncation=True, return_tensors="pt",
    )
    inputs_b = pipe.tokenizer(
        prompt_b, padding="max_length",
        max_length=pipe.tokenizer.model_max_length,
        truncation=True, return_tensors="pt",
    )

    with torch.no_grad():
        embeds_a = pipe.text_encoder(inputs_a.input_ids.to(pipe.device))[0]
        embeds_b = pipe.text_encoder(inputs_b.input_ids.to(pipe.device))[0]

    blended = (1 - blend_factor) * embeds_a + blend_factor * embeds_b
    return blended


# Example: blend between prompts over a 20-frame window
BLEND_WINDOW = 20

def get_prompt_embeds_for_frame(pipe, schedule, frame):
    """Get prompt embeddings, blending during transition windows."""
    sorted_keyframes = sorted(schedule.keys())

    # Check if we're in a transition window
    for i in range(len(sorted_keyframes) - 1):
        transition_start = sorted_keyframes[i + 1] - BLEND_WINDOW
        transition_end = sorted_keyframes[i + 1]

        if transition_start <= frame < transition_end:
            t = (frame - transition_start) / BLEND_WINDOW
            prompt_a = schedule[sorted_keyframes[i]]
            prompt_b = schedule[sorted_keyframes[i + 1]]
            return get_blended_prompt_embeds(pipe, prompt_a, prompt_b, t)

    # Not in a transition -- use the active prompt directly
    active_prompt = get_prompt_for_frame(schedule, frame)
    inputs = pipe.tokenizer(
        active_prompt, padding="max_length",
        max_length=pipe.tokenizer.model_max_length,
        truncation=True, return_tensors="pt",
    )
    with torch.no_grad():
        embeds = pipe.text_encoder(inputs.input_ids.to(pipe.device))[0]
    return embeds

To use blended embeddings in the generation loop, pass prompt_embeds instead of prompt to the pipeline call:

embeds = get_prompt_embeds_for_frame(pipe, prompt_schedule, frame_idx)
result = pipe(
    prompt_embeds=embeds,
    negative_prompt=negative_prompt,
    image=warped,
    strength=strength,
    num_inference_steps=20,
    guidance_scale=7.5,
    generator=generator,
).images[0]

This produces gradual morphs between scenes instead of abrupt cuts.

Custom Motion Paths

Beyond simple pans and zooms, you can define complex camera paths using parametric curves. Here’s a circular orbit that smoothly loops:

import math

def circular_motion_keys(total_frames: int, radius_x: float = 3.0,
                         radius_y: float = 2.0) -> tuple[dict, dict]:
    """Generate translation keyframes for a circular camera orbit."""
    tx_keys = {}
    ty_keys = {}
    for f in range(0, total_frames + 1, 10):  # Keyframe every 10 frames
        angle = 2 * math.pi * (f / total_frames)
        tx_keys[f] = radius_x * math.cos(angle)
        ty_keys[f] = radius_y * math.sin(angle)
    return tx_keys, ty_keys

# Spiraling zoom: zoom in while orbiting
def spiral_zoom_keys(total_frames: int, start_zoom: float = 1.0,
                     end_zoom: float = 1.5) -> dict:
    """Generate zoom keyframes for a smooth spiral."""
    keys = {}
    for f in range(0, total_frames + 1, 10):
        t = f / total_frames
        # Ease-in-out with a cosine curve
        eased = 0.5 * (1 - math.cos(math.pi * t))
        keys[f] = start_zoom + eased * (end_zoom - start_zoom)
    return keys

# Use these in place of the static keyframes
translation_x_keys, translation_y_keys = circular_motion_keys(TOTAL_FRAMES)
zoom_keys = spiral_zoom_keys(TOTAL_FRAMES)

The circular_motion_keys function creates a smooth elliptical orbit. Combine it with the spiraling zoom for a dramatic fly-in effect. You can chain these with rotation keyframes for even more dynamic camera work.

Export Frames to Video with FFmpeg

Once all frames are saved as PNGs, assemble them into an MP4 with FFmpeg via subprocess.

import subprocess

def frames_to_video(
    frame_dir: str,
    output_path: str,
    fps: int = 15,
    pattern: str = "frame_%04d.png",
    crf: int = 18,
) -> None:
    """Combine PNG frames into an H.264 MP4 video."""
    cmd = [
        "ffmpeg", "-y",
        "-framerate", str(fps),
        "-i", f"{frame_dir}/{pattern}",
        "-c:v", "libx264",
        "-crf", str(crf),
        "-pix_fmt", "yuv420p",
        "-preset", "slow",
        output_path,
    ]
    result = subprocess.run(cmd, capture_output=True, text=True)
    if result.returncode != 0:
        print(f"FFmpeg error: {result.stderr}")
        raise RuntimeError("FFmpeg encoding failed")
    print(f"Video saved to {output_path}")

frames_to_video("frames", "motion_graphics.mp4", fps=FPS)

The -pix_fmt yuv420p flag ensures the video plays on all devices and browsers. Without it, some players show a green screen. CRF 18 gives near-lossless quality – bump it to 23 for smaller files if quality is acceptable.

For a looping GIF (useful for previews):

ffmpeg -y -framerate 15 -i frames/frame_%04d.png -vf "fps=15,scale=512:-1:flags=lanczos,split[s0][s1];[s0]palettegen[p];[s1][p]paletteuse" motion_graphics.gif

Common Errors and Fixes

Flickering between frames

This happens when strength is too high. Each frame gets too much creative freedom from Stable Diffusion, breaking temporal consistency. Lower strength to 0.45-0.55. Also make sure you pass the warped previous frame – not the original untransformed one – as the init image.

Black borders after panning

If you see black edges creeping in as the camera moves, your border mode is wrong. Use cv2.BORDER_REFLECT_101 instead of the default cv2.BORDER_CONSTANT. This mirrors edge pixels and keeps the image seamless.

warped = cv2.warpAffine(
    img_array, M, (w, h),
    flags=cv2.INTER_CUBIC,
    borderMode=cv2.BORDER_REFLECT_101,  # Not BORDER_CONSTANT
)

CUDA out of memory

The img2img pipeline with SD 1.5 at 512x512 needs about 4GB of VRAM. If you run out:

  1. Enable CPU offloading: pipe.enable_model_cpu_offload()
  2. Use torch.float16 (already set in our pipeline config)
  3. Reduce num_inference_steps to 15 – quality drop is minimal for img2img

FFmpeg says “No such file or directory”

The frame naming pattern must match exactly. If your frames are frame_0000.png through frame_0119.png, the pattern is frame_%04d.png. A mismatch gives a cryptic “No such file” error even though the frames exist.

# Verify your frame naming
ls frames/ | head -5
# Should output: frame_0000.png, frame_0001.png, ...

Prompt transitions cause visual “pops”

If switching prompts creates a jarring visual jump, either lower the strength during transition frames or use the prompt blending approach from the blending section above. A 15-20 frame blend window produces smooth morphs for most prompt pairs.

Generated video plays too fast or too slow

The FPS in the generation loop and the FPS in the FFmpeg export must match. If you generate at 15 FPS pacing but export at 30 FPS, the video plays at double speed. Keep FPS consistent across both steps, or adjust frame count proportionally.
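The relationship is simple arithmetic, but it's worth sanity-checking before committing to a long render:

```python
def expected_duration_seconds(total_frames: int, fps: int) -> float:
    """Duration the exported video should have when FPS values match."""
    return total_frames / fps

# 120 frames at 15 FPS runs for 8 seconds; export the same frames at
# 30 FPS and the clip lasts only 4 seconds -- double playback speed.
print(expected_duration_seconds(120, 15))  # 8.0
print(expected_duration_seconds(120, 30))  # 4.0
```

If the exported file's duration doesn't match this number, the `-framerate` flag in the FFmpeg command doesn't match the FPS you paced the animation for.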