Why SDXL for Logos

Most image generators default to photorealism. That’s the opposite of what you want for a logo. Logos need flat color, clean edges, simple geometry, and transparency-ready outputs. SDXL is the best open-source option here because its 1024x1024 native resolution and dual text encoder architecture handle typography and graphic design prompts far better than SD 1.5 or 2.1 ever did.

The trick is in the prompting. You fight SDXL’s photorealistic tendencies with aggressive negative prompts and style-specific keywords. Pair that with the SDXL refiner for cleanup and rembg for background removal, and you get results that are genuinely usable as starting points for brand design.

Install Dependencies

pip install diffusers transformers accelerate torch rembg pillow

You need a GPU with at least 12GB VRAM for SDXL in fp16. An RTX 3060 12GB works, but 16GB+ (RTX 4070 Ti, A4000) gives you room for the refiner pipeline too.
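If you're unsure what your card reports, you can check before committing to the multi-gigabyte download. A quick sketch (helper names are my own, not part of any library):

```python
def vram_gb() -> float:
    # Total VRAM on the first CUDA device, or 0.0 when no usable GPU is found
    try:
        import torch
        if torch.cuda.is_available():
            return torch.cuda.get_device_properties(0).total_memory / 1024**3
    except ImportError:
        pass
    return 0.0

def fits_sdxl(total_gb: float) -> bool:
    # fp16 SDXL base wants roughly 12GB; 16GB+ leaves headroom for the refiner
    return total_gb >= 12.0
```

On a machine without a GPU, `vram_gb()` returns 0.0 and you know to rent one instead.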

Load the SDXL Base Pipeline

import torch
from diffusers import StableDiffusionXLPipeline

pipe = StableDiffusionXLPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",
    torch_dtype=torch.float16,
    variant="fp16",
    use_safetensors=True,
)
pipe = pipe.to("cuda")

# Enable memory optimizations if you're on 12GB VRAM
pipe.enable_vae_slicing()
pipe.enable_vae_tiling()

First run downloads about 6.5GB of model weights. After that, it loads from Hugging Face’s local cache.

Craft Logo-Specific Prompts

Generic prompts produce generic images. Logo generation needs very deliberate keyword choices. Here’s the pattern that works:

prompt = (
    "a minimalist logo for a tech startup called 'Nexus', "
    "flat vector design, geometric shapes, clean lines, "
    "single color on white background, "
    "professional brand mark, modern, simple, "
    "no gradients, no shadows, no text, icon only"
)

negative_prompt = (
    "photorealistic, 3d render, photograph, realistic texture, "
    "gradient, shadow, noise, grain, complex, cluttered, "
    "multiple colors, blurry, low quality, watermark, "
    "text, letters, words, typography, writing"
)

image = pipe(
    prompt=prompt,
    negative_prompt=negative_prompt,
    num_inference_steps=40,
    guidance_scale=12.0,
    width=1024,
    height=1024,
    generator=torch.Generator("cuda").manual_seed(42),
).images[0]

image.save("nexus_logo.png")

A few things to notice. The guidance_scale is set to 12.0, which is higher than the typical 7.5. This forces the model to stick closer to the prompt – important when you’re fighting its natural tendency toward photorealism. The negative prompt is aggressive about excluding realism cues.

Prompt Keywords That Work for Logos

Style          | Keywords
Flat / vector  | flat vector design, minimalist, clean lines, solid colors
Geometric      | geometric shapes, abstract, symmetrical, angular
Emblem / badge | emblem, badge, circular frame, crest, vintage seal
Wordmark       | typographic logo, wordmark, custom lettering, sans-serif
Mascot         | mascot logo, character design, cartoon style, friendly

Drop "no text, icon only" from the positive prompt and the text-related terms ("text, letters, words, typography, writing") from the negative prompt if you actually want lettering, but be warned: SDXL's text rendering is hit-or-miss. You're better off generating the icon and adding text in Figma or Illustrator.
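The style keywords above can be wired into a small prompt builder so you don't retype the bundles by hand. A sketch (the helper and dictionary names are my own, not a diffusers API):

```python
# Keyword bundles taken from the style table above
STYLE_KEYWORDS = {
    "flat": "flat vector design, minimalist, clean lines, solid colors",
    "geometric": "geometric shapes, abstract, symmetrical, angular",
    "emblem": "emblem, badge, circular frame, crest, vintage seal",
    "wordmark": "typographic logo, wordmark, custom lettering, sans-serif",
    "mascot": "mascot logo, character design, cartoon style, friendly",
}

def build_prompt(subject: str, style: str) -> str:
    # Front-load the style, then the subject, then the keyword bundle
    return f"a {style} logo, {subject}, {STYLE_KEYWORDS[style]}, professional brand mark"

print(build_prompt("owl for an outdoor brand", "geometric"))
```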

Use the SDXL Refiner for Cleaner Results

The SDXL refiner is a second model that takes the base output and cleans up fine details. For logos, this sharpens edges and reduces noise artifacts that would look terrible when scaled down to a favicon.

import torch
from diffusers import StableDiffusionXLPipeline, StableDiffusionXLImg2ImgPipeline

# Load base pipeline
base = StableDiffusionXLPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",
    torch_dtype=torch.float16,
    variant="fp16",
    use_safetensors=True,
)
base = base.to("cuda")

# Load refiner pipeline
refiner = StableDiffusionXLImg2ImgPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-refiner-1.0",
    torch_dtype=torch.float16,
    variant="fp16",
    use_safetensors=True,
)
refiner = refiner.to("cuda")

prompt = (
    "a minimalist owl logo, flat vector design, "
    "geometric, clean lines, single color on white background, "
    "professional brand mark, modern, simple"
)

negative_prompt = (
    "photorealistic, 3d render, photograph, gradient, shadow, "
    "noise, grain, complex, cluttered, blurry, low quality"
)

# Step 1: Generate with the base model
# Use high_noise_frac to hand off to the refiner early
base_image = base(
    prompt=prompt,
    negative_prompt=negative_prompt,
    num_inference_steps=40,
    guidance_scale=12.0,
    width=1024,
    height=1024,
    denoising_end=0.8,
    output_type="latent",
    generator=torch.Generator("cuda").manual_seed(77),
).images

# Step 2: Refine the output
refined_image = refiner(
    prompt=prompt,
    negative_prompt=negative_prompt,
    image=base_image,
    num_inference_steps=40,
    denoising_start=0.8,
    generator=torch.Generator("cuda").manual_seed(77),
).images[0]

refined_image.save("owl_logo_refined.png")

The key parameter here is denoising_end=0.8 on the base and denoising_start=0.8 on the refiner. The base handles 80% of the denoising (composition and structure), then the refiner takes over for the final 20% (detail cleanup). For logos, this split works well because you want the base to nail the overall shape and the refiner to sharpen edges.
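To build intuition for that handoff, the fraction translates into a rough step split. This sketch is only illustrative (diffusers computes the exact timestep cutoff internally; the helper name is my own):

```python
def split_steps(total_steps: int, handoff: float = 0.8):
    # Approximate step counts implied by denoising_end / denoising_start
    base_steps = round(total_steps * handoff)
    return base_steps, total_steps - base_steps

print(split_steps(40, 0.8))  # (32, 8): base does 32 steps, refiner the last 8
```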

If you’re tight on VRAM, you can offload the base model before loading the refiner:

base = base.to("cpu")
torch.cuda.empty_cache()
refiner = refiner.to("cuda")

Batch Generation and Selection

Logo generation is a numbers game. You won’t nail the design on the first try. Generate a grid of candidates with different seeds and pick the best one.

import torch
from diffusers import StableDiffusionXLPipeline
from PIL import Image

pipe = StableDiffusionXLPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",
    torch_dtype=torch.float16,
    variant="fp16",
    use_safetensors=True,
)
pipe = pipe.to("cuda")

prompt = (
    "a minimalist mountain logo, flat vector design, "
    "geometric, clean lines, emerald green on white background, "
    "professional brand mark, outdoor adventure brand"
)

negative_prompt = (
    "photorealistic, 3d render, photograph, gradient, shadow, "
    "noise, grain, complex, cluttered, blurry, low quality, text"
)

# Generate 8 variations with different seeds
images = []
for seed in range(100, 108):
    image = pipe(
        prompt=prompt,
        negative_prompt=negative_prompt,
        num_inference_steps=35,
        guidance_scale=12.0,
        width=1024,
        height=1024,
        generator=torch.Generator("cuda").manual_seed(seed),
    ).images[0]
    images.append(image)
    image.save(f"mountain_logo_seed_{seed}.png")

# Create a 4x2 contact sheet for comparison
grid_width = 4
grid_height = 2
cell_size = 512
grid = Image.new("RGB", (grid_width * cell_size, grid_height * cell_size))

for idx, img in enumerate(images):
    row = idx // grid_width
    col = idx % grid_width
    resized = img.resize((cell_size, cell_size), Image.LANCZOS)
    grid.paste(resized, (col * cell_size, row * cell_size))

grid.save("mountain_logo_grid.png")
print(f"Saved {len(images)} variations and a contact sheet")

Review the grid, pick the best seed, then re-run that seed at full resolution with the refiner pipeline.
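Since the loop above pastes images row-major starting at seed 100, a grid cell maps straight back to its seed. A small sketch (helper name is my own):

```python
def seed_for_cell(row: int, col: int, grid_width: int = 4, first_seed: int = 100) -> int:
    # Invert the row-major paste order used for the contact sheet
    return first_seed + row * grid_width + col

print(seed_for_cell(1, 2))  # third image on the second row -> seed 106
```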

Post-Processing: Remove the Background

Logos need transparent backgrounds. The rembg library handles this in a few lines.

from PIL import Image
from rembg import remove

input_image = Image.open("owl_logo_refined.png")
output_image = remove(input_image)
output_image.save("owl_logo_transparent.png")
print("Saved logo with transparent background")

rembg uses the U2-Net model under the hood. It works surprisingly well on logo-style images with clean backgrounds, and it preserves sharp edges better than most alternatives.

For production use, you probably want to clean up the alpha channel too:

from PIL import Image
from rembg import remove

input_image = Image.open("owl_logo_refined.png")
output_image = remove(input_image)

# Sharpen the alpha channel to remove semi-transparent fringing
alpha = output_image.split()[-1]
alpha = alpha.point(lambda x: 0 if x < 128 else 255)
output_image.putalpha(alpha)

output_image.save("owl_logo_clean.png")
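With a clean alpha channel, you can cut the asset sizes you'll actually ship, including the favicon case mentioned earlier. A sketch using Pillow (the helper name is my own):

```python
from PIL import Image

def logo_variants(img: Image.Image, sizes=(512, 256, 64, 32)) -> dict:
    # Downscaled copies keyed by size; LANCZOS keeps edges crisp at small sizes
    return {s: img.resize((s, s), Image.LANCZOS) for s in sizes}

logo = Image.new("RGBA", (1024, 1024))  # stand-in for your transparent logo
for size, variant in logo_variants(logo).items():
    variant.save(f"logo_{size}.png")
```

Eyeball the 32px version in particular; if it turns to mush, the design is too detailed for a favicon.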

Tips for Better Logo Results

Increase guidance scale. For logos, 10-15 works better than the default 7.5. You want the model to follow your prompt strictly, not improvise.

Keep prompts short and specific. Long rambling prompts confuse the model. Stick to 15-25 keywords focused on style, not content. “Minimalist geometric fox logo, flat vector” beats a paragraph.

Iterate on seeds, not prompts. Once you have a prompt that produces the right style, generate 20+ seeds before rewriting the prompt. The variance between seeds is often bigger than the variance between similar prompts.

Use square aspect ratios. 1024x1024 is ideal for logos. Non-square outputs tend to produce off-center compositions.

Skip text generation. SDXL can render some text, but it’s unreliable. Generate the icon/symbol and add typography separately in a design tool.
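The prompt-length tip above is easy to check mechanically by counting comma-separated phrases. A trivial sketch (helper name is my own):

```python
def keyword_count(prompt: str) -> int:
    # Count non-empty comma-separated keyword phrases in a prompt
    return len([k for k in prompt.split(",") if k.strip()])

print(keyword_count("minimalist geometric fox logo, flat vector, clean lines, single color"))  # 4
```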

Common Errors and Fixes

torch.cuda.OutOfMemoryError: CUDA out of memory

SDXL needs about 10GB VRAM in fp16. If you’re running both base and refiner, you need to offload one before loading the other. Add pipe.enable_model_cpu_offload() instead of .to("cuda") to automatically move layers between CPU and GPU:

pipe = StableDiffusionXLPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",
    torch_dtype=torch.float16,
    variant="fp16",
    use_safetensors=True,
)
pipe.enable_model_cpu_offload()  # instead of pipe.to("cuda")

ValueError: Non-diffusers models are not supported

You’re probably trying to load a Civitai or .safetensors checkpoint directly. Use from_single_file() instead of from_pretrained():

pipe = StableDiffusionXLPipeline.from_single_file(
    "path/to/model.safetensors",
    torch_dtype=torch.float16,
)

Generated logos have noisy, grainy textures

Bump up num_inference_steps to 50 and make sure your negative prompt includes noise, grain, texture, rough. Also try adding smooth, clean, crisp to the positive prompt.

Logos look photorealistic instead of flat/vector

Your positive prompt isn’t strong enough on style keywords. Front-load the style: start with flat vector logo design before describing the subject. Double-check that photorealistic, 3d render, photograph are in the negative prompt.

rembg produces jagged edges around the logo

The default U2-Net model works well for most cases. If edges are rough, try the isnet-general-use model:

from PIL import Image
from rembg import remove, new_session

input_image = Image.open("owl_logo_refined.png")
session = new_session("isnet-general-use")
output_image = remove(input_image, session=session)
output_image.save("owl_logo_isnet.png")