Fastest Path to a Working Demo
Gradio lets you wrap any Python function in a web interface with a few lines of code. Install it and have a running demo in under a minute:
Here’s a text-in/text-out interface that works with any model or API:
Run it and you get a local web server at http://localhost:7860 with a clean UI, input validation, and a shareable link if you pass share=True. That’s the core pattern: define a function, pick your input/output components, call launch().
Image Classification Interface
Gradio shines with visual models. Here’s an image classifier that accepts uploads or webcam captures:
The gr.Image component handles file uploads, drag-and-drop, and clipboard paste automatically. Setting type="pil" gives you a PIL Image object directly – no manual decoding needed. The gr.Label output renders a nice bar chart of confidence scores.
Building Chatbots with ChatInterface
For conversational models, gr.ChatInterface handles message history, streaming display, and the chat UI layout:
The type="messages" parameter uses the OpenAI-style message format ({"role": ..., "content": ...}). Gradio manages the conversation state and renders it as a chat bubble UI.
Streaming Responses
For token-by-token streaming, yield from a generator instead of returning a string:
Gradio detects the generator and automatically switches to streaming mode. Tokens appear in the chat bubble as they arrive.
Complex Layouts with Blocks API
gr.Interface works for simple input-to-output demos, but real apps need tabs, sidebars, and multiple components interacting. The Blocks API gives you full layout control:
Use gr.Row() and gr.Column() for horizontal and vertical layouts. The scale parameter controls relative widths. Connect components with .click(), .change(), or .submit() event handlers.
Queuing for Heavy Models
Large models take seconds (or minutes) per request. Without queuing, concurrent users will crash your app or get timeouts. Enable it globally:
Queuing is enabled by default in recent Gradio versions, but you should set max_size to prevent unbounded queues. Users see their position in the queue and estimated wait time automatically.
For the Blocks API, call .queue() on the Blocks object:
The default_concurrency_limit controls how many requests run simultaneously. Set this based on your GPU memory – running two inference calls in parallel on a single GPU can OOM most large models.
Adding Authentication
Protect your demo with username/password authentication:
For multiple users:
This adds a login page before the demo loads. It is simple username/password checking – fine for internal demos, not for production user management.
Deploying to Hugging Face Spaces
The fastest way to share your demo publicly is Hugging Face Spaces. Create a new Space, choose the Gradio SDK, and push your code:
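The typical flow looks like this (the Space URL and username are placeholders):

```shell
git clone https://huggingface.co/spaces/your-username/your-space
cd your-space
# add your app.py and pin dependencies in requirements.txt
git add app.py requirements.txt
git commit -m "Add Gradio demo"
git push
```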
Your app.py is the entry point. Add a requirements.txt with your dependencies. The Space builds and runs automatically. For models that need a GPU, set the hardware in the Space settings (T4 is free for some accounts, A10G/A100 are paid).
You can also deploy directly from Python:
The share=True option creates a Gradio tunnel – useful for quick demos, but it expires. Use Hugging Face Spaces for permanent hosting.
Common Errors and Fixes
OSError: Cannot find empty port in range
Gradio defaults to port 7860. If it is in use, pass a different port: demo.launch(server_port=7861). Or kill the zombie process: lsof -ti:7860 | xargs kill -9.
ModuleNotFoundError when deploying to Spaces
Your requirements.txt is missing dependencies. List everything your app.py imports, including gradio itself. Pin versions to avoid surprises: gradio==5.12.0.
CUDA out of memory with concurrent users
Lower default_concurrency_limit in .queue() to 1. Or reduce batch size / input size. A concurrency limit above 1 lets Gradio run multiple inferences in parallel, which multiplies GPU memory usage per concurrent request.
ValueError: Output component count mismatch
Your function returns a different number of values than the number of output components. If you have outputs=[textbox1, textbox2], your function must return exactly two values: return result1, result2.
Streaming output shows all at once instead of token by token
Make sure your function uses yield (a generator), not return. Also verify you are not accidentally collecting all tokens into a list before yielding. Each yield should output the cumulative text so far, not just the new token.
File uploads fail silently
Check file size limits. By default Gradio caps uploads. Increase it with demo.launch(max_file_size="100mb"). Also verify your function signature matches the component – gr.File passes a file path string by default, not bytes.
Related Guides
- How to Track ML Experiments with Weights and Biases
- How to Build AI Workflows with LangChain Expression Language
- How to Use the Anthropic Prompt Caching API with Context Blocks
- How to Use the Anthropic Tool Use API for Agentic Workflows
- How to Build AI Apps with the Vercel AI SDK and Next.js
- How to Use the Stability AI API for Image and Video Generation
- How to Run Open-Source Models with the Replicate API
- How to Use the AWS Bedrock Converse API for Multi-Model Chat
- How to Use the OpenAI Realtime API for Voice Applications
- How to Use the Cerebras API for Fast LLM Inference