Fastest Path to a Working Demo
Gradio lets you wrap any Python function in a web interface with a few lines of code. Install it and have a running demo in under a minute:
Here’s a text-in/text-out interface that works with any model or API:
Run it and you get a local web server at http://localhost:7860 with a clean UI, input validation, and a shareable link if you pass share=True. That’s the core pattern: define a function, pick your input/output components, call launch().
Image Classification Interface
Gradio shines with visual models. Here’s an image classifier that accepts uploads or webcam captures:
The gr.Image component handles file uploads, drag-and-drop, and clipboard paste automatically. Setting type="pil" gives you a PIL Image object directly – no manual decoding needed. The gr.Label output renders a nice bar chart of confidence scores.
Building Chatbots with ChatInterface
For conversational models, gr.ChatInterface handles message history, streaming display, and the chat UI layout:
The type="messages" parameter uses the OpenAI-style message format ({"role": ..., "content": ...}). Gradio manages the conversation state and renders it as a chat bubble UI.
Streaming Responses
For token-by-token streaming, yield from a generator instead of returning a string:
Gradio detects the generator and automatically switches to streaming mode. Tokens appear in the chat bubble as they arrive.
Complex Layouts with Blocks API
gr.Interface works for simple input-to-output demos, but real apps need tabs, sidebars, and multiple components interacting. The Blocks API gives you full layout control:
Use gr.Row() and gr.Column() for horizontal and vertical layouts. The scale parameter controls relative widths. Connect components with .click(), .change(), or .submit() event handlers.
Queuing for Heavy Models
Large models take seconds (or minutes) per request. Without queuing, concurrent users will crash your app or get timeouts. Enable it globally:
Queuing is enabled by default in recent Gradio versions, but you should set max_size to prevent unbounded queues. Users see their position in the queue and estimated wait time automatically.
For the Blocks API, call .queue() on the Blocks object:
The default_concurrency_limit controls how many requests run simultaneously. Set this based on your GPU memory – running two inference calls in parallel on a single GPU can OOM most large models.
Adding Authentication
Protect your demo with username/password authentication:
For multiple users:
This adds a login page before the demo loads. It is simple username/password checking – fine for internal demos, not for production user management.
Deploying to Hugging Face Spaces
The fastest way to share your demo publicly is Hugging Face Spaces. Create a new Space, choose the Gradio SDK, and push your code:
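The typical flow looks like this (the Space URL and username are placeholders):

```shell
git clone https://huggingface.co/spaces/your-username/your-space
cd your-space
# add your app.py and pin dependencies in requirements.txt
git add app.py requirements.txt
git commit -m "Add Gradio demo"
git push
```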
Your app.py is the entry point. Add a requirements.txt with your dependencies. The Space builds and runs automatically. For models that need a GPU, set the hardware in the Space settings (T4 is free for some accounts, A10G/A100 are paid).
You can also deploy directly from Python:
The share=True option creates a Gradio tunnel – useful for quick demos, but it expires. Use Hugging Face Spaces for permanent hosting.
Common Errors and Fixes
OSError: Cannot find empty port in range
Gradio defaults to port 7860. If it is in use, pass a different port: demo.launch(server_port=7861). Or kill the zombie process: lsof -ti:7860 | xargs kill -9.
ModuleNotFoundError when deploying to Spaces
Your requirements.txt is missing dependencies. List everything your app.py imports, including gradio itself. Pin versions to avoid surprises: gradio==5.12.0.
CUDA out of memory with concurrent users
Lower default_concurrency_limit in .queue() to 1. Or reduce batch size / input size. A concurrency limit above 1 lets Gradio run multiple inferences in parallel, which multiplies GPU memory usage per concurrent request.
ValueError: Output component count mismatch
Your function returns a different number of values than the number of output components. If you have outputs=[textbox1, textbox2], your function must return exactly two values: return result1, result2.
Streaming output shows all at once instead of token by token
Make sure your function uses yield (a generator), not return. Also verify you are not accidentally collecting all tokens into a list before yielding. Each yield should output the cumulative text so far, not just the new token.
File uploads fail silently
Check file size limits. By default Gradio caps uploads. Increase it with demo.launch(max_file_size="100mb"). Also verify your function signature matches the component – gr.File passes a file path string by default, not bytes.
Related Guides
- How to Track ML Experiments with Weights and Biases
- How to Build AI Workflows with LangChain Expression Language
- How to Use the Anthropic Prompt Caching API with Context Blocks
- How to Use the Anthropic Tool Use API for Agentic Workflows
- How to Build AI Apps with the Vercel AI SDK and Next.js
- How to Use the Stability AI API for Image and Video Generation
- How to Run Open-Source Models with the Replicate API
- How to Use the AWS Bedrock Converse API for Multi-Model Chat
- How to Use the OpenAI Realtime API for Voice Applications
- How to Use the Cerebras API for Fast LLM Inference