What NeMo Guardrails Actually Does
NeMo Guardrails is NVIDIA’s open-source toolkit for adding programmable safety rails to LLM-based applications. Instead of writing a pile of regex filters and hoping for the best, you define rails – constraints on what goes in and what comes out – using a combination of YAML configuration and Colang, a Python-like domain-specific language for conversational flows.
The toolkit (v0.20.0, released January 2026) supports five rail types: input rails that screen user messages, output rails that validate model responses, dialog rails that control conversation flow, retrieval rails for RAG pipelines, and execution rails for custom action safety. It works with OpenAI, Anthropic, NVIDIA NIM, local models through Ollama, and anything compatible with the OpenAI API format.
You need Python 3.10+ and a C++ compiler (for the Annoy dependency). Let’s get it running.
Install and Set Up Your First Config
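The package is published on PyPI as nemoguardrails; a plain pip install is all the setup requires:

```shell
pip install nemoguardrails
```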
If the install fails with a build error, you’re probably missing C++ build tools. On Ubuntu/Debian, run sudo apt install build-essential. On macOS, xcode-select --install handles it.
NeMo Guardrails expects a config directory with at least two files: config.yml for your model and rail settings, and a .co file for Colang flow definitions. Here’s a minimal setup.
Create a directory called config/ and add config.yml:
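A minimal config.yml only needs a models section. The engine and model name below are placeholders, assuming an OpenAI-backed setup with OPENAI_API_KEY set in the environment; swap in whichever provider you use:

```yaml
models:
  - type: main
    engine: openai
    model: gpt-4o-mini
```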
Now add config/rails.co to define what happens when a rail triggers:
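A sketch of a Colang 1.0 rail, using the standard define user / define bot / define flow pattern (the politics topic here is just an illustrative choice):

```colang
define user ask about politics
  "Who should I vote for?"
  "What do you think about the election?"

define bot refuse political topics
  "I'm not able to discuss political topics, but I'm happy to help with other questions."

define flow politics
  user ask about politics
  bot refuse political topics
```

When an incoming message matches the "ask about politics" intent, the flow fires and the bot responds with the canned refusal instead of calling the model.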
Running Your Guardrailed App
With the config directory ready, the Python code is minimal:
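Loading the config and generating a response takes a few lines; this sketch assumes the config/ directory from the previous step and a valid API key in the environment:

```python
from nemoguardrails import LLMRails, RailsConfig

# Load the config directory created above
config = RailsConfig.from_path("./config")
rails = LLMRails(config)

# Messages use the familiar chat format
response = rails.generate(messages=[
    {"role": "user", "content": "Who should I vote for?"}
])
print(response["content"])
```

If the message trips the politics rail, the printed content is the canned refusal from the Colang file rather than a model-generated answer.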
For async applications (FastAPI, etc.), use generate_async instead:
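A minimal async sketch, again assuming the config/ directory from above:

```python
import asyncio

from nemoguardrails import LLMRails, RailsConfig

config = RailsConfig.from_path("./config")
rails = LLMRails(config)

async def main():
    # generate_async is the awaitable counterpart of generate
    response = await rails.generate_async(messages=[
        {"role": "user", "content": "Hello!"}
    ])
    print(response["content"])

asyncio.run(main())
```

Inside a FastAPI handler you would simply await rails.generate_async directly instead of wrapping it in asyncio.run.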
Inline Configuration Without Files
If you don’t want a config directory – maybe you’re building a microservice or want everything in code – you can pass config content directly:
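RailsConfig.from_content accepts the YAML and Colang content as strings. A sketch, with the same placeholder model as before:

```python
from nemoguardrails import LLMRails, RailsConfig

yaml_content = """
models:
  - type: main
    engine: openai
    model: gpt-4o-mini
"""

colang_content = """
define user ask about politics
  "Who should I vote for?"

define bot refuse political topics
  "I'm not able to discuss political topics."

define flow politics
  user ask about politics
  bot refuse political topics
"""

config = RailsConfig.from_content(
    colang_content=colang_content,
    yaml_content=yaml_content,
)
rails = LLMRails(config)
```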
This approach is useful for testing different configurations programmatically.
Adding Custom Actions
The real power shows up when you register custom Python functions as actions. Say you want to log every blocked request to a database, or call an external moderation API before processing:
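Actions are plain (usually async) Python functions marked with the action decorator. This sketch logs a blocked message; the print is a stand-in for your real database write or moderation API call:

```python
from nemoguardrails.actions import action

@action(name="log_blocked_request")
async def log_blocked_request(context: dict):
    # NeMo passes conversation state via the context dict
    user_message = context.get("last_user_message")
    print(f"Blocked request: {user_message}")  # placeholder for a DB write
    return True
```

You can register the function explicitly with rails.register_action(log_blocked_request), or place it in an actions.py file inside the config directory so it is picked up automatically.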
Then reference it in your Colang file:
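Colang flows invoke registered actions with the execute keyword. A sketch, with illustrative intent and response names:

```colang
define user express insult
  "You're useless"
  "This bot is garbage"

define bot respond calmly
  "I'm here to help. Let's keep things civil."

define flow handle insult
  user express insult
  execute log_blocked_request
  bot respond calmly
```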
Topical Rails: Keeping the Conversation on Track
Beyond safety filtering, you’ll often want to restrict what topics your bot talks about. A customer support bot shouldn’t write poetry. A code assistant shouldn’t give medical advice.
Define allowed and blocked topics in your Colang file:
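A sketch for a customer-support bot, pairing an on-topic flow with a deflection flow (the example phrases and names are illustrative):

```colang
define user ask support question
  "My order hasn't arrived"
  "How do I reset my password?"
  "Can I get a refund?"

define user ask off topic
  "Write me a poem"
  "What's your opinion on the election?"
  "Can you diagnose my headache?"

define bot explain supported topics
  "I can only help with questions about your orders and account."

define flow answer support question
  user ask support question
  bot respond to support question

define flow deflect off topic
  user ask off topic
  bot explain supported topics
```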
NeMo Guardrails uses the example phrases to build an intent classifier (powered by embeddings), so you don’t need to list every possible variant – three to five examples per intent is enough for decent coverage.
Common Errors and How to Fix Them
“I’m sorry, an internal error has occurred.” This is the most common issue and it’s almost always caused by line breaks in the LLM response that break Colang parsing. It frequently appears with Azure-hosted GPT-4o and local models via Ollama. Fix it by enabling single_call mode in your rails config:
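In config.yml, single_call lives under the dialog rails settings:

```yaml
rails:
  dialog:
    single_call:
      enabled: True
```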
Or upgrade to v0.20.0+, which handles multi-line responses more gracefully.
“The output_vars option is not supported for Colang 2.0 configurations.” You’ll hit this when using LangChain integration with Colang 2.0 configs. Colang 2.0 changed how variables pass between rails. Either switch back to Colang 1.0 (the default) or remove output_vars from your LangChain RunnableRails wrapper.
rag() missing 1 required positional argument: 'query'. This means your custom action signature doesn’t match what NeMo expects. Custom actions receive a context dict – make sure your function signature accepts it:
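A sketch of a compliant signature: pull the query out of the context dict rather than declaring it as a positional parameter (the retrieval step is left as a placeholder):

```python
from nemoguardrails.actions import action

@action(is_system_action=True)
async def rag(context: dict):
    # The user's message arrives via the context dict,
    # not as a positional argument
    query = context.get("last_user_message")
    # ... retrieve documents and build an answer here ...
    return f"Answer for: {query}"
```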
401/403 errors with NVIDIA NIM endpoints. Your API key is missing or expired. Set NVIDIA_API_KEY in your environment, or pass it explicitly in config.yml under parameters.api_key. Generate a new key from the NVIDIA NGC catalog if your current one doesn’t work.
Colang 2.0: What Changed
Colang 2.0 is a complete rewrite of the language and runtime, available since v0.9.0 and continuously improved. The key differences from 1.0:
- Smaller set of abstractions: just flows, events, and actions (no more define bot / define user canonical forms)
- Python-like syntax: indentation-based, with if/else/while/when constructs
- Parallel flows: multiple flows can run concurrently, which is critical for checking multiple rails at once
- Standard Library (CSL): built-in flows for common patterns like greeting, content moderation, and topic control
To use Colang 2.0, add this to your config.yml:
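A single top-level key switches the runtime:

```yaml
colang_version: "2.x"
```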
A Colang 2.0 flow looks like this:
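A sketch of the new syntax: importing the core standard library gives you building blocks like user said and bot say, and a main flow activates the others:

```colang
import core

flow main
  activate greeting

flow greeting
  user said "hi"
  bot say "Hello! How can I help?"
```

Note the differences from 1.0: no define keyword, an explicit main entry point, and activation of flows rather than implicit matching.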
Stick with Colang 1.0 if you’re starting out. It’s simpler, has more documentation, and is still the default. Move to 2.0 when you need parallel flow execution or more granular event-driven control.
Performance Considerations
Each rail adds latency because it triggers an additional LLM call. With both input and output self-check rails enabled, every user message triggers three LLM calls total: one for input checking, one for the actual response, and one for output checking.
v0.20.0 introduced parallel rail execution for Colang 1.0 configs. Enable it to run multiple rails concurrently instead of sequentially:
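A sketch of the parallel option on the input and output rail groups (the flow names assume the built-in self-check rails are configured):

```yaml
rails:
  input:
    parallel: True
    flows:
      - self check input
  output:
    parallel: True
    flows:
      - self check output
```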
For production, consider using a smaller model (GPT-4o-mini, Claude Haiku, or a local model) for the safety check calls, while keeping a stronger model for the main response. You can configure this with multiple models:
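Task-specific model types route the self-check calls to a cheaper model while main handles responses. The model names below are illustrative:

```yaml
models:
  - type: main
    engine: openai
    model: gpt-4o
  - type: self_check_input
    engine: openai
    model: gpt-4o-mini
  - type: self_check_output
    engine: openai
    model: gpt-4o-mini
```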
This keeps the added safety-check latency low while the main model handles the heavy lifting.
When to Use NeMo Guardrails vs. Custom Filters
NeMo Guardrails is the right call when you need configurable, declarative safety rules that non-engineers can read and modify. The Colang files are close enough to plain English that a product manager can review them.
If you only need basic regex-based input/output screening, building your own filter (as covered in the content filtering guide) will be simpler and faster. NeMo adds value when you need dialog flow control, topical rails, multi-step conversation management, or when you want to swap safety rules without changing application code.
The two approaches aren’t mutually exclusive. Use custom regex filters for the fast, deterministic checks and NeMo Guardrails for the nuanced, LLM-powered safety layer on top.