When your agent silently picks the wrong tool, loops 47 times, then returns garbage – good luck figuring out why from print statements. LangSmith gives you a full trace of every LLM call, tool invocation, and routing decision your agent makes, displayed as a nested run tree you can actually read.
This guide covers the three things that matter most: instrumenting your agent code for tracing, reading traces to diagnose real failures, and building datasets that catch regressions before they hit production.
Set Up LangSmith Tracing
Install the SDK and set your environment variables. You need a LangSmith API key from smith.langchain.com.
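A minimal setup sketch, assuming the Python SDK (the key placeholder and project name are yours to fill in):

```shell
pip install -U langsmith openai

export LANGSMITH_TRACING=true
export LANGSMITH_API_KEY="lsv2_pt_..."   # from smith.langchain.com settings
export LANGSMITH_PROJECT="my-agent"      # optional, but recommended
```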
LANGSMITH_PROJECT groups your traces. If you skip it, everything lands in a project called “default” and gets messy fast.
Instrument Your Agent with @traceable
The @traceable decorator logs function inputs, outputs, latency, and errors to LangSmith automatically. Every decorated function becomes a node in your trace tree.
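A sketch of a minimal traced agent. The model name is an assumption, and `run_tool` is a stand-in for real tool dispatch:

```python
from openai import OpenAI
from langsmith import traceable
from langsmith.wrappers import wrap_openai

# wrap_openai patches the client so every chat.completions.create call is traced
client = wrap_openai(OpenAI())

@traceable
def classify_intent(question: str) -> str:
    resp = client.chat.completions.create(
        model="gpt-4o-mini",  # assumed model; use whatever you deploy
        messages=[
            {"role": "system", "content": "Classify the question as 'math' or 'other'. Reply with one word."},
            {"role": "user", "content": question},
        ],
    )
    return resp.choices[0].message.content.strip().lower()

@traceable
def run_tool(intent: str, question: str) -> str:
    # stand-in tool: a real agent would dispatch to a calculator, search, etc.
    if intent == "math":
        return "calculator result"
    return "no tool available"

@traceable
def agent_loop(question: str) -> str:
    intent = classify_intent(question)
    return run_tool(intent, question)
```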
wrap_openai is the key piece here. It patches the OpenAI client so every chat.completions.create call gets traced with the full prompt, response tokens, model parameters, and latency. Without it, your trace tree shows the agent steps but not the actual LLM calls inside them.
Run agent_loop("what is 2+2") and check the LangSmith UI. You will see a nested trace: agent_loop > classify_intent > OpenAI call > run_tool.
Read Traces to Find Real Bugs
The trace tree in the LangSmith UI is where debugging actually happens. Each row is a “run” – an LLM call, tool invocation, or function execution. Here are the failure patterns you will see most often.
Infinite Agent Loops
Your agent calls a tool, gets a result, decides it needs to call the tool again, and repeats until it hits the recursion limit. In LangSmith, this shows up as a deeply nested trace with the same node name repeated dozens of times.
If you are using LangGraph, you will see this error:
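The exact wording varies by LangGraph version, but it looks roughly like this:

```
langgraph.errors.GraphRecursionError: Recursion limit of 25 reached without
hitting a stop condition. You can increase the limit by setting the
`recursion_limit` config key.
```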
Fix it by setting an explicit recursion limit and adding a fallback:
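A sketch, assuming `graph` is your compiled LangGraph graph and `question` is the user input; the fallback message is illustrative:

```python
from langgraph.errors import GraphRecursionError

try:
    result = graph.invoke(
        {"messages": [("user", question)]},
        config={"recursion_limit": 10},  # explicit hard cap on steps
    )
except GraphRecursionError:
    # fall back to a best-effort answer instead of crashing
    result = {"messages": [("assistant", "I could not complete this request.")]}
```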
In the LangSmith trace, check what the LLM returned right before the loop started. Usually the model is generating a tool call when it should be generating a final answer. The fix is almost always a prompt change – explicitly tell the model when to stop calling tools.
Tool Call Failures
When a tool raises an exception, LangSmith marks that run in red and captures the full traceback. Common ones:
- ValidationError: The LLM passed arguments that do not match the tool’s schema. Check the “Inputs” tab on the tool run – you will often see the model hallucinating parameter names.
- TypeError: missing required argument: The model skipped a required field. Your tool description needs to be more explicit about required vs. optional parameters.
- AuthenticationError / HTTPError 401: An API key is missing or expired. This shows up in traces as a failed child run under the tool node.
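The first two failure modes usually trace back to a vague tool schema. One way to make required vs. optional explicit is a Pydantic model with per-field descriptions (the `SearchArgs` schema here is a hypothetical example):

```python
from pydantic import BaseModel, Field

class SearchArgs(BaseModel):
    """Search the product catalog."""

    query: str = Field(
        description="Free-text search query. Required."
    )
    max_results: int = Field(
        default=5,
        description="Optional cap on results. Defaults to 5.",
    )
```

The field descriptions end up in the JSON schema the model sees, so spelling out "Required" and "Optional" there directly reduces hallucinated or missing arguments.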
Slow Traces
Click any run in the trace tree to see its latency. If your agent takes 30 seconds but the LLM calls only account for 8 seconds, something in your tool execution is blocking. LangSmith’s timing breakdown makes this obvious – sort runs by duration and look for the outlier.
Build Datasets for Regression Testing
Traces are great for debugging one-off failures. Datasets let you catch regressions systematically. The workflow: create a dataset of input/expected-output pairs, run your agent against all of them, and score the results with evaluators.
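A sketch of that workflow with the LangSmith evaluation API, assuming a `classify_intent` function like the one above; the dataset name and examples are illustrative:

```python
from langsmith import Client, evaluate

client = Client()

# 1. Create a dataset of input/expected-output pairs
dataset = client.create_dataset("intent-classification")
client.create_examples(
    inputs=[{"question": "what is 2+2"}, {"question": "tell me a joke"}],
    outputs=[{"intent": "math"}, {"intent": "other"}],
    dataset_id=dataset.id,
)

# 2. An evaluator that scores each output against the reference
def correct_intent(outputs: dict, reference_outputs: dict) -> bool:
    return outputs["intent"] == reference_outputs["intent"]

# 3. Run the agent over every example and score the results
evaluate(
    lambda inputs: {"intent": classify_intent(inputs["question"])},
    data="intent-classification",
    evaluators=[correct_intent],
    experiment_prefix="intent-v1",
)
```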
This creates an “experiment” in LangSmith that you can compare against future runs. Change the model, tweak the prompt, then run the same evaluation with experiment_prefix="intent-v2" – the UI shows a side-by-side diff of scores.
Turn Production Traces Into Test Cases
The best datasets come from real failures. In the LangSmith UI, filter traces by error status, find a representative failure, and click “Add to Dataset.” This turns the actual inputs that broke your agent into a permanent regression test. Over time, your dataset grows into a comprehensive test suite that reflects real-world usage patterns, not just the cases you thought to write by hand.
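You can also do this programmatically. A sketch, assuming the `Client.list_runs` filters shown (the project and dataset names are illustrative):

```python
from langsmith import Client

client = Client()

# pull recent failed root runs from production
failed_runs = client.list_runs(
    project_name="my-agent",
    error=True,
    is_root=True,
    limit=20,
)

dataset = client.create_dataset("production-failures")
for run in failed_runs:
    # the exact inputs that broke the agent become regression cases
    client.create_example(
        inputs=run.inputs,
        outputs=run.outputs,
        dataset_id=dataset.id,
    )
```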
Evaluate Agent Trajectories
For complex agents, checking the final output is not enough. You need to verify the agent took the right steps. LangSmith supports trajectory evaluators that score the sequence of tool calls, not just the end result.
This catches a sneaky class of bugs: agents that return the right answer but through a wasteful or unreliable path. If your agent calls a search tool three times when once would suffice, the final answer might still be correct, but the trajectory evaluator flags the inefficiency.
Automate It in CI
Run evaluations on every pull request to prevent regressions from shipping:
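A sketch of a CI gate script, assuming the `classify_intent` and `correct_intent` definitions from earlier and that each result row exposes its evaluator scores under `evaluation_results`:

```python
import sys

from langsmith import evaluate

results = evaluate(
    lambda inputs: {"intent": classify_intent(inputs["question"])},
    data="intent-classification",
    evaluators=[correct_intent],
    experiment_prefix="ci",
)

scores = [
    res.score
    for row in results
    for res in row["evaluation_results"]["results"]
]
accuracy = sum(scores) / len(scores)
print(f"intent accuracy: {accuracy:.0%}")
if accuracy < 0.9:
    # non-zero exit fails the CI job
    sys.exit("intent accuracy below 90% threshold, failing the build")
```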
Set a threshold on your evaluator scores. If intent accuracy drops below 90%, fail the build. This turns your LangSmith datasets into the same kind of safety net that unit tests provide for regular code.
Common Setup Mistakes
Traces not appearing in the UI: Check that LANGSMITH_TRACING is set to true (not "True" or 1). The SDK is strict about this string value. Also verify your API key starts with lsv2_pt_.
Nested traces showing as flat: If child runs appear as separate top-level traces instead of nested under a parent, you are probably creating new threads instead of passing context. Make sure your @traceable functions call each other directly – do not dispatch them through a thread pool or async queue without propagating the LangSmith run context.
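One way to propagate context explicitly across a thread pool, assuming `langsmith_extra`'s `parent` key accepts the current run tree (the `parent_task`/`child_task` names are illustrative):

```python
from concurrent.futures import ThreadPoolExecutor

from langsmith import traceable
from langsmith.run_helpers import get_current_run_tree

@traceable
def child_task(x: int) -> int:
    return x * 2

@traceable
def parent_task(items: list) -> list:
    # capture the parent run and hand it to each worker explicitly,
    # otherwise the children appear as separate top-level traces
    parent = get_current_run_tree()
    with ThreadPoolExecutor() as pool:
        futures = [
            pool.submit(child_task, x, langsmith_extra={"parent": parent})
            for x in items
        ]
        return [f.result() for f in futures]
```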
“Project not found” errors: The project specified in LANGSMITH_PROJECT gets created automatically on the first trace. But if you typo the name and then look for traces in a different project, you will stare at an empty dashboard wondering why nothing shows up. Check the project dropdown in the UI.
Related Guides
- How to Build Agents with LangGraph
- How to Add Persistent Memory to AI Agents with LangChain
- How to Build a Data Analysis Agent with Code Execution
- How to Add Human-in-the-Loop Approval to AI Agents
- How to Build Autonomous Agents with the OpenAI Assistants API
- How to Build an MCP Server for AI Agents with Python
- How to Build a GitHub Issue Triage Agent with LLMs and the GitHub API
- How to Build a Retrieval Agent with Tool Calling and Reranking
- How to Build a Debugging Agent with Stack Trace Analysis
- How to Build a Multi-Agent Pipeline Using Anthropic’s Agent SDK and MCP