LLM apps fail in ways traditional software doesn’t. A model call that worked yesterday returns garbage today because the provider updated weights or your prompt drifted. You can’t monitor this with HTTP status codes alone – you need to see every input, output, token count, latency, and cost for every call your app makes. That’s what LangSmith does.
LangSmith is LangChain’s observability platform. It captures full traces of LLM interactions, lets you build evaluation datasets, and tracks cost and latency over time. It works with LangChain, but also with raw OpenAI calls and any other LLM provider through manual instrumentation.
Install and Configure
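Assuming a Python project, the SDKs install from PyPI (langchain-openai is only needed if you use LangChain's OpenAI integration):

```shell
pip install -U langsmith langchain langchain-openai
```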
You need two things: a LangSmith API key and some environment variables. Sign up at smith.langchain.com, create an API key under Settings, then export these:
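For example (the project name is a placeholder):

```shell
export LANGCHAIN_TRACING_V2=true
export LANGCHAIN_API_KEY="<your-api-key>"    # from Settings > API Keys
export LANGCHAIN_PROJECT="my-app-prod"       # optional; defaults to "default"
```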
LANGCHAIN_TRACING_V2=true is the master switch. Set it to false or remove it entirely to stop sending traces. LANGCHAIN_PROJECT groups traces by application – use different project names for staging vs. production so you don’t pollute your dashboards.
Auto-Tracing with LangChain
If you’re already using LangChain, tracing is automatic. The moment those environment variables are set, every chain invocation sends a trace to LangSmith with zero code changes.
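A minimal sketch, assuming langchain-openai is installed and OPENAI_API_KEY is set in the environment:

```python
from langchain_openai import ChatOpenAI

# With the tracing env vars set, this call is traced automatically --
# no LangSmith-specific code anywhere.
llm = ChatOpenAI(model="gpt-4o-mini")
response = llm.invoke("Summarize what observability means in one sentence.")
print(response.content)
```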
Run this and check your LangSmith dashboard. You’ll see a trace with the full input messages, the model’s response, token counts (prompt + completion), latency in milliseconds, and the estimated cost. Each trace is a tree – if you chain multiple calls together, LangSmith nests them so you can see parent-child relationships.
Manual Tracing with @traceable
You don’t need LangChain at all. The langsmith SDK gives you the @traceable decorator, which wraps any function and sends its inputs and outputs to LangSmith.
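A sketch with a stubbed model call – the function bodies are placeholders; in a real app generate_answer would call your model provider:

```python
from langsmith import traceable

@traceable(run_type="llm", name="generate-answer")
def generate_answer(question: str) -> str:
    # Placeholder: call your model provider here.
    return f"Answer to: {question}"

@traceable(run_type="chain", name="qa-pipeline")
def qa_pipeline(question: str) -> str:
    # Nested call: LangSmith records generate-answer as a child run.
    return generate_answer(question)

print(qa_pipeline("What is a trace?"))
```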
The run_type parameter matters. Use "llm" for functions that directly call a model, "chain" for orchestration functions, "tool" for tool executions, and "retriever" for search/retrieval steps. LangSmith renders these differently in the trace view – LLM runs show token breakdowns while chain runs show nested call trees.
When you nest @traceable functions like this, LangSmith automatically builds a parent-child trace hierarchy. The qa-pipeline run appears as the root, with generate-answer nested underneath it.
Wrap OpenAI Directly
If you want full token-level tracking without @traceable on every function, use wrap_openai. This patches the OpenAI client to report traces automatically.
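A sketch assuming the openai and langsmith packages are installed and the usual API keys are set:

```python
from openai import OpenAI
from langsmith.wrappers import wrap_openai

# wrap_openai patches the client; every call through it is traced.
client = wrap_openai(OpenAI())

resp = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "Hello"}],
)
print(resp.choices[0].message.content)
```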
Every call through the wrapped client now shows up in LangSmith with model name, token counts, and latency. This is the lowest-effort integration if you’re using the OpenAI SDK directly.
Trace an Agent Run
For agent workflows where an LLM calls tools in a loop, tracing becomes essential. Without it, debugging “why did the agent call the wrong tool on step 4?” is nearly impossible.
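A condensed sketch of a traced single-tool loop – the tool, its schema, and the assumption that the model actually calls it are all illustrative:

```python
import json
from openai import OpenAI
from langsmith import traceable
from langsmith.wrappers import wrap_openai

client = wrap_openai(OpenAI())

@traceable(run_type="tool", name="get-weather")
def get_weather(city: str) -> str:
    return f"Sunny in {city}"  # placeholder tool

@traceable(run_type="chain", name="agent-loop")
def run_agent(question: str) -> str:
    tools = [{
        "type": "function",
        "function": {
            "name": "get_weather",
            "parameters": {
                "type": "object",
                "properties": {"city": {"type": "string"}},
                "required": ["city"],
            },
        },
    }]
    messages = [{"role": "user", "content": question}]
    resp = client.chat.completions.create(
        model="gpt-4o-mini", messages=messages, tools=tools)
    # Assumes the model chose to call the tool on the first turn.
    call = resp.choices[0].message.tool_calls[0]
    result = get_weather(**json.loads(call.function.arguments))
    messages += [resp.choices[0].message,
                 {"role": "tool", "tool_call_id": call.id, "content": result}]
    final = client.chat.completions.create(model="gpt-4o-mini", messages=messages)
    return final.choices[0].message.content

print(run_agent("What's the weather in Paris?"))
```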
In the LangSmith dashboard, this trace shows the entire agent loop: the initial LLM call, the tool invocation, and the final response generation. You can see exactly what the model decided to call and what arguments it passed.
Build Evaluation Datasets
Monitoring tells you something went wrong. Evaluations tell you what went wrong and whether it’s getting worse. LangSmith lets you create datasets of input-output examples and run evaluators against them.
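A sketch using the SDK client – the dataset name and example pairs are placeholders:

```python
from langsmith import Client

client = Client()
dataset = client.create_dataset(
    "qa-golden-set",
    description="Golden Q&A pairs for regression checks",
)
client.create_examples(
    inputs=[{"question": "What is LangSmith?"},
            {"question": "What does a trace capture?"}],
    outputs=[{"answer": "LangChain's observability platform."},
             {"answer": "Inputs, outputs, tokens, latency, and cost."}],
    dataset_id=dataset.id,
)
```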
Now run your LLM against this dataset and score the results:
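A sketch assuming a recent langsmith SDK that exposes evaluate at the top level; my_app and exact_match are placeholder names:

```python
from langsmith import evaluate

def my_app(inputs: dict) -> dict:
    # Placeholder: call your chain or model here.
    return {"answer": "..."}

def exact_match(run, example) -> dict:
    # Simple evaluator: compare the app's answer to the reference output.
    score = run.outputs["answer"] == example.outputs["answer"]
    return {"key": "exact_match", "score": int(score)}

results = evaluate(
    my_app,
    data="qa-golden-set",
    evaluators=[exact_match],
    experiment_prefix="baseline",
)
```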
LangSmith stores every evaluation run as an experiment. You can compare experiments side by side in the dashboard – this is how you catch regressions when you change a prompt or swap models.
Track Costs and Latency
LangSmith automatically computes cost estimates based on the model and token counts. In the dashboard, you can filter by project and time range to see:
- Total cost per day/week/month
- P50/P95/P99 latency for each trace type
- Token usage broken down by prompt vs. completion
- Error rate as a percentage of total runs
For programmatic access, use the SDK to query run stats:
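For example, pulling the last 24 hours of root runs and aggregating locally (the project name is a placeholder; this assumes a configured API key):

```python
from datetime import datetime, timedelta
from langsmith import Client

client = Client()
runs = list(client.list_runs(
    project_name="my-app-prod",
    start_time=datetime.now() - timedelta(days=1),
    is_root=True,
))
total_tokens = sum(r.total_tokens or 0 for r in runs)
errors = sum(1 for r in runs if r.error)
print(f"runs={len(runs)} tokens={total_tokens} errors={errors}")
```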
Set Up Rules and Alerts
LangSmith supports automated rules that trigger on trace conditions. You configure these in the dashboard under your project settings. Typical rules you should set up:
- Latency threshold: Alert when P95 latency exceeds your SLA (e.g., 5 seconds)
- Error spike: Alert when error rate exceeds 5% over a 15-minute window
- Cost anomaly: Alert when daily cost exceeds 2x the trailing 7-day average
- Sentiment/quality: Run an LLM-as-judge evaluator on a sample of production traces and alert on quality drops
You can also use online evaluators that run automatically on incoming traces. These attach scores to production runs in real time, which is useful for catching quality degradation without waiting for a manual evaluation pass.
Common Errors and Fixes
langsmith.utils.LangSmithAuthError: Invalid API key
Your LANGCHAIN_API_KEY is wrong or expired. Go to smith.langchain.com, regenerate the key under Settings > API Keys, and re-export it. A common mistake is setting LANGSMITH_API_KEY instead of LANGCHAIN_API_KEY – the environment variable name uses the LANGCHAIN_ prefix.
Traces Not Appearing in Dashboard
You set everything up but the dashboard is empty. Check these in order:
- Is LANGCHAIN_TRACING_V2 set to true (string, not boolean)? Print it: python -c "import os; print(os.environ.get('LANGCHAIN_TRACING_V2'))"
- Are you looking at the right project? The default project is "default" unless you set LANGCHAIN_PROJECT.
- Is your firewall blocking outbound HTTPS to api.smith.langchain.com?
- Check for silent failures by enabling debug logging:
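Standard Python logging is enough to surface failed trace uploads, which are otherwise swallowed silently:

```python
import logging

# Show langsmith's internal messages, including failed trace uploads.
logging.basicConfig(level=logging.INFO)
logging.getLogger("langsmith").setLevel(logging.DEBUG)
```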
langsmith.utils.LangSmithConnectionError: Connection refused
This usually means you’re pointing at a self-hosted LangSmith instance that’s down. If you’re using the hosted version, make sure LANGCHAIN_ENDPOINT is not set or is set to https://api.smith.langchain.com. Setting it to localhost or an internal URL will bypass the hosted service.
LangSmithRateLimitError: Too many requests
The free tier has rate limits on trace ingestion. If you’re sending thousands of traces per minute in production, you’ll hit this. Two options:
- Upgrade to a paid plan for higher limits
- Use sampling to trace only a percentage of requests:
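One common approach is to flip the tracing env var per request – a sketch, assuming LangChain reads LANGCHAIN_TRACING_V2 when the request's callbacks are constructed:

```python
import os
import random

def should_trace(sample_rate: float) -> bool:
    """Return True for roughly sample_rate of requests."""
    return random.random() < sample_rate

# Per-request: enable tracing for ~10% of traffic.
os.environ["LANGCHAIN_TRACING_V2"] = "true" if should_trace(0.10) else "false"
```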
Alternatively, set the LANGCHAIN_CALLBACKS_BACKGROUND environment variable to true so traces are sent asynchronously and don’t block your request path – though note this changes when traces are sent, not how many.
High Latency from Tracing Overhead
Tracing adds network calls. If you’re seeing added latency, set LANGCHAIN_CALLBACKS_BACKGROUND=true to send traces in a background thread. This makes trace delivery best-effort but keeps your request latency clean.
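i.e.:

```shell
export LANGCHAIN_CALLBACKS_BACKGROUND=true
```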
Production Recommendations
Use separate projects for each environment. Name them my-app-dev, my-app-staging, my-app-prod. This keeps your production dashboards clean and lets you set different alerting thresholds per environment.
Sample in production, trace everything in staging. Full tracing on every production request adds cost and latency. Trace 100% in staging for debugging, 5-20% in production for monitoring.
Build a golden dataset early. Collect 50-100 representative inputs with expected outputs before you ship. Run evaluations against this dataset on every prompt change. This catches regressions that unit tests can’t.
Tag traces with metadata. Use the metadata parameter in @traceable to attach version numbers, A/B test groups, or user segments. This makes filtering in the dashboard much more useful.
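A sketch – the metadata keys here are illustrative; attach whatever dimensions you filter on:

```python
from langsmith import traceable

@traceable(metadata={"app_version": "1.4.2", "ab_group": "control"})
def answer_question(question: str) -> str:
    return "..."  # placeholder: call your model here
```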
Related Guides
- How to A/B Test LLM Prompts and Models in Production
- How to Load Test and Benchmark LLM APIs with Locust
- How to Route LLM Traffic by Cost and Complexity Using Intelligent Model Routing
- How to Autoscale LLM Inference on Kubernetes with KEDA
- How to Detect Model Drift and Data Drift in Production
- How to Build a Model Compression Pipeline with Pruning and Quantization
- How to Implement Canary Deployments for ML Models
- How to Build a Model Configuration Management Pipeline with Hydra
- How to Serve LLMs in Production with SGLang
- How to Build a Model Dependency Scanner and Vulnerability Checker