LCEL (LangChain Expression Language) is LangChain’s way of composing chains declaratively using a pipe operator. Think of it as Unix pipes for LLM calls. You wire together prompts, models, and output parsers into a single chain that supports streaming, batching, and async out of the box.

Here’s the fastest way to get started:

```python
from langchain_openai import ChatOpenAI
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.output_parsers import StrOutputParser

prompt = ChatPromptTemplate.from_template("Explain {topic} in three sentences.")
model = ChatOpenAI(model="gpt-4o")
parser = StrOutputParser()

chain = prompt | model | parser

result = chain.invoke({"topic": "gradient descent"})
print(result)
```

That | operator is the whole idea. Each component receives the output of the previous one. The prompt formats your input into messages, the model generates a response, and the parser pulls out the string. No wrapper classes, no inheritance hierarchies.

Why LCEL Over Legacy Chains

LangChain’s old LLMChain and SequentialChain classes still work, but they’re officially in maintenance mode. LCEL replaces them with something better in every way: automatic streaming support, native async, built-in retries, and a consistent interface across all components.

My recommendation: use LCEL for everything new. The old chain classes add abstraction without adding value.

The Pipe Operator in Detail

Every LCEL component implements the Runnable interface, which gives you .invoke(), .stream(), .batch(), and their async counterparts. When you pipe two runnables together with |, you get a RunnableSequence that chains their execution.

```python
from langchain_core.runnables import RunnablePassthrough, RunnableLambda

def word_count(text: str) -> dict:
    return {"text": text, "word_count": len(text.split())}

chain = (
    ChatPromptTemplate.from_template("Write a haiku about {subject}")
    | ChatOpenAI(model="gpt-4o")
    | StrOutputParser()
    | RunnableLambda(word_count)
)

result = chain.invoke({"subject": "debugging"})
print(result)
# {'text': 'Bugs hide in the code\nPrintf reveals their secrets\nTests finally pass', 'word_count': 12}
```

RunnableLambda wraps any function into a runnable. This is how you inject custom logic – validation, transformation, logging – anywhere in the chain.

Parallel Branches with RunnableParallel

Sometimes you need to run multiple operations on the same input simultaneously. RunnableParallel handles this:

```python
from langchain_core.runnables import RunnableParallel

summary_chain = (
    ChatPromptTemplate.from_template("Summarize this in one line: {text}")
    | ChatOpenAI(model="gpt-4o")
    | StrOutputParser()
)

sentiment_chain = (
    ChatPromptTemplate.from_template("What is the sentiment of this text? Reply with one word: {text}")
    | ChatOpenAI(model="gpt-4o")
    | StrOutputParser()
)

analysis = RunnableParallel(
    summary=summary_chain,
    sentiment=sentiment_chain,
)

result = analysis.invoke({"text": "The product works great but shipping took forever."})
print(result)
# {'summary': 'Positive product review with a complaint about slow shipping.',
#  'sentiment': 'Mixed'}
```

Both chains run concurrently. The output is a dictionary keyed by the names you assign. This is genuinely useful for fan-out patterns where you need multiple LLM perspectives on the same input.

RunnablePassthrough for Input Forwarding

A common pattern: you want to pass the original input alongside a computed value. RunnablePassthrough forwards whatever it receives unchanged.

```python
from langchain_core.runnables import RunnablePassthrough, RunnableParallel

retriever = some_vector_store.as_retriever()  # any existing vector store

chain = (
    RunnableParallel(
        context=retriever,
        question=RunnablePassthrough(),
    )
    | ChatPromptTemplate.from_template(
        "Answer the question based on context.\n\nContext: {context}\n\nQuestion: {question}"
    )
    | ChatOpenAI(model="gpt-4o")
    | StrOutputParser()
)

answer = chain.invoke("What is LCEL?")
```

This is the standard RAG pattern in LCEL. The retriever fetches documents while RunnablePassthrough forwards the original question. Both feed into the prompt template as separate variables.

You can also use RunnablePassthrough.assign() to add new keys to an existing dictionary without losing the original data:

```python
chain = RunnablePassthrough.assign(
    word_count=lambda x: len(x["text"].split())
)

result = chain.invoke({"text": "hello world foo bar"})
# {'text': 'hello world foo bar', 'word_count': 4}
```

Streaming Output

LCEL chains stream by default. You don’t need to configure anything special – just call .stream() instead of .invoke():

```python
chain = (
    ChatPromptTemplate.from_template("Write a short story about {topic}")
    | ChatOpenAI(model="gpt-4o")
    | StrOutputParser()
)

for chunk in chain.stream({"topic": "a robot learning to cook"}):
    print(chunk, end="", flush=True)
```

Each token arrives as it’s generated. The StrOutputParser passes chunks through transparently. This works because LCEL components implement a streaming protocol – intermediate steps that can’t stream (like a retriever) will emit their full output as a single chunk, while streamable components (like chat models) emit token by token.

For async streaming in a web server:

```python
async def generate(topic: str):
    async for chunk in chain.astream({"topic": topic}):
        yield chunk  # send to client via SSE or WebSocket
```

Error Handling in Chains

Chains fail. Models time out, rate limits hit, output parsing breaks. LCEL gives you .with_retry() and .with_fallbacks() to handle this without try/except spaghetti:

```python
from langchain_openai import ChatOpenAI

main_model = ChatOpenAI(model="gpt-4o").with_retry(
    stop_after_attempt=3,
    wait_exponential_jitter=True,
)

fallback_model = ChatOpenAI(model="gpt-4o-mini")

resilient_model = main_model.with_fallbacks([fallback_model])

chain = (
    ChatPromptTemplate.from_template("Explain {concept}")
    | resilient_model
    | StrOutputParser()
)
```

This tries GPT-4o up to three times with exponential backoff. If all retries fail, it falls back to GPT-4o-mini. Attach .with_retry() to the specific component that’s flaky, not the whole chain.

Common Errors

TypeError: Expected a Runnable, callable or dict – You tried to pipe something that isn’t a Runnable. Wrap plain functions with RunnableLambda() before using them in a chain.

KeyError in prompt template – Your chain passes a dictionary that’s missing a variable the prompt expects. Check that upstream components output all required keys. Use RunnablePassthrough.assign() to add missing keys.

OutputParserException – The model returned text that doesn’t match your parser’s expected format. This happens constantly with JsonOutputParser. Either add format instructions to your prompt or use PydanticOutputParser with a retry parser wrapping it.

Streaming returns one big chunk – A step placed after the model that consumes its entire input (like a RunnableLambda doing a database call) buffers the stream. Move blocking work before the model, or rewrite the step as a generator function so it can transform chunks as they arrive.

NotImplementedError: astream not implemented – You’re calling .astream() on a custom runnable that only implements sync methods. Override _atransform or _ainvoke in your custom class.

When LCEL Isn’t the Right Call

LCEL works best for linear or fan-out workflows. If you need loops, conditional branching, or human-in-the-loop steps, look at LangGraph instead. LCEL chains are DAGs – they execute once, start to finish. The moment you need a cycle or state machine, you’ve outgrown LCEL.

For simple one-shot LLM calls, LCEL is also overkill. Just call the model directly. Don’t wrap a single API call in three pipes because it looks clean.