Haystack 2.x is a complete rewrite. The old 1.x “node and pipeline” API is gone, replaced by a component-based architecture where you snap together typed building blocks – converters, splitters, embedders, retrievers, generators – and Haystack validates the connections at build time. If two components don’t have compatible input/output types, you’ll know before anything runs.
This matters because most search and RAG pipelines fail at the seams: a converter outputs the wrong format, a retriever expects embeddings that were never generated. Haystack’s type-checked pipeline graph eliminates that entire class of bugs.
Install Haystack
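The 2.x line ships as a single package:

```shell
pip install haystack-ai
```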
That’s the 2.x package. If you also want Qdrant as a vector store and OpenAI for embeddings/generation:
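Qdrant support lives in a separate integration package; the OpenAI embedders and generators ship with the core haystack-ai package itself:

```shell
pip install haystack-ai qdrant-haystack
```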
Set your OpenAI key:
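Haystack's OpenAI components read the key from the environment by default:

```shell
export OPENAI_API_KEY="your-key-here"
```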
Core Concepts: Components and Pipelines
Everything in Haystack 2.x is a component – a Python class decorated with @component that declares typed inputs and outputs. A pipeline connects components by wiring outputs to inputs. You build two kinds of pipelines for search:
- Indexing pipeline: takes raw files, converts them to documents, splits them into chunks, generates embeddings, and writes them to a document store.
- Query pipeline: takes a user query, embeds it, retrieves relevant documents, and optionally generates an answer with an LLM.
Build an Indexing Pipeline
Start with the simplest useful pipeline: read text files, split them into chunks, embed with OpenAI, and store in memory.
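A minimal sketch of that pipeline — the docs/guide.txt path and the embedding model name are illustrative, and running it requires a valid OPENAI_API_KEY:

```python
from haystack import Pipeline
from haystack.components.converters import TextFileToDocument
from haystack.components.embedders import OpenAIDocumentEmbedder
from haystack.components.preprocessors import DocumentSplitter
from haystack.components.writers import DocumentWriter
from haystack.document_stores.in_memory import InMemoryDocumentStore

document_store = InMemoryDocumentStore()

indexing = Pipeline()
indexing.add_component("converter", TextFileToDocument())
indexing.add_component(
    "splitter",
    DocumentSplitter(split_by="sentence", split_length=3, split_overlap=1),
)
indexing.add_component("embedder", OpenAIDocumentEmbedder(model="text-embedding-3-small"))
indexing.add_component("writer", DocumentWriter(document_store=document_store))

# Wire outputs to inputs; Haystack type-checks each connection here,
# before anything runs.
indexing.connect("converter.documents", "splitter.documents")
indexing.connect("splitter.documents", "embedder.documents")
indexing.connect("embedder.documents", "writer.documents")

indexing.run({"converter": {"sources": ["docs/guide.txt"]}})
```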
The DocumentSplitter splits by sentence here, grouping 3 sentences per chunk with 1 sentence of overlap. This gives your retriever enough context per chunk without blowing past token limits. Adjust split_length based on your content – for dense technical docs, 5 sentences often works better.
Switch to Qdrant for Production
InMemoryDocumentStore is fine for prototyping but won’t survive a restart. For production, swap in Qdrant:
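A sketch, assuming a Qdrant instance listening on localhost and OpenAI's 1536-dimension embeddings; the index name is illustrative:

```python
from haystack_integrations.document_stores.qdrant import QdrantDocumentStore

document_store = QdrantDocumentStore(
    url="http://localhost:6333",
    index="my_documents",
    embedding_dim=1536,     # matches text-embedding-3-small
    recreate_index=False,   # keep existing data across runs
)
```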
Everything else in your pipeline stays exactly the same. That’s the whole point of the component architecture – swap the store, keep the pipeline.
Build a Retrieval Pipeline
Now query those documents. The simplest retrieval pipeline embeds the query and does a vector search:
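A sketch against the in-memory store (with Qdrant you would use the QdrantEmbeddingRetriever from the integration package instead); the query string is illustrative, and the store is assumed to be the one populated by your indexing pipeline:

```python
from haystack import Pipeline
from haystack.components.embedders import OpenAITextEmbedder
from haystack.components.retrievers.in_memory import InMemoryEmbeddingRetriever
from haystack.document_stores.in_memory import InMemoryDocumentStore

document_store = InMemoryDocumentStore()  # in practice, the store you indexed into

query_pipeline = Pipeline()
query_pipeline.add_component("text_embedder", OpenAITextEmbedder(model="text-embedding-3-small"))
query_pipeline.add_component(
    "retriever", InMemoryEmbeddingRetriever(document_store=document_store, top_k=5)
)
# The text embedder outputs a single vector; the retriever consumes it.
query_pipeline.connect("text_embedder.embedding", "retriever.query_embedding")

result = query_pipeline.run({"text_embedder": {"text": "How do I configure the splitter?"}})
for doc in result["retriever"]["documents"]:
    print(doc.score, doc.content)
```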
Add a Reranker for Better Precision
Vector search gets you in the right neighborhood. A cross-encoder reranker tells you which house to knock on. Add one between the retriever and whatever consumes the results:
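One way to add that stage is Haystack's TransformersSimilarityRanker, which runs a local cross-encoder (it needs the transformers/torch extras installed, and downloads the model on first use). This sketch assumes the query_pipeline from the previous snippet, with the retriever's top_k raised to 20:

```python
from haystack.components.rankers import TransformersSimilarityRanker

ranker = TransformersSimilarityRanker(
    model="cross-encoder/ms-marco-MiniLM-L-6-v2",  # a common reranking model
    top_k=5,
)

query_pipeline.add_component("ranker", ranker)
query_pipeline.connect("retriever.documents", "ranker.documents")

# The cross-encoder scores (query, document) pairs, so it needs the raw
# query text as well as the retriever's candidates.
query = "How do I configure the splitter?"
result = query_pipeline.run({
    "text_embedder": {"text": query},
    "ranker": {"query": query},
})
```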
Retrieve 20 candidates, rerank down to 5. The cross-encoder is slower but dramatically more accurate than cosine similarity alone. For most use cases, this two-stage approach is the sweet spot.
Build a Full RAG Pipeline
Wire the retriever output into a prompt builder and then into an LLM generator to get generated answers grounded in your documents:
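A sketch of the full pipeline — the model names, the Jinja template, and the question are all illustrative, and the store is assumed to be one your indexing pipeline already populated:

```python
from haystack import Pipeline
from haystack.components.builders import PromptBuilder
from haystack.components.embedders import OpenAITextEmbedder
from haystack.components.generators import OpenAIGenerator
from haystack.components.retrievers.in_memory import InMemoryEmbeddingRetriever
from haystack.document_stores.in_memory import InMemoryDocumentStore

document_store = InMemoryDocumentStore()  # in practice, your populated store

template = """Answer the question using only the context below.

Context:
{% for doc in documents %}
{{ doc.content }}
{% endfor %}

Question: {{ query }}
Answer:"""

rag = Pipeline()
rag.add_component("text_embedder", OpenAITextEmbedder(model="text-embedding-3-small"))
rag.add_component("retriever", InMemoryEmbeddingRetriever(document_store=document_store, top_k=5))
rag.add_component("prompt_builder", PromptBuilder(template=template))
rag.add_component("generator", OpenAIGenerator(model="gpt-4o-mini"))

rag.connect("text_embedder.embedding", "retriever.query_embedding")
rag.connect("retriever.documents", "prompt_builder.documents")
rag.connect("prompt_builder.prompt", "generator.prompt")

question = "What does the splitter's overlap setting do?"
result = rag.run({
    "text_embedder": {"text": question},   # the query fans out to two components
    "prompt_builder": {"query": question},
})
print(result["generator"]["replies"][0])
```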
Notice that query goes to both text_embedder and prompt_builder. The embedder needs it to generate the query vector; the prompt builder needs it to construct the final prompt. Haystack lets you fan out inputs like this cleanly.
Build a Custom Component
Need something Haystack doesn’t ship? Write your own component. Here’s a simple document filter that drops chunks below a confidence threshold:
Drop it into any pipeline between a retriever and a consumer:
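For example, slotted into a query pipeline between the retriever and a downstream ranker — this fragment assumes the query_pipeline, ScoreFilter, and ranker names from the snippets above:

```python
query_pipeline.add_component("score_filter", ScoreFilter(threshold=0.6))
query_pipeline.connect("retriever.documents", "score_filter.documents")
# Downstream components now read from the filter instead of the retriever.
query_pipeline.connect("score_filter.documents", "ranker.documents")
```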
The @component decorator and @component.output_types declaration are all Haystack needs to validate your component’s connections at pipeline build time. If you forget the output type annotation, you’ll get a ComponentError when connecting.
Common Errors and Fixes
PipelineConnectError: The output type of X is not compatible with the input type of Y
You’re wiring two components whose types don’t match. A common cause is mixing up the two embedder kinds: a document embedder outputs documents (List[Document], each carrying its embedding) and feeds a writer, while a text embedder outputs a single embedding (List[float]) and feeds a retriever. Check which one each connection expects.
ValueError: None of the documents have embeddings
Your indexing pipeline didn’t run the embedder, or you wrote documents before embedding them. Make sure the connect() calls go converter -> splitter -> embedder -> writer in that exact order.
OpenAIError: Rate limit exceeded
Embedding large document sets hits rate limits fast. Keep meta_fields_to_embed on OpenAIDocumentEmbedder empty so you don’t embed metadata you don’t need, and split large indexing jobs into smaller runs. Haystack batches requests internally; you can control the size with the batch_size parameter on the embedder.
QdrantException: Collection not found
You set recreate_index=False but the collection doesn’t exist yet. Either set recreate_index=True on first run, or create the collection manually via the Qdrant API before starting the pipeline.
Custom component not connecting
Make sure your run() method has the @component.output_types() decorator. Without it, Haystack can’t introspect the component’s outputs and connect() will raise a ComponentError.
PipelineMaxComponentRuns exceeded
This happens with cyclic pipelines (e.g., a router that loops back). Increase the max_runs_per_component setting on the pipeline or add a proper exit condition to your routing logic.
Related Guides
- How to Build AI Assistants with the Cohere API
- How to Use the Anthropic Extended Thinking API for Complex Reasoning
- How to Build AI Workflows with LangChain Expression Language
- How to Use the DeepSeek API for Code and Reasoning Tasks
- How to Use the Anthropic Message Batches API for Async Workloads
- How to Use the Anthropic Batch API for High-Volume Processing
- How to Use the Voyage AI API for Code and Text Embeddings
- How to Build AI Apps with the Vercel AI SDK and Next.js
- How to Use the Cohere Rerank API for Search Quality
- How to Use the OpenRouter API for Multi-Provider LLM Access