Haystack 2.x is a complete rewrite. The old 1.x “node and pipeline” API is gone, replaced by a component-based architecture where you snap together typed building blocks – converters, splitters, embedders, retrievers, generators – and Haystack validates the connections at build time. If two components don’t have compatible input/output types, you’ll know before anything runs.

This matters because most search and RAG pipelines fail at the seams: a converter outputs the wrong format, a retriever expects embeddings that were never generated. Haystack’s type-checked pipeline graph eliminates that entire class of bugs.

Install Haystack

pip install haystack-ai

That’s the 2.x package. If you also want Qdrant as a vector store and OpenAI for embeddings/generation:

pip install qdrant-haystack openai

Set your OpenAI key:

export OPENAI_API_KEY="sk-your-key-here"

Core Concepts: Components and Pipelines

Everything in Haystack 2.x is a component – a Python class decorated with @component that declares typed inputs and outputs. A pipeline connects components by wiring outputs to inputs. You build two kinds of pipelines for search:

  • Indexing pipeline: takes raw files, converts them to documents, splits them into chunks, generates embeddings, and writes them to a document store.
  • Query pipeline: takes a user query, embeds it, retrieves relevant documents, and optionally generates an answer with an LLM.
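
The build-time type check is easiest to see in miniature. Here is a toy illustration (not Haystack's actual implementation) of the idea: compare a producer's return annotation with a consumer's parameter annotation and refuse to wire them if they disagree.

```python
import inspect
from typing import List

def connect(producer, consumer):
    # Toy version of build-time validation: compare the producer's return
    # annotation with the consumer's first parameter annotation.
    out_type = inspect.signature(producer).return_annotation
    in_param = next(iter(inspect.signature(consumer).parameters.values()))
    if out_type != in_param.annotation:
        raise TypeError(f"{out_type} is not compatible with {in_param.annotation}")

def splitter(text: str) -> List[str]:
    return text.split(". ")

def embedder(chunks: List[str]) -> List[List[float]]:
    return [[float(len(c))] for c in chunks]

connect(splitter, embedder)  # fine: List[str] matches List[str]
# connect(embedder, splitter) would raise TypeError before anything runs
```

The real thing is richer (optional inputs, multiple sockets, variadics), but the principle is the same: incompatible wiring fails when you build the graph, not when data flows through it.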

Build an Indexing Pipeline

Start with the simplest useful pipeline: read text files, split them into chunks, embed with OpenAI, and store in memory.

from haystack import Pipeline, Document
from haystack.components.converters import TextFileToDocument
from haystack.components.preprocessors import DocumentSplitter
from haystack.components.embedders import OpenAIDocumentEmbedder
from haystack.components.writers import DocumentWriter
from haystack.document_stores.in_memory import InMemoryDocumentStore

# Create the document store
document_store = InMemoryDocumentStore()

# Build the indexing pipeline
indexing = Pipeline()
indexing.add_component("converter", TextFileToDocument())
indexing.add_component("splitter", DocumentSplitter(
    split_by="sentence",
    split_length=3,
    split_overlap=1
))
indexing.add_component("embedder", OpenAIDocumentEmbedder(
    model="text-embedding-3-small"
))
indexing.add_component("writer", DocumentWriter(document_store=document_store))

# Connect the components
indexing.connect("converter.documents", "splitter.documents")
indexing.connect("splitter.documents", "embedder.documents")
indexing.connect("embedder.documents", "writer.documents")

# Run it on some files
indexing.run({"converter": {"sources": ["docs/guide.txt", "docs/faq.txt"]}})

The DocumentSplitter splits by sentence here, grouping 3 sentences per chunk with 1 sentence of overlap. That gives your retriever enough context per chunk without blowing past token limits. Adjust split_length to your content – for dense technical docs, 5 sentences often works better.
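
The stride those settings produce is worth internalizing: each chunk starts split_length - split_overlap sentences after the previous one, so consecutive chunks share split_overlap sentences. A pure-Python sketch of the arithmetic (illustrative only, not the splitter's actual code):

```python
def chunk(sentences, length=3, overlap=1):
    # Each chunk starts (length - overlap) sentences after the previous one,
    # so consecutive chunks share `overlap` sentences. The final chunk may
    # be shorter than `length`.
    stride = length - overlap
    return [sentences[i:i + length] for i in range(0, len(sentences), stride)]
```

With seven sentences and the settings above, chunks start at sentences 1, 3, 5, and 7, each sharing one sentence with its neighbor.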

Switch to Qdrant for Production

InMemoryDocumentStore is fine for prototyping but won’t survive a restart. For production, swap in Qdrant:

from haystack_integrations.document_stores.qdrant import QdrantDocumentStore

document_store = QdrantDocumentStore(
    url="http://localhost:6333",
    index="my_documents",
    embedding_dim=1536,  # matches text-embedding-3-small
    recreate_index=False,
)

Everything else in your pipeline stays exactly the same. That’s the whole point of the component architecture – swap the store, keep the pipeline.

Build a Retrieval Pipeline

Now query those documents. The simplest retrieval pipeline embeds the query and does a vector search:

from haystack.components.embedders import OpenAITextEmbedder
from haystack_integrations.components.retrievers.qdrant import QdrantEmbeddingRetriever

query_pipeline = Pipeline()
query_pipeline.add_component("text_embedder", OpenAITextEmbedder(
    model="text-embedding-3-small"
))
query_pipeline.add_component("retriever", QdrantEmbeddingRetriever(
    document_store=document_store,
    top_k=10
))
query_pipeline.connect("text_embedder.embedding", "retriever.query_embedding")

results = query_pipeline.run({
    "text_embedder": {"text": "How do I reset my password?"}
})

for doc in results["retriever"]["documents"]:
    print(f"[{doc.score:.3f}] {doc.content[:100]}")
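
Under the hood, an embedding retriever scores every stored vector against the query vector, typically by cosine similarity, and returns the top matches. A dependency-free sketch of that scoring (the document store does this for you; this is only to show the mechanics):

```python
import math

def cosine(a, b):
    # Cosine similarity: dot product normalized by vector magnitudes.
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

def retrieve(query_emb, corpus, top_k=2):
    # corpus: list of (content, embedding) pairs; highest score first.
    return sorted(corpus, key=lambda d: cosine(query_emb, d[1]), reverse=True)[:top_k]
```

Real stores use approximate nearest-neighbor indexes instead of a full scan, but the ranking criterion is the same.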

Add a Reranker for Better Precision

Vector search gets you in the right neighborhood. A cross-encoder reranker tells you which house to knock on. Add one between the retriever and whatever consumes the results:

from haystack.components.rankers import TransformersSimilarityRanker

query_pipeline = Pipeline()
query_pipeline.add_component("text_embedder", OpenAITextEmbedder(
    model="text-embedding-3-small"
))
query_pipeline.add_component("retriever", QdrantEmbeddingRetriever(
    document_store=document_store,
    top_k=20  # retrieve more, let the reranker filter
))
query_pipeline.add_component("ranker", TransformersSimilarityRanker(
    model="cross-encoder/ms-marco-MiniLM-L-6-v2",
    top_k=5
))

query_pipeline.connect("text_embedder.embedding", "retriever.query_embedding")
query_pipeline.connect("retriever.documents", "ranker.documents")

Retrieve 20 candidates, rerank down to 5. The cross-encoder is slower but dramatically more accurate than cosine similarity alone. For most use cases, this two-stage approach is the sweet spot. Two practical notes: the ranker needs the raw query text at run time, so pass it explicitly when you run the pipeline, e.g. query_pipeline.run({"text_embedder": {"text": query}, "ranker": {"query": query}}); and TransformersSimilarityRanker requires the transformers package (with torch) to be installed.
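
The retrieve-wide, rerank-narrow pattern in miniature, with dummy scoring functions standing in for the bi-encoder and cross-encoder:

```python
def two_stage(query, docs, cheap_score, expensive_score, first_k=20, final_k=5):
    # Stage 1: cheap score over the whole corpus, keep a wide candidate set.
    candidates = sorted(docs, key=lambda d: cheap_score(query, d), reverse=True)[:first_k]
    # Stage 2: expensive score over candidates only, keep the best few.
    return sorted(candidates, key=lambda d: expensive_score(query, d), reverse=True)[:final_k]
```

The economics are the point: the expensive scorer runs first_k times per query instead of once per document in the corpus, which is what makes cross-encoders affordable at all.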

Build a Full RAG Pipeline

Wire the retriever output into a prompt builder and then into an LLM generator to get generated answers grounded in your documents:

from haystack.components.generators import OpenAIGenerator
from haystack.components.builders import PromptBuilder

template = """
Answer the question based on the provided context. If the context doesn't
contain enough information, say so -- don't make things up.

Context:
{% for doc in documents %}
  {{ doc.content }}
{% endfor %}

Question: {{ query }}
Answer:
"""

rag_pipeline = Pipeline()
rag_pipeline.add_component("text_embedder", OpenAITextEmbedder(
    model="text-embedding-3-small"
))
rag_pipeline.add_component("retriever", QdrantEmbeddingRetriever(
    document_store=document_store,
    top_k=5
))
rag_pipeline.add_component("prompt_builder", PromptBuilder(template=template))
rag_pipeline.add_component("generator", OpenAIGenerator(model="gpt-4o"))

rag_pipeline.connect("text_embedder.embedding", "retriever.query_embedding")
rag_pipeline.connect("retriever.documents", "prompt_builder.documents")
rag_pipeline.connect("prompt_builder", "generator")

response = rag_pipeline.run({
    "text_embedder": {"text": "How do I reset my password?"},
    "prompt_builder": {"query": "How do I reset my password?"},
})

print(response["generator"]["replies"][0])

Notice that query goes to both text_embedder and prompt_builder. The embedder needs it to generate the query vector; the prompt builder needs it to construct the final prompt. Haystack lets you fan out inputs like this cleanly.

Build a Custom Component

Need something Haystack doesn’t ship? Write your own component. Here’s a simple document filter that drops chunks below a confidence threshold:

from haystack import component, Document
from typing import List

@component
class ConfidenceFilter:
    def __init__(self, threshold: float = 0.5):
        self.threshold = threshold

    @component.output_types(documents=List[Document])
    def run(self, documents: List[Document]) -> dict:
        filtered = [doc for doc in documents if (doc.score or 0) >= self.threshold]
        return {"documents": filtered}

Drop it into any pipeline between a retriever and a consumer:

rag_pipeline.add_component("filter", ConfidenceFilter(threshold=0.6))
rag_pipeline.connect("retriever.documents", "filter.documents")
rag_pipeline.connect("filter.documents", "prompt_builder.documents")

The @component decorator and @component.output_types declaration are all Haystack needs to validate your component’s connections at pipeline build time. If you forget the output type annotation, you’ll get a ComponentError when connecting.
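
The filter's logic is easy to sanity-check in isolation with a stand-in object (a plain class here instead of haystack's Document, so the check runs without any dependencies):

```python
class FakeDoc:
    # Stand-in for haystack's Document; only the score attribute matters here.
    def __init__(self, score):
        self.score = score

def confidence_filter(documents, threshold=0.5):
    # Same predicate as ConfidenceFilter.run: a missing score counts as 0,
    # so unscored documents are dropped at any positive threshold.
    return [d for d in documents if (d.score or 0) >= threshold]

kept = confidence_filter([FakeDoc(0.9), FakeDoc(0.3), FakeDoc(None)])
```

Note the behavior for unscored documents: (doc.score or 0) treats None as 0, which is usually what you want after a retriever or ranker but worth knowing if documents can legitimately lack scores.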

Common Errors and Fixes

PipelineConnectError: The output type of X is not compatible with the input type of Y
You're wiring two components whose types don't match. A common cause is mixing up the two embedders: OpenAIDocumentEmbedder outputs documents (List[Document]) and feeds a writer, while OpenAITextEmbedder outputs an embedding (List[float]) and feeds a retriever.

ValueError: None of the documents have embeddings
Your indexing pipeline didn't run the embedder, or you wrote documents before embedding them. Make sure the connect() calls wire the graph converter -> splitter -> embedder -> writer, so documents pass through the embedder before reaching the writer.

OpenAIError: Rate limit exceeded
Embedding large document sets hits rate limits fast. The embedder already sends documents in batches; tune the batch_size parameter on OpenAIDocumentEmbedder and split very large indexing jobs into multiple runs. Also avoid embedding metadata you don't need (the meta_fields_to_embed parameter embeds none by default; keep it that way unless the metadata genuinely helps retrieval).

QdrantException: Collection not found
You set recreate_index=False but the collection doesn't exist yet. Either set recreate_index=True on first run, or create the collection manually via the Qdrant API before starting the pipeline.

Custom component not connecting
Make sure your run() method has the @component.output_types() decorator. Without it, Haystack can't introspect the component's outputs and connect() will raise a ComponentError.

PipelineMaxComponentRuns exceeded
This happens with cyclic pipelines (e.g., a router that loops back). Increase the max_runs_per_component setting on the pipeline or add a proper exit condition to your routing logic.