How to Build a Hybrid RAG Pipeline with Qwen3 Embeddings and Qdrant in 2026
Combine BM25 keyword search with Qwen3-Embedding-8B dense vectors in Qdrant to build a retrieval pipeline that beats either approach alone—with working Python code.
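The core of the hybrid approach is the fusion step: merging the BM25 ranking and the dense-vector ranking into a single list. As a minimal sketch, assuming you already have best-first lists of document IDs from a BM25 retriever and from a Qdrant vector search (the IDs and `k=60` constant below are illustrative, not from the article):

```python
from collections import defaultdict

def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of doc IDs into one ranking.

    A document's fused score is the sum of 1 / (k + rank) over
    every list it appears in; k=60 is the commonly used constant.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

# Hypothetical ranked results from the two retrievers:
bm25_hits = ["doc3", "doc1", "doc7"]    # keyword (BM25) ranking
dense_hits = ["doc1", "doc9", "doc3"]   # dense-embedding ranking

fused = reciprocal_rank_fusion([bm25_hits, dense_hits])
print(fused[0])  # → doc1 (ranked highly by both retrievers)
```

Documents that appear near the top of both lists accumulate the highest fused score, which is why the hybrid ranking can beat either retriever alone.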
Resolve pronouns and noun phrases to their referents using spaCy coreferee and transformer-based models for cleaner NLP output.
Find the best chunking strategy for your RAG pipeline by benchmarking four methods side by side.
Combine BM25 and vector search with Reciprocal Rank Fusion to get better results than either approach alone.
Extract grammatically correct keyphrases from documents using POS-pattern vectorizers instead of fixed n-gram windows.
Detect any language and translate it to English or other targets using lingua and MarianMT in a single FastAPI service.
Extract legal entities like case citations, statutes, and parties from text using Transformers and spaCy.
Analyze sentiment in any language with a single model using XLM-RoBERTa and Hugging Face Transformers.
Link named entities in text to Wikipedia articles using spaCy for NER and cross-encoder models for disambiguation.
Extract relationships like works_at and founded_by from text using spaCy, transformers, and LLMs to build knowledge graphs.
Extract structured data from resumes using spaCy for entity recognition and Transformers for section classification.
Create search that understands both meaning and mood by combining sentence embeddings with sentiment analysis.
Create a fast spell checker that handles typos, misspellings, and domain-specific terms with Python libraries.
Detect and anonymize names, emails, phone numbers, and custom PII patterns in text with Presidio.
Pick the right chunking strategy for your RAG app and stop losing retrieval quality to bad splits.
Build accurate text classifiers with minimal labeled data using SetFit’s few-shot learning approach.
Cluster text documents into meaningful groups without labeled data using embeddings, UMAP, and HDBSCAN in Python.
Create automated text correction systems using rule-based tools, neural models, and LLMs with practical Python examples and performance benchmarks.
Find and remove near-duplicate texts at scale using MinHash fingerprints and LSH for fast similarity search.
Create a semantic search system that encodes text into vectors and finds similar documents in milliseconds using FAISS indices.
Classify whether text pairs agree, contradict, or are unrelated using NLI models and Transformers.
Clean messy real-world text with a composable pipeline covering encoding fixes, Unicode normalization, spell correction, and more.
Generate high-quality paraphrases with T5 and PEGASUS, score them by similarity, and batch process text.
Score any text with multiple readability indices and serve results through a FastAPI endpoint you can deploy today.
Ship a production-ready text similarity endpoint using cross-encoders that outperform cosine similarity.
Rewrite text across styles — formal to casual, technical to simple, passive to active — with working Python pipelines.
Combine extractive summarization with Sumy and abstractive models from Transformers for a hybrid text summarization pipeline.
Turn unstructured text into a knowledge graph you can query and visualize with spaCy and NetworkX.
Create a summarization pipeline that generates concise summaries from long documents with PEGASUS.
Go beyond document-level sentiment and analyze what people think about specific aspects of products.
Classify text into 27 emotion categories with fine-tuned Transformers and serve predictions via FastAPI.
Extract precise answers from documents with transformer-based QA models that point to the exact text span.
Build a text deduplication pipeline that scales to millions of documents using datasketch, sentence-transformers, and scikit-learn.
Pull the most important terms from any text using embedding-based keyword extraction that actually understands context.
Turn scanned documents into structured data with LayoutLM’s combined text, layout, and image understanding.
Encode text in 50+ languages into a shared vector space for search, classification, and similarity scoring.
Extract named entities from any text using spaCy pretrained models, transformer-based NER, and zero-shot GLiNER.
Ground your LLM answers in real documents with a working RAG pipeline you can run locally.
Create a search engine that understands meaning, not just keywords, using OpenAI embeddings.
Ship a sentiment analysis endpoint in under 100 lines of Python using a fine-tuned RoBERTa model and FastAPI.
Build a pipeline that turns plain English questions into validated SQL queries you can run against any database.
Build a text classification pipeline with LLMs that handles any label set without training data or fine-tuning.
Build a pipeline that parses invoices and receipts from PDF to validated, typed JSON in under 50 lines of Python.
Build production-ready topic models that surface meaningful themes from your text data using BERTopic.
Break long documents into chunks, summarize each one in parallel, and combine the results into a final summary.