How to Run InternVL3 Locally for Multimodal Document Understanding
Set up InternVL3 on your own hardware for document understanding tasks—OCR, tables, charts—with practical code and quantization options for consumer GPUs.
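The local setup described above can be sketched with Hugging Face transformers. This is a minimal sketch, not a definitive recipe: the model ID `OpenGVLab/InternVL3-8B`, the `trust_remote_code` loading path, and the optional 4-bit flag (which assumes bitsandbytes is installed) follow the InternVL model card and should be verified against the checkpoint you actually download.

```python
# Minimal sketch of loading InternVL3 locally with transformers.
# Assumptions (verify against the model card for your checkpoint):
#   - model ID "OpenGVLab/InternVL3-8B"
#   - the repo's custom code is loaded via trust_remote_code=True
#   - 4-bit quantization via load_in_4bit requires bitsandbytes

def build_generation_config(max_new_tokens=512, do_sample=False):
    """Plain dict of generation settings passed to the model's chat helper."""
    return {"max_new_tokens": max_new_tokens, "do_sample": do_sample}

def load_internvl3(model_id="OpenGVLab/InternVL3-8B", load_in_4bit=False):
    """Load model + tokenizer; load_in_4bit is the consumer-GPU option."""
    import torch
    from transformers import AutoModel, AutoTokenizer

    kwargs = {
        "torch_dtype": torch.bfloat16,
        "low_cpu_mem_usage": True,
        "trust_remote_code": True,
    }
    if load_in_4bit:
        kwargs["load_in_4bit"] = True  # assumption: bitsandbytes installed
    model = AutoModel.from_pretrained(model_id, **kwargs).eval()
    tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
    return model, tokenizer

if __name__ == "__main__":
    # Model download is deferred until load_internvl3() is actually called.
    cfg = build_generation_config()
    print(cfg)
```

With the model and tokenizer loaded, document images are preprocessed into pixel values and passed to the repo's chat helper alongside an OCR or table-extraction prompt; the exact call signature is defined by the checkpoint's custom code, so check it after downloading.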
Detect, decode, and annotate barcodes and QR codes from images and webcams with Python and OpenCV
Build a color extraction pipeline that identifies dominant colors, matches named colors, and creates palette swatches from any image
Detect layout changes, text edits, and added or removed sections between two document versions using OpenCV and PaddleOCR
Pull structured table data from PDFs and images using Table Transformer and OpenCV preprocessing
Detect and overlay driving lanes in video feeds with OpenCV classical methods and YOLOv8 segmentation models
Stitch images into panoramas with OpenCV’s high-level Stitcher and a manual pipeline using feature matching, homography, and blending
Build an end-to-end defect detection system from labeled images to a REST API using YOLOv8 and FastAPI
Turn receipt photos into structured data with PaddleOCR, OpenCV preprocessing, and regex-based field extraction
Read text from real-world photos using PaddleOCR’s detection and recognition models with confidence filtering and batch processing
Count cars, trucks, and buses crossing a virtual line in video with YOLOv8 and centroid tracking
Build a frame interpolation pipeline that doubles or quadruples video FPS using the RIFE model
Track and remove objects from video frames using segmentation masks and temporal-aware inpainting
Split videos into individual scenes automatically with PySceneDetect’s content and threshold detectors
Detect people and vehicles, monitor zones, count entries, and generate heatmaps from video feeds with Python
Detect any object in images using text prompts with Grounding DINO and zero training data
Create a production-ready visual inspection pipeline using PatchCore and OpenCV for manufacturing QA
Build a wildlife classifier that spots animals in camera trap photos and serves results over HTTP
Generate accurate image descriptions with BLIP models using a production-ready captioning pipeline in Python
Extract text from images and scanned documents using PaddleOCR and Tesseract with confidence scoring and batch processing
Turn photos of documents into clean, flat scans using OpenCV perspective warping in Python
Detect and classify hand gestures in real time with MediaPipe landmarks and a simple rule-based classifier
Create an automatic license plate reader with YOLO detection, OCR text extraction, and real-time video support in Python
Train a DenseNet-121 model to detect 14 chest X-ray pathologies and visualize predictions with Grad-CAM attention maps
Track multiple objects across video frames by pairing YOLOv8 detections with DeepSORT identity matching
Track pixel-level motion between frames using RAFT optical flow from torchvision in a few lines of code
Detect faces in real time from webcam feeds using MediaPipe’s blazing-fast face detection models in Python
Stream pixel-perfect segmentation masks to the browser in real time using SAM 2 and WebSocket connections
Segment every object in images and video frames using SAM 2 automatic masks, point prompts, and box prompts
Classify human actions in video clips and live streams using SlowFast dual-pathway networks in PyTorch
Process live video feeds with object detection, tracking, and zone-based analytics using Python and OpenCV
Build image Q&A systems that understand product photos, medical scans, and documents using pretrained vision-language models
Build a document classification pipeline that sorts invoices, receipts, contracts, and more
Find defects and anomalies in images using deep learning without needing large labeled defect datasets
Turn low-resolution images into sharp high-res versions using neural network upscalers that add realistic detail
Detect, embed, and match faces in Python with InsightFace’s buffalo_l model and a few lines of code
Detect body landmarks, draw skeletons, and calculate joint angles with MediaPipe and a webcam
Search images by text or visual similarity using CLIP embeddings and a FAISS vector index
Load a Vision Transformer, preprocess images, run inference, and fine-tune on your own dataset
Set up YOLOv8 for image and video object detection with just a few lines of Python
Turn any photo into a depth map with Depth Anything V2 using three lines of Python or full manual control
Replace Tesseract with vision LLMs that read messy documents, handwriting, and tables accurately
Use SAM 2 to cut out objects from images with clicks or bounding boxes in a few lines of Python
Track and re-identify objects across video frames with ByteTrack, YOLO detection, and the supervision library
Train your own YOLO object detection model on custom data, step by step, with the Ultralytics Python API and CLI