Quick Start: Detect Pose in an Image
MediaPipe’s PoseLandmarker tracks 33 body landmarks – from nose to ankles – in real time. It runs on CPU, needs no GPU, and works with images, video files, and webcam streams. Here is the fastest path from zero to working pose detection.
Install the dependencies:
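A minimal setup, assuming Python 3.9+ and pip. The opencv-python package is used for image and camera I/O in the snippets that follow; numpy comes along as a dependency of both:

```shell
pip install mediapipe opencv-python
```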
Download a model. MediaPipe offers three variants – lite (fastest, ~3MB), full (balanced), and heavy (most accurate, ~30MB). Start with lite:
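A sketch using wget. The storage.googleapis.com URL pattern below matches Google's published model files at the time of writing; check MediaPipe's model page if it 404s:

```shell
wget -O pose_landmarker_lite.task \
  https://storage.googleapis.com/mediapipe-models/pose_landmarker/pose_landmarker_lite/float16/1/pose_landmarker_lite.task
```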
Run detection on a single image:
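A minimal sketch of IMAGE-mode detection, assuming the model file from the previous step sits in the working directory and a test photo named person.jpg (a hypothetical filename):

```python
import mediapipe as mp
from mediapipe.tasks import python as mp_python
from mediapipe.tasks.python import vision

# Point BaseOptions at the downloaded .task file.
options = vision.PoseLandmarkerOptions(
    base_options=mp_python.BaseOptions(model_asset_path="pose_landmarker_lite.task"),
    running_mode=vision.RunningMode.IMAGE,
)

with vision.PoseLandmarker.create_from_options(options) as landmarker:
    image = mp.Image.create_from_file("person.jpg")  # hypothetical input image
    result = landmarker.detect(image)

    # One list of 33 landmarks per detected person.
    for i, lm in enumerate(result.pose_landmarks[0]):
        print(f"{i}: x={lm.x:.3f} y={lm.y:.3f} z={lm.z:.3f} vis={lm.visibility:.2f}")
```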
That prints normalized coordinates (0.0 to 1.0 relative to image dimensions) for each landmark. The z value represents depth relative to the hip midpoint – negative values mean the landmark is closer to the camera.
Drawing Landmarks on Images
Raw coordinates are useful for math, but you usually want to see the skeleton overlaid on the image. MediaPipe ships drawing utilities you can reuse with Tasks API results.
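A sketch based on Google's published sample: the Tasks API result is converted to the legacy NormalizedLandmarkList proto so the drawing helpers in mediapipe.solutions can render it. solutions.pose.POSE_CONNECTIONS carries the same skeleton topology as the Tasks connection constant:

```python
import numpy as np
from mediapipe import solutions
from mediapipe.framework.formats import landmark_pb2

def draw_pose(rgb_image, detection_result):
    """Overlay detected landmarks and skeleton connections on a copy of the image."""
    annotated = np.copy(rgb_image)
    for pose_landmarks in detection_result.pose_landmarks:
        # The legacy drawing helpers expect a NormalizedLandmarkList proto,
        # so convert the Tasks API result first.
        proto = landmark_pb2.NormalizedLandmarkList()
        proto.landmark.extend(
            landmark_pb2.NormalizedLandmark(x=lm.x, y=lm.y, z=lm.z)
            for lm in pose_landmarks
        )
        solutions.drawing_utils.draw_landmarks(
            annotated,
            proto,
            solutions.pose.POSE_CONNECTIONS,
            solutions.drawing_styles.get_default_pose_landmarks_style(),
        )
    return annotated
```

Call it with the RGB image you passed to the detector; convert back to BGR before handing the result to OpenCV's imshow or imwrite.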
The PoseLandmarksConnections.POSE_LANDMARKS constant defines which landmarks connect to form the skeleton. You get lines from shoulder to elbow, elbow to wrist, hip to knee, and so on – 33 points connected into a human stick figure.
Real-Time Webcam Pose Detection
This is where it gets interesting. The LIVE_STREAM running mode processes frames asynchronously through a callback, which keeps your main loop responsive.
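A sketch of a webcam loop in LIVE_STREAM mode. The result callback runs on a MediaPipe worker thread, so this example just stores the latest result in a module-level variable and draws it on the next frame (one frame of latency, rarely noticeable):

```python
import time

import cv2
import mediapipe as mp
from mediapipe.tasks import python as mp_python
from mediapipe.tasks.python import vision

latest_result = None

def on_result(result, output_image, timestamp_ms):
    # Called from a MediaPipe worker thread; just stash the newest result.
    global latest_result
    latest_result = result

options = vision.PoseLandmarkerOptions(
    base_options=mp_python.BaseOptions(model_asset_path="pose_landmarker_lite.task"),
    running_mode=vision.RunningMode.LIVE_STREAM,
    result_callback=on_result,
)

cap = cv2.VideoCapture(0)
with vision.PoseLandmarker.create_from_options(options) as landmarker:
    while cap.isOpened():
        ok, frame = cap.read()
        if not ok:
            break
        rgb = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
        mp_image = mp.Image(image_format=mp.ImageFormat.SRGB, data=rgb)
        # Timestamps must strictly increase across calls.
        landmarker.detect_async(mp_image, int(time.monotonic() * 1000))

        if latest_result and latest_result.pose_landmarks:
            h, w = frame.shape[:2]
            for lm in latest_result.pose_landmarks[0]:
                cv2.circle(frame, (int(lm.x * w), int(lm.y * h)), 3, (0, 255, 0), -1)

        cv2.imshow("pose", frame)
        if cv2.waitKey(1) & 0xFF == ord("q"):
            break

cap.release()
cv2.destroyAllWindows()
```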
Press q to quit. On a modern laptop CPU, the lite model comfortably hits 25-30 FPS. If you need higher accuracy at the cost of speed, swap in the heavy model.
Processing Video Files
For pre-recorded video, use VIDEO mode with detect_for_video. The key difference from live stream mode: you must supply monotonically increasing timestamps, and the call blocks until detection finishes for each frame.
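A sketch of VIDEO-mode processing, assuming a hypothetical input file named input.mp4. Deriving timestamps from the frame index and FPS guarantees they increase monotonically:

```python
import cv2
import mediapipe as mp
from mediapipe.tasks import python as mp_python
from mediapipe.tasks.python import vision

options = vision.PoseLandmarkerOptions(
    base_options=mp_python.BaseOptions(model_asset_path="pose_landmarker_lite.task"),
    running_mode=vision.RunningMode.VIDEO,
)

cap = cv2.VideoCapture("input.mp4")  # hypothetical input file
fps = cap.get(cv2.CAP_PROP_FPS) or 30.0  # fall back if FPS metadata is missing

with vision.PoseLandmarker.create_from_options(options) as landmarker:
    frame_idx = 0
    while cap.isOpened():
        ok, frame = cap.read()
        if not ok:
            break
        rgb = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
        mp_image = mp.Image(image_format=mp.ImageFormat.SRGB, data=rgb)
        # Frame-index timestamps always increase; the call blocks per frame.
        timestamp_ms = int(frame_idx * 1000 / fps)
        result = landmarker.detect_for_video(mp_image, timestamp_ms)
        if result.pose_landmarks:
            print(f"frame {frame_idx}: {len(result.pose_landmarks)} person(s)")
        frame_idx += 1

cap.release()
```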
Calculating Joint Angles
Pose landmarks become genuinely useful when you extract angles from them. Fitness apps count reps this way, gesture systems detect arm raises, and rehab tools track range of motion.
The idea is simple: pick three landmarks that form a joint, compute the angle at the middle point using basic trigonometry.
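One common way to implement this (a sketch; the atan2 form avoids the division-by-zero and domain issues a pure arccos version can hit):

```python
import math

def calculate_angle(a, b, c):
    """Angle in degrees at point b, formed by segments b->a and b->c.

    Each point is an (x, y) pair. For best accuracy convert landmarks to
    pixel coordinates first: normalized x and y are scaled by image width
    and height separately, which distorts angles on non-square frames.
    """
    ang = math.degrees(
        math.atan2(c[1] - b[1], c[0] - b[0]) - math.atan2(a[1] - b[1], a[0] - b[0])
    )
    ang = abs(ang)
    return 360 - ang if ang > 180 else ang

print(calculate_angle((0, 1), (0, 0), (1, 0)))  # a right angle, ~90 degrees
```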
Use it with landmark indices. Here is the full index reference for the joints you will use most:
| Index | Landmark | Index | Landmark |
|---|---|---|---|
| 11 | Left shoulder | 12 | Right shoulder |
| 13 | Left elbow | 14 | Right elbow |
| 15 | Left wrist | 16 | Right wrist |
| 23 | Left hip | 24 | Right hip |
| 25 | Left knee | 26 | Right knee |
| 27 | Left ankle | 28 | Right ankle |
Calculate the right elbow angle (shoulder-elbow-wrist) and right knee angle (hip-knee-ankle):
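Assuming a detection result and an angle helper like the one sketched earlier, plus the frame's pixel dimensions (converting to pixels matters because normalized x and y use different scales):

```python
lms = result.pose_landmarks[0]
h, w = 480, 640  # replace with your frame's real height and width

def pixel(i):
    """Convert landmark i from normalized to pixel coordinates."""
    return (lms[i].x * w, lms[i].y * h)

# Right elbow: shoulder (12) -> elbow (14) -> wrist (16)
right_elbow_angle = calculate_angle(pixel(12), pixel(14), pixel(16))
# Right knee: hip (24) -> knee (26) -> ankle (28)
right_knee_angle = calculate_angle(pixel(24), pixel(26), pixel(28))
print(f"Right elbow: {right_elbow_angle:.1f}, right knee: {right_knee_angle:.1f}")
```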
A straight arm reads close to 180 degrees. A fully bent elbow drops below 40. You can use these thresholds to count bicep curls, detect squats, or flag bad form.
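As a sketch of how those thresholds turn into a rep counter (the 160/40-degree hysteresis values are illustrative, not canonical):

```python
def update(angle, state):
    """Hypothetical bicep-curl counter with hysteresis: a rep counts only
    when the elbow goes fully extended (>160) and then fully flexed (<40)."""
    if angle > 160:
        state["stage"] = "down"  # arm extended
    elif angle < 40 and state["stage"] == "down":
        state["stage"] = "up"    # curl completed
        state["reps"] += 1
    return state

state = {"stage": None, "reps": 0}
for angle in [170, 90, 35, 100, 168, 30]:  # simulated per-frame elbow angles
    update(angle, state)
print(state["reps"])  # two complete extend-flex cycles -> 2
```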
Choosing the Right Model
MediaPipe provides three PoseLandmarker models. Each shares the same 33-landmark output, but they differ in accuracy and speed.
| Model | File | Best For |
|---|---|---|
| Lite | pose_landmarker_lite.task | Real-time apps, mobile, edge devices |
| Full | pose_landmarker_full.task | Balanced accuracy/speed on desktop |
| Heavy | pose_landmarker_heavy.task | Offline processing, max accuracy |
My recommendation: start with lite. It handles most use cases well. Switch to full only if you see jitter on fast movements or missed detections on unusual poses. Heavy is overkill for real-time – save it for batch processing video where latency does not matter.
Download any model by swapping the name in the URL:
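For example, with the same URL pattern as before (swap the variant name in both the path and the filename):

```shell
wget -O pose_landmarker_full.task \
  https://storage.googleapis.com/mediapipe-models/pose_landmarker/pose_landmarker_full/float16/1/pose_landmarker_full.task
```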
Tuning Detection Parameters
PoseLandmarkerOptions exposes several confidence thresholds you should understand:
- min_pose_detection_confidence (default 0.5) – How confident the detector must be to start tracking a person. Lower it to 0.3 if people are far away or partially occluded. Raise it to 0.7 if you get false positives on furniture or backgrounds.
- min_pose_presence_confidence (default 0.5) – Minimum confidence that a person is still in frame between detections. Lower values keep tracking through brief occlusions.
- min_tracking_confidence (default 0.5) – Threshold for landmark tracking between frames. Lower values produce smoother output but may track wrong regions. Higher values re-trigger detection more often.
- num_poses (default 1) – Set higher to detect multiple people. Each additional pose adds processing overhead.
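A sketch of how these fit together (num_poses=2 is just an example value):

```python
from mediapipe.tasks import python as mp_python
from mediapipe.tasks.python import vision

options = vision.PoseLandmarkerOptions(
    base_options=mp_python.BaseOptions(model_asset_path="pose_landmarker_lite.task"),
    running_mode=vision.RunningMode.VIDEO,
    min_pose_detection_confidence=0.5,
    min_pose_presence_confidence=0.5,
    min_tracking_confidence=0.5,
    num_poses=2,  # track up to two people per frame
)
```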
Common Errors and Fixes
RuntimeError: Unable to open file at pose_landmarker_lite.task
You forgot to download the model, or the path is wrong. The model file is not bundled with the mediapipe pip package – you need to download it separately.
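A quick fix: download the model into the directory you run the script from, then sanity-check the file size (roughly 3 MB for lite):

```shell
wget -O pose_landmarker_lite.task \
  https://storage.googleapis.com/mediapipe-models/pose_landmarker/pose_landmarker_lite/float16/1/pose_landmarker_lite.task
ls -lh pose_landmarker_lite.task
```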
If your path contains special characters or non-ASCII characters (common on Windows with OneDrive paths), load the model as a buffer instead:
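BaseOptions accepts the raw model bytes via model_asset_buffer, which sidesteps path-encoding issues entirely:

```python
from mediapipe.tasks import python as mp_python

# Read the .task file yourself and hand MediaPipe the bytes,
# so its native loader never touches the problematic path.
with open("pose_landmarker_lite.task", "rb") as f:
    model_data = f.read()

base_options = mp_python.BaseOptions(model_asset_buffer=model_data)
```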
Timestamp errors in LIVE_STREAM mode
This happens when you send frames with timestamps that do not strictly increase. A common cause is using cv2.getTickCount() or a frame counter that resets. Use a monotonically increasing clock instead:
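A minimal sketch using time.monotonic(), which, unlike wall-clock time, can never jump backwards:

```python
import time

def now_ms():
    """Milliseconds from a monotonic clock; safe for detect_async timestamps."""
    return int(time.monotonic() * 1000)

# In the capture loop: landmarker.detect_async(mp_image, now_ms())
t1 = now_ms()
time.sleep(0.005)
t2 = now_ms()  # always later than t1
```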
If you are processing frames faster than 1ms apart (unlikely but possible with cached frames), add a small sleep or increment a counter manually.
ValueError: PoseLandmarker is not initialized
You called detect() after the with block closed, or you created the landmarker in one mode and called the wrong detection method. The three modes have matching methods:
- IMAGE mode uses detect()
- VIDEO mode uses detect_for_video()
- LIVE_STREAM mode uses detect_async()
Calling detect_for_video() on a landmarker created with RunningMode.IMAGE raises a ValueError. Double-check your running mode matches your method call.
No landmarks detected (empty results)
If result.pose_landmarks is an empty list, the model did not find a person. Common causes:
- The person is too small in frame – move the camera closer or crop the image
- Bad lighting or heavy motion blur
- The confidence thresholds are too high – drop min_pose_detection_confidence to 0.3
- The image is not in RGB format – OpenCV reads BGR by default, so always convert with cv2.cvtColor(frame, cv2.COLOR_BGR2RGB) before passing to MediaPipe
ImportError: cannot import name 'drawing_utils' from 'mediapipe'
You are likely on an older MediaPipe version. The Tasks API drawing utilities moved in recent versions. Update to the latest:
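Upgrade in place and confirm the installed version:

```shell
pip install --upgrade mediapipe
python -c "import mediapipe; print(mediapipe.__version__)"
```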
As of early 2026, version 0.10.32+ includes the Tasks API drawing modules at mediapipe.tasks.python.vision.drawing_utils.
Related Guides
- How to Build a Document Comparison Pipeline with Vision Models
- How to Build a Lane Detection Pipeline with OpenCV and YOLO
- How to Build a Vehicle Counting Pipeline with YOLOv8 and OpenCV
- How to Build a Scene Text Recognition Pipeline with PaddleOCR
- How to Build a Video Shot Boundary Detection Pipeline with PySceneDetect
- How to Build a Video Surveillance Analytics Pipeline with YOLOv8
- How to Build Hand Gesture Recognition with MediaPipe and Python
- How to Build Video Analytics Pipelines with OpenCV and Deep Learning
- How to Build a Receipt Scanner with OCR and Structured Extraction
- How to Build an OCR Pipeline with PaddleOCR and Tesseract