Quick Start: Detect Pose in an Image

MediaPipe’s PoseLandmarker tracks 33 body landmarks – from nose to ankles – in real time. It runs on CPU, needs no GPU, and works with images, video files, and webcam streams. Here is the fastest path from zero to working pose detection.

Install the dependencies:

pip install mediapipe opencv-python numpy

Download a model. MediaPipe offers three variants – lite (fastest, ~3MB), full (balanced), and heavy (most accurate, ~30MB). Start with lite:

wget -q https://storage.googleapis.com/mediapipe-models/pose_landmarker/pose_landmarker_lite/float16/1/pose_landmarker_lite.task

Run detection on a single image:

import mediapipe as mp
from mediapipe.tasks import python
from mediapipe.tasks.python import vision

# Set up the landmarker
base_options = python.BaseOptions(model_asset_path="pose_landmarker_lite.task")
options = vision.PoseLandmarkerOptions(
    base_options=base_options,
    running_mode=vision.RunningMode.IMAGE,
)

with vision.PoseLandmarker.create_from_options(options) as landmarker:
    image = mp.Image.create_from_file("person.jpg")
    result = landmarker.detect(image)

    # Print the first 5 landmarks (guard against frames with no person)
    if result.pose_landmarks:
        for i, lm in enumerate(result.pose_landmarks[0][:5]):
            print(f"Landmark {i}: x={lm.x:.3f}, y={lm.y:.3f}, z={lm.z:.3f}")

That prints normalized coordinates (0.0 to 1.0 relative to image dimensions) for each landmark. The z value represents depth relative to the hip midpoint – negative values mean the landmark is closer to the camera.
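
To place a landmark on the image itself, multiply the normalized values by the pixel dimensions. A minimal sketch; the `to_pixel` helper is my own convenience, not part of MediaPipe:

```python
def to_pixel(lm_x, lm_y, image_width, image_height):
    """Convert normalized landmark coordinates to integer pixel coordinates,
    clamped to stay inside the image bounds."""
    x_px = min(int(lm_x * image_width), image_width - 1)
    y_px = min(int(lm_y * image_height), image_height - 1)
    return x_px, y_px

# A landmark at (0.5, 0.25) in a 640x480 image lands at pixel (320, 120)
print(to_pixel(0.5, 0.25, 640, 480))
```

The clamp matters because landmarks can report coordinates slightly outside the 0.0 to 1.0 range when a body part extends past the frame edge.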

Drawing Landmarks on Images

Raw coordinates are useful for math, but you usually want to see the skeleton overlaid on the image. MediaPipe ships drawing utilities in the Tasks API.

import numpy as np
import cv2
import mediapipe as mp
from mediapipe.tasks import python
from mediapipe.tasks.python import vision
from mediapipe.tasks.python.vision import drawing_utils, drawing_styles

def draw_landmarks_on_image(rgb_image, detection_result):
    annotated = np.copy(rgb_image)
    if not detection_result.pose_landmarks:
        return annotated

    pose_style = drawing_styles.get_default_pose_landmarks_style()
    connection_style = drawing_utils.DrawingSpec(color=(0, 230, 118), thickness=2)

    for pose_landmarks in detection_result.pose_landmarks:
        drawing_utils.draw_landmarks(
            image=annotated,
            landmark_list=pose_landmarks,
            connections=vision.PoseLandmarksConnections.POSE_LANDMARKS,
            landmark_drawing_spec=pose_style,
            connection_drawing_spec=connection_style,
        )
    return annotated

# Detect and draw
base_options = python.BaseOptions(model_asset_path="pose_landmarker_lite.task")
options = vision.PoseLandmarkerOptions(
    base_options=base_options,
    running_mode=vision.RunningMode.IMAGE,
)

with vision.PoseLandmarker.create_from_options(options) as landmarker:
    image = mp.Image.create_from_file("person.jpg")
    result = landmarker.detect(image)
    annotated = draw_landmarks_on_image(image.numpy_view(), result)

    # Convert RGB to BGR for OpenCV and save
    cv2.imwrite("pose_output.jpg", cv2.cvtColor(annotated, cv2.COLOR_RGB2BGR))
    print("Saved pose_output.jpg")

The PoseLandmarksConnections.POSE_LANDMARKS constant defines which landmarks connect to form the skeleton. You get lines from shoulder to elbow, elbow to wrist, hip to knee, and so on – 33 points connected into a human stick figure.
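
If you want to see what the drawing helper does under the hood, the idea reduces to converting each connected landmark pair into a pixel line segment. A hedged sketch: the `ARM_CONNECTIONS` subset and `connection_segments` helper are my own illustration, not the full official connection list.

```python
# A few illustrative connections (landmark index pairs). The real skeleton
# uses the full set from PoseLandmarksConnections.POSE_LANDMARKS.
ARM_CONNECTIONS = [(11, 13), (13, 15), (12, 14), (14, 16)]

def connection_segments(landmarks, connections, width, height):
    """Turn normalized landmarks into pixel line segments ready for drawing."""
    segments = []
    for start, end in connections:
        p1 = (int(landmarks[start].x * width), int(landmarks[start].y * height))
        p2 = (int(landmarks[end].x * width), int(landmarks[end].y * height))
        segments.append((p1, p2))
    return segments
```

Each returned segment can then be handed to cv2.line to draw directly on the frame.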

Real-Time Webcam Pose Detection

This is where it gets interesting. The LIVE_STREAM running mode processes frames asynchronously through a callback, which keeps your main loop responsive.

import cv2
import time
import numpy as np
import mediapipe as mp
from mediapipe.tasks import python
from mediapipe.tasks.python import vision
from mediapipe.tasks.python.vision import drawing_utils, drawing_styles

latest_result = None

def on_result(result, output_image, timestamp_ms):
    global latest_result
    latest_result = result

base_options = python.BaseOptions(model_asset_path="pose_landmarker_lite.task")
options = vision.PoseLandmarkerOptions(
    base_options=base_options,
    running_mode=vision.RunningMode.LIVE_STREAM,
    num_poses=1,
    min_pose_detection_confidence=0.5,
    min_tracking_confidence=0.5,
    result_callback=on_result,
)

cap = cv2.VideoCapture(0)
landmarker = vision.PoseLandmarker.create_from_options(options)

try:
    while cap.isOpened():
        ret, frame = cap.read()
        if not ret:
            break

        # MediaPipe expects RGB
        rgb_frame = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
        mp_image = mp.Image(image_format=mp.ImageFormat.SRGB, data=rgb_frame)

        # Timestamp must increase monotonically
        timestamp_ms = int(time.time() * 1000)
        landmarker.detect_async(mp_image, timestamp_ms)

        # Draw the most recent result
        if latest_result and latest_result.pose_landmarks:
            annotated = np.copy(rgb_frame)
            for pose_landmarks in latest_result.pose_landmarks:
                drawing_utils.draw_landmarks(
                    image=annotated,
                    landmark_list=pose_landmarks,
                    connections=vision.PoseLandmarksConnections.POSE_LANDMARKS,
                    landmark_drawing_spec=drawing_styles.get_default_pose_landmarks_style(),
                    connection_drawing_spec=drawing_utils.DrawingSpec(
                        color=(0, 230, 118), thickness=2
                    ),
                )
            frame = cv2.cvtColor(annotated, cv2.COLOR_RGB2BGR)

        cv2.imshow("Pose Estimation", frame)
        if cv2.waitKey(1) & 0xFF == ord("q"):
            break
finally:
    landmarker.close()
    cap.release()
    cv2.destroyAllWindows()

Press q to quit. On a modern laptop CPU, the lite model comfortably hits 25-30 FPS. If you need higher accuracy at the cost of speed, swap in the heavy model.
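
To measure the frame rate on your own hardware, a rolling FPS counter is handy. A minimal sketch; the `FpsCounter` class is my own, and the `now` parameter exists so you can inject a clock for testing:

```python
import time
from collections import deque

class FpsCounter:
    """Rolling-average FPS over the last `window` frames."""
    def __init__(self, window=30):
        self.ticks = deque(maxlen=window)

    def tick(self, now=None):
        # Record one frame; defaults to the high-resolution monotonic clock
        self.ticks.append(time.perf_counter() if now is None else now)

    def fps(self):
        if len(self.ticks) < 2:
            return 0.0
        elapsed = self.ticks[-1] - self.ticks[0]
        return (len(self.ticks) - 1) / elapsed if elapsed > 0 else 0.0
```

Call `counter.tick()` once per loop iteration and overlay `counter.fps()` on the frame with cv2.putText.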

Processing Video Files

For pre-recorded video, use VIDEO mode with detect_for_video. The key difference from live stream mode: you must supply monotonically increasing timestamps, and the call blocks until detection finishes for each frame.

import cv2
import mediapipe as mp
from mediapipe.tasks import python
from mediapipe.tasks.python import vision

base_options = python.BaseOptions(model_asset_path="pose_landmarker_lite.task")
options = vision.PoseLandmarkerOptions(
    base_options=base_options,
    running_mode=vision.RunningMode.VIDEO,
    num_poses=1,
)

cap = cv2.VideoCapture("input_video.mp4")
fps = cap.get(cv2.CAP_PROP_FPS)
frame_idx = 0

with vision.PoseLandmarker.create_from_options(options) as landmarker:
    while cap.isOpened():
        ret, frame = cap.read()
        if not ret:
            break

        rgb = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
        mp_image = mp.Image(image_format=mp.ImageFormat.SRGB, data=rgb)
        timestamp_ms = int(frame_idx * 1000 / fps)

        result = landmarker.detect_for_video(mp_image, timestamp_ms)

        if result.pose_landmarks:
            landmarks = result.pose_landmarks[0]
            nose = landmarks[0]
            print(f"Frame {frame_idx}: nose at ({nose.x:.3f}, {nose.y:.3f})")

        frame_idx += 1

cap.release()
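
For offline analysis you will usually want to persist those per-frame positions instead of printing them. A sketch using the csv module; the `write_trajectory` helper and its column layout are my own choice, not a MediaPipe API:

```python
import csv

def write_trajectory(path, rows):
    """Write (frame_idx, x, y) tuples to a CSV file with a header row."""
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["frame", "nose_x", "nose_y"])
        writer.writerows(rows)

# In the loop above, you would collect rows with:
#   rows.append((frame_idx, nose.x, nose.y))
# and call write_trajectory("nose_trajectory.csv", rows) after the loop.
```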

Calculating Joint Angles

Pose landmarks become genuinely useful when you extract angles from them. Fitness apps count reps this way, gesture systems detect arm raises, and rehab tools track range of motion.

The idea is simple: pick three landmarks that form a joint, then compute the angle at the middle point with basic vector math.

import numpy as np

def calculate_angle(a, b, c):
    """Calculate angle at point b given three landmarks a, b, c.
    Each landmark has .x and .y attributes (normalized coordinates).
    Returns angle in degrees.
    """
    a = np.array([a.x, a.y])
    b = np.array([b.x, b.y])
    c = np.array([c.x, c.y])

    ba = a - b
    bc = c - b

    cosine = np.dot(ba, bc) / (np.linalg.norm(ba) * np.linalg.norm(bc) + 1e-6)
    angle = np.degrees(np.arccos(np.clip(cosine, -1.0, 1.0)))
    return angle

Use it with landmark indices. Here is the full index reference for the joints you will use most:

  Index  Landmark          Index  Landmark
  11     Left shoulder     12     Right shoulder
  13     Left elbow        14     Right elbow
  15     Left wrist        16     Right wrist
  23     Left hip          24     Right hip
  25     Left knee         26     Right knee
  27     Left ankle        28     Right ankle
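
A small name-to-index mapping keeps downstream code readable. The dict below is my own convenience wrapper around the table above, not a MediaPipe constant:

```python
# Landmark indices for the joints used most often
POSE_IDX = {
    "left_shoulder": 11, "right_shoulder": 12,
    "left_elbow": 13, "right_elbow": 14,
    "left_wrist": 15, "right_wrist": 16,
    "left_hip": 23, "right_hip": 24,
    "left_knee": 25, "right_knee": 26,
    "left_ankle": 27, "right_ankle": 28,
}
```

Then `landmarks[POSE_IDX["right_knee"]]` reads far better than `landmarks[26]`.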

Calculate the right elbow angle (shoulder-elbow-wrist) and right knee angle (hip-knee-ankle):

# Assuming `landmarks` is result.pose_landmarks[0]
right_elbow_angle = calculate_angle(
    landmarks[12],  # right shoulder
    landmarks[14],  # right elbow
    landmarks[16],  # right wrist
)

right_knee_angle = calculate_angle(
    landmarks[24],  # right hip
    landmarks[26],  # right knee
    landmarks[28],  # right ankle
)

print(f"Right elbow: {right_elbow_angle:.1f} degrees")
print(f"Right knee: {right_knee_angle:.1f} degrees")

A straight arm reads close to 180 degrees. A fully bent elbow drops below 40. You can use these thresholds to count bicep curls, detect squats, or flag bad form.
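
Here is one way to turn those thresholds into a rep counter: a hedged sketch with hysteresis, so noise near a single threshold does not double-count. The `RepCounter` class and its 160/50-degree thresholds are illustrative, not from MediaPipe.

```python
class RepCounter:
    """Count reps from a joint angle using two thresholds (hysteresis)."""
    def __init__(self, down_below=50.0, up_above=160.0):
        self.down_below = down_below  # below this, the joint counts as bent
        self.up_above = up_above      # above this, the joint counts as extended
        self.stage = "up"
        self.count = 0

    def update(self, angle):
        # One rep = a full bend followed by a full extension
        if angle < self.down_below and self.stage == "up":
            self.stage = "down"
        elif angle > self.up_above and self.stage == "down":
            self.stage = "up"
            self.count += 1
        return self.count
```

Feed it the elbow angle from calculate_angle once per frame; the two-threshold gap is what keeps a shaky arm hovering at 100 degrees from registering phantom reps.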

Choosing the Right Model

MediaPipe provides three PoseLandmarker models. Each shares the same 33-landmark output, but they differ in accuracy and speed.

  Model  File                         Best For
  Lite   pose_landmarker_lite.task    Real-time apps, mobile, edge devices
  Full   pose_landmarker_full.task    Balanced accuracy/speed on desktop
  Heavy  pose_landmarker_heavy.task   Offline processing, max accuracy

My recommendation: start with lite. It handles most use cases well. Switch to full only if you see jitter on fast movements or missed detections on unusual poses. Heavy is overkill for real-time – save it for batch processing video where latency does not matter.

Download any model by swapping the name in the URL:

# Full model
wget -q https://storage.googleapis.com/mediapipe-models/pose_landmarker/pose_landmarker_full/float16/1/pose_landmarker_full.task

# Heavy model
wget -q https://storage.googleapis.com/mediapipe-models/pose_landmarker/pose_landmarker_heavy/float16/1/pose_landmarker_heavy.task

Tuning Detection Parameters

PoseLandmarkerOptions exposes several tuning parameters you should understand:

  • min_pose_detection_confidence (default 0.5) – How confident the detector must be to start tracking a person. Lower it to 0.3 if people are far away or partially occluded. Raise to 0.7 if you get false positives on furniture or backgrounds.
  • min_pose_presence_confidence (default 0.5) – Minimum confidence that a person is still in frame between detections. Lower values keep tracking through brief occlusions.
  • min_tracking_confidence (default 0.5) – Threshold for landmark tracking between frames. Lower values produce smoother output but may track wrong regions. Higher values re-trigger detection more often.
  • num_poses (default 1) – Set higher to detect multiple people. Each additional pose adds processing overhead.
options = vision.PoseLandmarkerOptions(
    base_options=python.BaseOptions(model_asset_path="pose_landmarker_full.task"),
    running_mode=vision.RunningMode.LIVE_STREAM,
    num_poses=3,
    min_pose_detection_confidence=0.4,
    min_pose_presence_confidence=0.4,
    min_tracking_confidence=0.3,
    result_callback=on_result,
)

Common Errors and Fixes

RuntimeError: Unable to open file at pose_landmarker_lite.task

You forgot to download the model, or the path is wrong. The model file is not bundled with the mediapipe pip package – you need to download it separately.

# Fix: download the model file
wget -q https://storage.googleapis.com/mediapipe-models/pose_landmarker/pose_landmarker_lite/float16/1/pose_landmarker_lite.task

If your path contains special characters or non-ASCII characters (common on Windows with OneDrive paths), load the model as a buffer instead:

with open("path/to/pose_landmarker_lite.task", "rb") as f:
    model_buffer = f.read()

base_options = python.BaseOptions(model_asset_buffer=model_buffer)

Timestamp errors in LIVE_STREAM mode

RuntimeError: The input timestamp must be monotonically increasing.

This happens when you send frames with timestamps that do not strictly increase. A common cause is using cv2.getTickCount() or a frame counter that resets. Use wall-clock time instead:

import time
timestamp_ms = int(time.time() * 1000)

If you are processing frames faster than 1ms apart (unlikely but possible with cached frames), add a small sleep or increment a counter manually.
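
A tiny helper that guarantees strictly increasing timestamps no matter how fast frames arrive; the `MonotonicTimestamp` class is my own sketch, and the `now_ms` parameter exists so you can inject a clock for testing:

```python
import time

class MonotonicTimestamp:
    """Produce strictly increasing millisecond timestamps."""
    def __init__(self):
        self.last = -1

    def next_ms(self, now_ms=None):
        t = int(time.time() * 1000) if now_ms is None else now_ms
        if t <= self.last:
            t = self.last + 1  # bump by 1 ms if the clock has not advanced
        self.last = t
        return t
```

Replace the bare `int(time.time() * 1000)` in the loop with `ts.next_ms()` and the monotonicity error goes away even for back-to-back cached frames.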

ValueError: PoseLandmarker is not initialized

You called detect() after the with block closed, or you created the landmarker in one mode and called the wrong detection method. The three modes have matching methods:

  • IMAGE mode uses detect()
  • VIDEO mode uses detect_for_video()
  • LIVE_STREAM mode uses detect_async()

Calling detect_for_video() on a landmarker created with RunningMode.IMAGE raises a ValueError. Double-check your running mode matches your method call.

No landmarks detected (empty results)

If result.pose_landmarks is an empty list, the model did not find a person. Common causes:

  • The person is too small in frame – move the camera closer or crop the image
  • Bad lighting or heavy motion blur
  • The confidence thresholds are too high – drop min_pose_detection_confidence to 0.3
  • The image is not in RGB format – OpenCV reads BGR by default, so always convert with cv2.cvtColor(frame, cv2.COLOR_BGR2RGB) before passing to MediaPipe
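
A small guard helper keeps the empty-result case from crashing your pipeline. The `first_pose` function name is my own:

```python
def first_pose(result):
    """Return the first detected pose's landmarks, or None if nothing found."""
    if result is None or not result.pose_landmarks:
        return None
    return result.pose_landmarks[0]
```

Call it right after detection: if it returns None, skip the frame instead of indexing into an empty list.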

ImportError: cannot import name 'drawing_utils' from mediapipe

You are likely on an older MediaPipe version. The Tasks API drawing utilities moved in recent versions. Update to the latest:

pip install --upgrade mediapipe

As of early 2026, version 0.10.32+ includes the Tasks API drawing modules at mediapipe.tasks.python.vision.drawing_utils.
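
To check whether your installed build is new enough, compare version tuples rather than strings, so that "0.10.9" does not sort above "0.10.32". A minimal sketch; it assumes your build exposes `mp.__version__`, which recent releases do:

```python
def version_tuple(version):
    """Parse a 'MAJOR.MINOR.PATCH' string into a comparable tuple of ints."""
    return tuple(int(part) for part in version.split("."))

# import mediapipe as mp
# if version_tuple(mp.__version__) < version_tuple("0.10.32"):
#     raise RuntimeError("Upgrade mediapipe: pip install --upgrade mediapipe")
```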