MediaPipe’s face detection runs on CPU at 30+ FPS and works out of the box with two lines of setup. No GPU required, no model downloads, no ONNX runtime headaches. You get bounding boxes, confidence scores, and six facial keypoints per face. Here’s how to wire it up with OpenCV for real-time webcam detection.
That pulls in everything you need. MediaPipe bundles its own TFLite models internally, so there’s no separate weight file to manage.
How MediaPipe Face Detection Works
MediaPipe ships two face detection models under the hood, and you pick one with a single integer:
- Model 0 (short-range) — optimized for faces within 2 meters of the camera. Best for selfie-style webcam setups. This is the default.
- Model 1 (full-range) — handles faces up to 5 meters away. Use this for surveillance-style cameras or when people are further from the lens.
Both models return a bounding box and six keypoints per detected face: right eye, left eye, nose tip, mouth center, right ear tragion, and left ear tragion. The detection pipeline runs a single-shot detector (SSD) with a custom encoder, which is why it’s so fast on CPU.
Here’s a minimal script that detects faces in a single image:
Feed it any image with a face and you’ll get bounding boxes drawn on the output. Passing model_selection=0 selects the short-range model; swap in 1 if your subjects are far from the camera.
Real-Time Webcam Face Detection
The single-image version is useful for testing, but the real payoff is live detection. Here’s a complete webcam loop that draws bounding boxes and confidence scores on every frame:
Press q to quit. The confidence score floats above each bounding box so you can see how sure the model is about each detection.
Extracting Face Keypoints
Beyond bounding boxes, each detection gives you six facial landmarks. These are normalized coordinates (0.0 to 1.0), so you multiply by frame dimensions to get pixel positions. This is useful for head pose estimation, gaze direction, or feeding crops into a face recognition model.
Each keypoint is a RelativeKeypoint with .x and .y fields. The six points give you enough geometry to estimate rough head orientation or to align face crops before passing them to a recognition model.
Performance Tips
MediaPipe face detection is already fast, but you can squeeze more out of it for edge deployments or older hardware.
Reduce resolution. The model internally resizes input anyway. Feeding 640x480 instead of 1920x1080 cuts preprocessing time significantly:
Skip frames. If you don’t need detection on every single frame, process every Nth frame and reuse the last result for the skipped ones:
This processes every 3rd frame while still displaying all frames smoothly. You trade a bit of detection latency for a ~3x reduction in compute.
Raise the confidence threshold. Setting min_detection_confidence=0.7 or higher reduces false positives and slightly speeds up post-processing since fewer detections get returned. The default 0.5 is a good balance, but bump it up if you’re getting phantom faces in complex backgrounds.
Use model 0 unless you need range. The short-range model is faster than full-range. Only switch to model_selection=1 when your camera is mounted far from people.
Common Errors and Fixes
cv2.error: ... Can't open camera by index — Your webcam isn’t available at index 0. Try cv2.VideoCapture(1) or cv2.VideoCapture(2). On Linux, check that /dev/video0 exists. If you’re in a VM or WSL, webcam passthrough may need extra configuration.
TypeError: 'NoneType' object is not iterable (or an AttributeError on results.detections) — When no faces are found, results.detections is None rather than an empty list, so looping over it directly fails. Guard the loop with if results.detections: before iterating. Separately, MediaPipe expects RGB input; passing a BGR frame won’t raise an error, but it degrades detection quality, so always convert with cv2.cvtColor(frame, cv2.COLOR_BGR2RGB) before processing.
Detections flicker or disappear between frames — Lower the min_detection_confidence to 0.3 or 0.4. The default 0.5 can miss faces at oblique angles or in poor lighting. You can also try switching to model_selection=1 for better detection at distance.
ImportError: cannot import name 'solutions' from 'mediapipe' — You likely have a stale or partial install. Run pip install --upgrade mediapipe to get the latest version. If you’re on Apple Silicon, make sure you’re using Python 3.9+ since older builds had compatibility issues.
Bounding boxes are offset or misaligned — The relative bounding box coordinates assume the image hasn’t been flipped. If you’re using cv2.flip(frame, 1) for a mirror effect, apply the flip before passing the frame to MediaPipe, not after. Otherwise the coordinates won’t match the displayed frame.
Related Guides
- How to Build License Plate Detection and Recognition with YOLO and OCR
- How to Build a Color Detection and Extraction Pipeline with OpenCV
- How to Build a Video Object Removal Pipeline with Inpainting
- How to Build Hand Gesture Recognition with MediaPipe and Python
- How to Build a Barcode and QR Code Scanner with OpenCV and ZBar
- How to Build a Panoramic Image Stitching Pipeline with OpenCV
- How to Build a Document Scanner with OpenCV and Perspective Transform
- How to Build Real-Time Object Segmentation with SAM 2 and WebSocket
- How to Build a Lane Detection Pipeline with OpenCV and YOLO
- How to Build a Real-Time Pose Estimation Pipeline with MediaPipe