The Pipeline at a Glance
Face recognition boils down to three steps: detect a face, convert it into a numerical embedding, and compare that embedding against known faces. InsightFace bundles all three into a single Python library backed by ONNX models. The default buffalo_l model pack includes a face detector (SCRFD), a recognition model trained with the ArcFace loss that outputs 512-dimensional embeddings, plus age/gender estimation and 2D/3D landmark prediction.
You feed in an image and get back face objects with bounding boxes, confidence scores, and embeddings. Compare embeddings with cosine similarity, and you have a working recognition system.
Install InsightFace
Install the insightface package with pip, together with onnxruntime as the CPU inference backend.
If you have an NVIDIA GPU and want faster inference, install onnxruntime-gpu in place of onnxruntime.
InsightFace requires onnxruntime as its inference backend (MXNet support was dropped after version 0.1.5). The buffalo_l model pack downloads automatically on first use – about 330MB total.
Detect Faces and Extract Embeddings
This is the core workflow. Initialize FaceAnalysis, pass it an image, and inspect the results.
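A minimal sketch of that workflow follows. FaceAnalysis, prepare(), and get() are the library's own calls; the helper names detect_faces and describe_face and the file name in the usage comment are made up for illustration. The heavy imports sit inside the function so the summary helper can be tried without InsightFace installed.

```python
def detect_faces(image_path, det_size=(640, 640), ctx_id=0):
    """Load an image and return the list of face objects found in it."""
    import cv2  # OpenCV reads images as BGR arrays, which app.get() expects
    from insightface.app import FaceAnalysis

    app = FaceAnalysis(
        name="buffalo_l",
        providers=["CUDAExecutionProvider", "CPUExecutionProvider"],
    )
    app.prepare(ctx_id=ctx_id, det_size=det_size)

    img = cv2.imread(image_path)
    if img is None:
        raise FileNotFoundError(f"Could not read image: {image_path}")
    return app.get(img)


def describe_face(face):
    """Collect the main per-face attributes into a plain dict."""
    return {
        "bbox": [round(float(v), 1) for v in face.bbox],
        "det_score": float(face.det_score),
        "embedding_dim": len(face.embedding),
        "age": int(face.age),
        "gender": "male" if int(face.gender) == 1 else "female",
    }


# Usage (hypothetical file name):
# for face in detect_faces("group_photo.jpg"):
#     print(describe_face(face))
```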
Each face object returned by app.get() carries these attributes:
- face.bbox – bounding box as [x1, y1, x2, y2] float array
- face.det_score – detection confidence (0 to 1)
- face.embedding – 512-dimensional ArcFace vector
- face.age – estimated age
- face.gender – 0 for female, 1 for male
- face.landmark_2d_106 – 106-point facial landmarks
Compare Two Faces with Cosine Similarity
ArcFace embeddings are trained to separate identities by angle on a hypersphere, so cosine similarity is the natural way to compare them. Two embeddings of the same person typically score above 0.4, while embeddings of different people usually fall well below that.
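Here is one way to write the comparison, as a sketch rather than the only way. It uses only the standard library and accepts any sequence of numbers, so it also works on the NumPy arrays in face.embedding; with NumPy you would more commonly compute np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)). The function names and the default threshold argument are illustrative.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two embedding vectors (range -1 to 1)."""
    dot = sum(float(x) * float(y) for x, y in zip(a, b))
    norm_a = math.sqrt(sum(float(x) ** 2 for x in a))
    norm_b = math.sqrt(sum(float(y) ** 2 for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


def same_person(emb_a, emb_b, threshold=0.4):
    """Treat two embeddings as the same identity if they clear the threshold."""
    return cosine_similarity(emb_a, emb_b) >= threshold
```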
A threshold of 0.4 works as a reasonable starting point. Tune it based on your use case – lower for recall-heavy applications (unlocking a door should not fail), higher for precision-heavy ones (law enforcement matching needs fewer false positives).
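To tune the threshold with data rather than by feel, you can score a set of labeled pairs and look at both error rates at each candidate value. The sketch below assumes you already have (similarity, is_same_person) tuples from your own images; the function name is hypothetical.

```python
def error_rates(scored_pairs, threshold):
    """Return (false_accept_rate, false_reject_rate) at a given threshold.

    scored_pairs: iterable of (similarity, is_same_person) tuples.
    """
    pairs = list(scored_pairs)
    genuine = [score for score, same in pairs if same]
    impostor = [score for score, same in pairs if not same]

    # A genuine pair below the threshold is wrongly rejected;
    # an impostor pair at or above it is wrongly accepted.
    false_rejects = sum(1 for score in genuine if score < threshold)
    false_accepts = sum(1 for score in impostor if score >= threshold)

    far = false_accepts / len(impostor) if impostor else 0.0
    frr = false_rejects / len(genuine) if genuine else 0.0
    return far, frr
```

Sweeping a few thresholds (say 0.3, 0.4, 0.5) and comparing the two rates shows the trade-off directly: raising the threshold lowers false accepts and raises false rejects.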
Build a Face Database and Search It
For a practical system, you store embeddings for known people and match incoming faces against them.
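A small in-memory version of such a database might look like the sketch below. The FaceDatabase class and its methods are hypothetical, not part of InsightFace. Embeddings are normalized when stored, so the search is a plain dot product against every entry.

```python
import math


class FaceDatabase:
    """Linear-scan store of named embeddings (illustrative, in-memory only)."""

    def __init__(self, threshold=0.4):
        self.threshold = threshold
        self._names = []
        self._embeddings = []  # stored L2-normalized

    @staticmethod
    def _normalize(embedding):
        values = [float(v) for v in embedding]
        norm = math.sqrt(sum(v * v for v in values))
        if norm == 0.0:
            raise ValueError("cannot store or search a zero vector")
        return [v / norm for v in values]

    def add(self, name, embedding):
        """Register one embedding under a person's name."""
        self._names.append(name)
        self._embeddings.append(self._normalize(embedding))

    def search(self, embedding):
        """Return (name, score) for the best match, or (None, score) if below threshold."""
        if not self._embeddings:
            return None, 0.0
        query = self._normalize(embedding)
        scores = [sum(q * e for q, e in zip(query, known)) for known in self._embeddings]
        best = max(range(len(scores)), key=scores.__getitem__)
        if scores[best] < self.threshold:
            return None, scores[best]
        return self._names[best], scores[best]
```

In practice you would call add() with face.embedding from a reference photo of each person, then call search() with the embedding of every face detected in a new image.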
A linear cosine-similarity scan is fine for up to a few hundred identities. For large databases (thousands of faces or more), replace the linear scan with an approximate nearest neighbor index like FAISS.
CPU-Only Mode
If you don’t have a GPU, just install onnxruntime (not onnxruntime-gpu) and set the providers accordingly by passing providers=["CPUExecutionProvider"] to FaceAnalysis.
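One way to keep the CPU and GPU setups in a single place is sketched below. The provider names are ONNX Runtime's; choose_providers and build_app are hypothetical helpers, and the sketch relies on InsightFace treating a negative ctx_id as CPU.

```python
def choose_providers(use_gpu):
    """ONNX Runtime execution providers, in priority order."""
    if use_gpu:
        # Listing the CPU provider second lets ONNX Runtime fall back to it.
        return ["CUDAExecutionProvider", "CPUExecutionProvider"]
    return ["CPUExecutionProvider"]


def build_app(use_gpu=False, det_size=(640, 640)):
    """Create a FaceAnalysis instance for CPU-only or GPU inference."""
    from insightface.app import FaceAnalysis  # imported lazily

    app = FaceAnalysis(name="buffalo_l", providers=choose_providers(use_gpu))
    # ctx_id=-1 selects the CPU; 0 selects the first GPU.
    app.prepare(ctx_id=0 if use_gpu else -1, det_size=det_size)
    return app
```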
CPU inference on buffalo_l runs at roughly 100-200ms per image on a modern laptop. That is fast enough for batch processing and most non-real-time applications.
Troubleshooting
“No face detected” on valid images
This is the most common issue. The detector works best when faces are at least ~50px wide in the input image. If you are passing cropped or small images, reduce det_size in app.prepare(), for example from (640, 640) to (320, 320).
You can also try det_size=(128, 128) for headshot-style images where the face fills most of the frame. The detector expects some padding around the face.
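If you don't know in advance which size suits an image, one option is to retry with progressively smaller values. This is a sketch under the assumption that calling app.prepare() again simply updates the detector's input size; the function name and the list of sizes are illustrative.

```python
def detect_with_fallback(app, img, det_sizes=((640, 640), (320, 320), (128, 128)), ctx_id=0):
    """Try each det_size in turn until at least one face is found.

    Returns (faces, det_size_used); faces is empty and the size is None
    if no size produced a detection.
    """
    for det_size in det_sizes:
        # Re-prepare the app so the detector runs at the new input size.
        app.prepare(ctx_id=ctx_id, det_size=det_size)
        faces = app.get(img)
        if len(faces) > 0:
            return faces, det_size
    return [], None
```

Each retry runs the full pipeline again, so this suits offline processing better than a live video loop.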
Model download fails or hangs
Models download to ~/.insightface/models/buffalo_l/. If the download gets interrupted, you end up with corrupted files. Delete that folder and run your script again so the pack downloads from scratch.
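If you prefer to do the cleanup from Python, a sketch like this works on any platform. The remove_model_pack helper is hypothetical; it also deletes a sibling .zip file if one exists, which is an assumption about how the download is cached rather than documented behavior.

```python
import shutil
from pathlib import Path


def remove_model_pack(name="buffalo_l", root="~/.insightface"):
    """Delete a cached model pack so it is re-downloaded on next use.

    Returns the list of paths that were removed.
    """
    models_dir = Path(root).expanduser() / "models"
    removed = []

    pack_dir = models_dir / name
    if pack_dir.is_dir():
        shutil.rmtree(pack_dir)
        removed.append(pack_dir)

    # Assumption: a downloaded archive may sit next to the extracted folder.
    zip_file = models_dir / f"{name}.zip"
    if zip_file.is_file():
        zip_file.unlink()
        removed.append(zip_file)

    return removed
```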
onnxruntime import errors
If you see ModuleNotFoundError: No module named 'onnxruntime' after installing insightface, it means the ONNX runtime was not pulled in automatically. Install the onnxruntime package explicitly with pip (or onnxruntime-gpu for GPU inference).
On Windows, you might also hit DLL load failed errors with onnxruntime-gpu. Make sure Visual C++ Redistributable 2019+ is installed, and that your CUDA version matches what onnxruntime-gpu expects (check the ONNX Runtime CUDA compatibility table).
GPU not being used
If inference is slow despite having a GPU, check that the CUDA provider is actually available: the list returned by onnxruntime.get_available_providers() should include CUDAExecutionProvider.
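A quick check could look like the sketch below. onnxruntime.get_available_providers() is the real ONNX Runtime call; has_cuda and check_gpu are hypothetical helper names. Note that this reports what the installed build supports, not which provider a particular session ended up using.

```python
def has_cuda(available_providers):
    """True if the CUDA execution provider is in the given provider list."""
    return "CUDAExecutionProvider" in available_providers


def check_gpu():
    """Print the providers the installed ONNX Runtime build supports."""
    import onnxruntime as ort  # imported lazily

    providers = ort.get_available_providers()
    print("Available providers:", providers)
    print("CUDA available:", has_cuda(providers))
    return has_cuda(providers)
```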
If CUDAExecutionProvider is missing, you either don’t have onnxruntime-gpu installed, or your CUDA/cuDNN versions don’t match. The safe fix: uninstall both onnxruntime and onnxruntime-gpu with pip, then install only onnxruntime-gpu.
Choosing the Right Model Pack
InsightFace ships several model packs. buffalo_l is the default and best for most use cases.
| Model Pack | Detection | Recognition | Size | Notes |
|---|---|---|---|---|
| buffalo_l | SCRFD-10GF | ResNet50 (WebFace600K) | ~330MB | Best accuracy, recommended |
| buffalo_s | SCRFD-2.5GF | MobileFaceNet (WebFace600K) | ~160MB | Smaller, faster, less accurate |
| buffalo_sc | SCRFD-500MF | MobileFaceNet (WebFace600K) | ~16MB | Detection and recognition only; lightweight for edge deployment |
Load a different pack by passing its name to FaceAnalysis, for example FaceAnalysis(name="buffalo_s").
Licensing Note
InsightFace models are released for non-commercial research purposes only. If you are building a commercial product, you need to either train your own recognition model or purchase a commercial license from the InsightFace team. The Python library code itself is MIT-licensed, but the pretrained model weights carry the non-commercial restriction.