The Pipeline at a Glance

Face recognition boils down to three steps: detect a face, convert it into a numerical embedding, and compare that embedding against known faces. InsightFace bundles all three into a single Python library backed by ONNX models. The default buffalo_l model pack includes a face detector (SCRFD), a 512-dimensional ArcFace recognition model, plus age/gender estimation and 2D and 3D landmark prediction.

You feed in an image and get back face objects with bounding boxes, confidence scores, and embeddings. Compare embeddings with cosine similarity, and you have a working recognition system.

Install InsightFace

pip install insightface onnxruntime

If you have an NVIDIA GPU and want faster inference, install the GPU runtime instead:

pip install insightface onnxruntime-gpu

InsightFace requires onnxruntime as its inference backend (MXNet support was dropped after version 0.1.5). The buffalo_l model pack downloads automatically on first use – about 330MB total.

Detect Faces and Extract Embeddings

This is the core workflow. Initialize FaceAnalysis, pass it an image, and inspect the results.

import cv2
import numpy as np
from insightface.app import FaceAnalysis

# Initialize the face analysis pipeline
# Models download to ~/.insightface/models/ on first run
app = FaceAnalysis(providers=["CUDAExecutionProvider", "CPUExecutionProvider"])
app.prepare(ctx_id=0, det_size=(640, 640))

# Load an image (BGR format, as OpenCV reads it)
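img = cv2.imread("photo.jpg")
if img is None:
    # cv2.imread returns None instead of raising when the file is missing or unreadable
    raise FileNotFoundError("Cannot read image: photo.jpg")
faces = app.get(img)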

print(f"Found {len(faces)} face(s)")

for i, face in enumerate(faces):
    # Bounding box: [x1, y1, x2, y2]
    bbox = face.bbox.astype(int)
    print(f"Face {i}: bbox={bbox}, score={face.det_score:.3f}")

    # 512-D embedding vector (the ArcFace representation)
    embedding = face.embedding  # shape: (512,)
    print(f"  Embedding norm: {np.linalg.norm(embedding):.2f}")

    # Bonus attributes
    print(f"  Age: {face.age}, Gender: {'M' if face.gender == 1 else 'F'}")

Each face object returned by app.get() carries these attributes:

  • face.bbox – bounding box as [x1, y1, x2, y2] float array
  • face.det_score – detection confidence (0 to 1)
  • face.embedding – 512-dimensional ArcFace vector
  • face.age – estimated age
  • face.gender – 0 for female, 1 for male
  • face.landmark_2d_106 – 106-point facial landmarks

Compare Two Faces with Cosine Similarity

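ArcFace is trained with an angular margin, so the direction of an embedding carries the identity information and its length does not. That makes cosine similarity the natural comparison metric. With this model, two embeddings from the same person typically score above 0.4, while different people fall well below that.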

import cv2
import numpy as np
from insightface.app import FaceAnalysis

app = FaceAnalysis(providers=["CUDAExecutionProvider", "CPUExecutionProvider"])
app.prepare(ctx_id=0, det_size=(640, 640))

def get_embedding(image_path):
    """Extract the face embedding from an image. Raises if no face found."""
    img = cv2.imread(image_path)
    if img is None:
        raise FileNotFoundError(f"Cannot read image: {image_path}")
    faces = app.get(img)
    if len(faces) == 0:
        raise ValueError(f"No face detected in {image_path}")
    # Take the face with the highest detection score
    faces_sorted = sorted(faces, key=lambda f: f.det_score, reverse=True)
    return faces_sorted[0].embedding

def cosine_similarity(a, b):
    """Compute cosine similarity between two vectors."""
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

# Compare two photos
emb1 = get_embedding("person_a_photo1.jpg")
emb2 = get_embedding("person_a_photo2.jpg")
emb3 = get_embedding("person_b.jpg")

sim_same = cosine_similarity(emb1, emb2)
sim_diff = cosine_similarity(emb1, emb3)

print(f"Same person: {sim_same:.4f}")   # Typically 0.5 - 0.8+
print(f"Different person: {sim_diff:.4f}")  # Typically < 0.3

THRESHOLD = 0.4
print(f"Match: {sim_same > THRESHOLD}")  # True
print(f"Match: {sim_diff > THRESHOLD}")  # False

A threshold of 0.4 works as a reasonable starting point. Tune it based on your use case – lower for recall-heavy applications (unlocking a door should not fail), higher for precision-heavy ones (law enforcement matching needs fewer false positives).
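
One way to tune the threshold is to score a small set of labeled pairs and watch how the two error rates move. The sketch below is a minimal illustration and not part of InsightFace: the helper name error_rates and the sample scores are made up for the example, and you would substitute cosine similarities computed from your own same-person and different-person pairs.

```python
import numpy as np

def error_rates(same_scores, diff_scores, threshold):
    """Return (false_reject_rate, false_accept_rate) at a given threshold.

    same_scores: cosine similarities for pairs of the same person
    diff_scores: cosine similarities for pairs of different people
    A pair counts as a match when its score is strictly above the threshold.
    """
    same = np.asarray(same_scores, dtype=float)
    diff = np.asarray(diff_scores, dtype=float)
    frr = float(np.mean(same <= threshold))  # genuine pairs wrongly rejected
    far = float(np.mean(diff > threshold))   # impostor pairs wrongly accepted
    return frr, far

# Hypothetical scores for illustration only; replace with your own data
same_scores = [0.72, 0.65, 0.58, 0.44, 0.35]
diff_scores = [0.05, 0.12, 0.21, 0.33, 0.42]

for t in (0.3, 0.4, 0.5):
    frr, far = error_rates(same_scores, diff_scores, t)
    print(f"threshold={t:.1f}  FRR={frr:.2f}  FAR={far:.2f}")
```

Raising the threshold trades false accepts for false rejects, so pick the point that matches the cost of each error in your application.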

Build a Face Database and Search It

For a practical system, you store embeddings for known people and match incoming faces against them.

import os
import cv2
import numpy as np
from insightface.app import FaceAnalysis

app = FaceAnalysis(providers=["CUDAExecutionProvider", "CPUExecutionProvider"])
app.prepare(ctx_id=0, det_size=(640, 640))

def build_database(directory):
    """Build a face database from a directory of images.

    Expected structure:
        known_faces/
            alice.jpg
            bob.jpg
            carol.jpg
    """
    db = {}
    for filename in os.listdir(directory):
        if not filename.lower().endswith((".jpg", ".jpeg", ".png")):
            continue
        path = os.path.join(directory, filename)
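        img = cv2.imread(path)
        if img is None:
            # cv2.imread returns None for unreadable files instead of raising
            print(f"Warning: cannot read {filename}, skipping")
            continue
        faces = app.get(img)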
        if len(faces) == 0:
            print(f"Warning: no face in {filename}, skipping")
            continue
        name = os.path.splitext(filename)[0]
        best_face = max(faces, key=lambda f: f.det_score)
        db[name] = best_face.embedding
    print(f"Database: {len(db)} people registered")
    return db

def identify(image_path, db, threshold=0.4):
    """Identify all faces in an image against the database."""
    img = cv2.imread(image_path)
    if img is None:
        raise FileNotFoundError(f"Cannot read image: {image_path}")
    faces = app.get(img)
    results = []

    for face in faces:
        best_match = None
        best_score = -1

        for name, ref_emb in db.items():
            score = np.dot(face.embedding, ref_emb) / (
                np.linalg.norm(face.embedding) * np.linalg.norm(ref_emb)
            )
            if score > best_score:
                best_score = score
                best_match = name

        label = best_match if best_score > threshold else "Unknown"
        results.append({
            "label": label,
            "score": float(best_score),
            "bbox": face.bbox.astype(int).tolist(),
        })
    return results

# Usage
db = build_database("known_faces/")
matches = identify("group_photo.jpg", db)
for m in matches:
    print(f"{m['label']} (score: {m['score']:.3f}) at {m['bbox']}")

For large databases (thousands of faces), replace the linear scan with an approximate nearest neighbor index like FAISS. Linear cosine similarity is fine for up to a few hundred identities.
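
Before reaching for an index library, the per-name Python loop can be replaced by a single matrix product. The following is a minimal NumPy sketch, assuming the database is a name-to-embedding dict like the one built above; the helper names build_matrix and search are illustrative, and random vectors stand in for real embeddings. Once every vector is L2-normalized, cosine similarity reduces to a dot product, which is the same quantity an inner-product index computes.

```python
import numpy as np

def build_matrix(db):
    """Stack a {name: embedding} dict into (names, L2-normalized matrix)."""
    names = list(db.keys())
    mat = np.stack([np.asarray(db[n], dtype=np.float32) for n in names])
    mat /= np.linalg.norm(mat, axis=1, keepdims=True)
    return names, mat

def search(query, names, mat, threshold=0.4):
    """Return (label, score) for the best match, or ("Unknown", score)."""
    q = np.asarray(query, dtype=np.float32)
    q = q / np.linalg.norm(q)
    scores = mat @ q                 # cosine similarity against every row
    best = int(np.argmax(scores))
    score = float(scores[best])
    label = names[best] if score > threshold else "Unknown"
    return label, score

# Random stand-ins for real 512-D embeddings (illustration only)
rng = np.random.default_rng(0)
db = {name: rng.normal(size=512) for name in ("alice", "bob", "carol")}
names, mat = build_matrix(db)

# A slightly perturbed copy of bob's vector should still match bob
query = db["bob"] + 0.1 * rng.normal(size=512)
print(search(query, names, mat))
```

The matrix only needs to be rebuilt when people are added or removed, so each query costs one matrix-vector product.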

CPU-Only Mode

If you don’t have a GPU, just install onnxruntime (not onnxruntime-gpu) and set the providers accordingly:

app = FaceAnalysis(providers=["CPUExecutionProvider"])
app.prepare(ctx_id=0, det_size=(640, 640))

CPU inference on buffalo_l runs at roughly 100-200ms per image on a modern laptop. That is fast enough for batch processing and most non-real-time applications.
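
Timings like these depend heavily on hardware and image size, so it is worth measuring on your own machine. A minimal timing sketch follows; the helper name time_per_call is hypothetical, and a dummy workload stands in for the real call, which would be something like lambda: app.get(img).

```python
import time

def time_per_call(fn, warmup=2, runs=10):
    """Return the mean wall-clock time of fn() in milliseconds."""
    for _ in range(warmup):      # warm-up calls are not timed
        fn()
    start = time.perf_counter()
    for _ in range(runs):
        fn()
    elapsed = time.perf_counter() - start
    return elapsed * 1000.0 / runs

# Dummy workload for illustration; in practice pass lambda: app.get(img)
def dummy_workload():
    return sum(i * i for i in range(10_000))

ms = time_per_call(dummy_workload)
print(f"{ms:.2f} ms per call")
```

The warm-up runs are there because the first inference call often includes one-time initialization that would skew the average.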

Troubleshooting

“No face detected” on valid images

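This is the most common issue. The detector works best when faces are at least ~50px wide in the input image. It also struggles at the other extreme: the image is resized to fit det_size before detection, so a tightly cropped or low-resolution image gets scaled up and the face can end up too large for the detector. If you are passing cropped or small images, reduce det_size: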

# Try smaller detection resolution for close-up or small images
app.prepare(ctx_id=0, det_size=(320, 320))

You can also try det_size=(128, 128) for headshot-style images where the face fills most of the frame. The detector expects some padding around the face.
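
If shrinking det_size is not enough, another option is to add a border around a tight crop before detection so the face no longer touches the image edges. This is a sketch of that idea using plain NumPy; the helper name pad_image and the 25% ratio are arbitrary choices for illustration, and a blank array stands in for a real image loaded with cv2.imread.

```python
import numpy as np

def pad_image(img, ratio=0.25, value=0):
    """Add a constant-colour border of `ratio` times the image size on every side.

    img: H x W x 3 array (for example a BGR image from cv2.imread)
    Returns (padded image, (pad_x, pad_y)) so boxes can be mapped back.
    """
    h, w = img.shape[:2]
    pad_y, pad_x = int(h * ratio), int(w * ratio)
    padded = np.pad(
        img,
        ((pad_y, pad_y), (pad_x, pad_x), (0, 0)),
        mode="constant",
        constant_values=value,
    )
    return padded, (pad_x, pad_y)

# Blank stand-in for a 200x160 headshot (illustration only)
img = np.zeros((200, 160, 3), dtype=np.uint8)
padded, (pad_x, pad_y) = pad_image(img)
print(padded.shape)

# Boxes detected on the padded image are offset by the padding;
# subtract (pad_x, pad_y, pad_x, pad_y) to return to original coordinates.
```

Pass the padded array to app.get() in place of the original image, then shift the returned bounding boxes back by the padding.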

Model download fails or hangs

Models download to ~/.insightface/models/buffalo_l/. If the download gets interrupted, you end up with corrupted files. Delete the folder and try again:

rm -rf ~/.insightface/models/buffalo_l
python -c "from insightface.app import FaceAnalysis; FaceAnalysis()"
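
If you would rather confirm the problem before wiping the folder, it can help to look at what is actually on disk. The sketch below lists the .onnx files in a model folder with their sizes so that empty or suspiciously small files stand out; the helper name list_model_files is hypothetical, and the path assumes the default download location mentioned above.

```python
from pathlib import Path

def list_model_files(model_dir):
    """Return [(filename, size_in_bytes)] for every .onnx file in model_dir."""
    folder = Path(model_dir).expanduser()
    if not folder.is_dir():
        return []
    return sorted((p.name, p.stat().st_size) for p in folder.glob("*.onnx"))

files = list_model_files("~/.insightface/models/buffalo_l")
if not files:
    print("No .onnx files found; the pack has not been downloaded or extracted")
for name, size in files:
    flag = "  <-- empty file" if size == 0 else ""
    print(f"{name}: {size / 1e6:.1f} MB{flag}")
```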

onnxruntime import errors

If you see ModuleNotFoundError: No module named 'onnxruntime' after installing insightface, it means the ONNX runtime was not pulled in automatically. Install it explicitly:

pip install onnxruntime

On Windows, you might also hit DLL load failed errors with onnxruntime-gpu. Make sure Visual C++ Redistributable 2019+ is installed, and that your CUDA version matches what onnxruntime-gpu expects (check the ONNX Runtime CUDA compatibility table).

GPU not being used

If inference is slow despite having a GPU, check that the CUDA provider is actually loaded:

import onnxruntime
print(onnxruntime.get_available_providers())
# Should include 'CUDAExecutionProvider' if GPU is available

If CUDAExecutionProvider is missing, you either don’t have onnxruntime-gpu installed, or your CUDA/cuDNN versions don’t match. The safe fix: uninstall both runtimes and install only the GPU one.

pip uninstall onnxruntime onnxruntime-gpu
pip install onnxruntime-gpu

Choosing the Right Model Pack

InsightFace ships several model packs. buffalo_l is the default and best for most use cases.

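Model Pack | Detection   | Recognition                         | Size   | Notes
buffalo_l  | SCRFD-10GF  | ArcFace ResNet50 (WebFace600K)      | ~330MB | Default, most accurate of the three
buffalo_s  | SCRFD-500MF | ArcFace MobileFaceNet (WebFace600K) | ~160MB | Smaller, faster, less accurate
buffalo_sc | SCRFD-500MF | ArcFace MobileFaceNet (WebFace600K) | ~16MB  | Detection and recognition only; lightweight for edge deployment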

Load a different pack by name:

app = FaceAnalysis(name="buffalo_s", providers=["CPUExecutionProvider"])
app.prepare(ctx_id=0, det_size=(640, 640))

Licensing Note

InsightFace models are released for non-commercial research purposes only. If you are building a commercial product, you need to either train your own recognition model or purchase a commercial license from the InsightFace team. The Python library code itself is MIT-licensed, but the pretrained model weights carry the non-commercial restriction.