OpenCV ships with a high-level cv2.Stitcher class that handles panorama creation in about five lines. It also gives you every building block to do it manually — feature detection, matching, homography estimation, warping, and blending. Knowing both paths matters. The high-level API is great until it fails on tricky inputs with nothing but a status code, and then you need to understand what's happening underneath.

Here’s the fastest way to stitch two or more images:

import cv2
import glob

image_paths = sorted(glob.glob("images/pano_*.jpg"))
images = [cv2.imread(p) for p in image_paths]

stitcher = cv2.Stitcher.create(mode=cv2.Stitcher_PANORAMA)
status, panorama = stitcher.stitch(images)

if status == cv2.Stitcher_OK:
    cv2.imwrite("panorama_output.jpg", panorama)
    print(f"Panorama saved: {panorama.shape[1]}x{panorama.shape[0]} pixels")
else:
    error_map = {
        cv2.Stitcher_ERR_NEED_MORE_IMGS: "Need more images",
        cv2.Stitcher_ERR_HOMOGRAPHY_EST_FAIL: "Homography estimation failed",
        cv2.Stitcher_ERR_CAMERA_PARAMS_ADJUST_FAIL: "Camera params adjustment failed",
    }
    print(f"Stitching failed: {error_map.get(status, f'Unknown error {status}')}")

The mode parameter matters. cv2.Stitcher_PANORAMA assumes the camera rotates around a fixed point (like standing and turning). cv2.Stitcher_SCANS works for flat document scanning where images are taken from roughly the same plane.

Manual Feature-Based Stitching Pipeline

The high-level API hides a lot. When you need control over feature detection, matching thresholds, or blending strategy, build the pipeline yourself. This example stitches two images step by step.

import cv2
import numpy as np

def stitch_pair(img1, img2, feature_method="orb"):
    """Stitch two images using feature matching and homography."""
    gray1 = cv2.cvtColor(img1, cv2.COLOR_BGR2GRAY)
    gray2 = cv2.cvtColor(img2, cv2.COLOR_BGR2GRAY)

    # Feature detection — ORB is free, SIFT needs opencv-contrib
    if feature_method == "sift":
        detector = cv2.SIFT_create(nfeatures=2000)
        norm_type = cv2.NORM_L2
    else:
        detector = cv2.ORB_create(nfeatures=3000)
        norm_type = cv2.NORM_HAMMING

    kp1, des1 = detector.detectAndCompute(gray1, None)
    kp2, des2 = detector.detectAndCompute(gray2, None)

    if des1 is None or des2 is None:
        raise ValueError("No descriptors found in one of the images")

    # Match features with BFMatcher + ratio test
    bf = cv2.BFMatcher(norm_type, crossCheck=False)
    raw_matches = bf.knnMatch(des1, des2, k=2)

    # Lowe's ratio test — 0.75 is a solid default
    good_matches = []
    for pair in raw_matches:
        if len(pair) < 2:
            continue  # knnMatch can return fewer than k neighbors per query
        m, n = pair
        if m.distance < 0.75 * n.distance:
            good_matches.append(m)

    print(f"Features: {len(kp1)} / {len(kp2)} | Good matches: {len(good_matches)}")

    if len(good_matches) < 10:
        raise ValueError(f"Only {len(good_matches)} matches — need at least 10 for a reliable homography")

    # Extract matched point coordinates
    src_pts = np.float32([kp1[m.queryIdx].pt for m in good_matches]).reshape(-1, 1, 2)
    dst_pts = np.float32([kp2[m.trainIdx].pt for m in good_matches]).reshape(-1, 1, 2)

    # Estimate homography with RANSAC
    H, mask = cv2.findHomography(src_pts, dst_pts, cv2.RANSAC, ransacReprojThreshold=5.0)
    if H is None:
        raise ValueError("Homography estimation failed: matches may be degenerate")
    inliers = int(mask.ravel().sum())
    print(f"RANSAC inliers: {inliers} / {len(good_matches)}")

    # Warp img1 into img2's coordinate frame
    h1, w1 = img1.shape[:2]
    h2, w2 = img2.shape[:2]

    # Calculate output canvas size
    corners_img1 = np.float32([[0, 0], [w1, 0], [w1, h1], [0, h1]]).reshape(-1, 1, 2)
    warped_corners = cv2.perspectiveTransform(corners_img1, H)
    all_corners = np.concatenate([warped_corners, np.float32([[0, 0], [w2, 0], [w2, h2], [0, h2]]).reshape(-1, 1, 2)])

    x_min, y_min = np.int32(np.floor(all_corners.min(axis=0).ravel()))
    x_max, y_max = np.int32(np.ceil(all_corners.max(axis=0).ravel()))

    # Translation matrix to shift everything into positive coordinates
    translation = np.array([[1, 0, -x_min], [0, 1, -y_min], [0, 0, 1]], dtype=np.float64)
    canvas_size = (int(x_max - x_min), int(y_max - y_min))

    warped1 = cv2.warpPerspective(img1, translation @ H, canvas_size)

    # Place img2 onto the canvas
    warped1[-y_min:-y_min + h2, -x_min:-x_min + w2] = img2

    return warped1


# Usage
img1 = cv2.imread("left.jpg")
img2 = cv2.imread("right.jpg")
result = stitch_pair(img1, img2, feature_method="orb")
cv2.imwrite("manual_panorama.jpg", result)

A few things worth noting: ORB is fast and patent-free, but SIFT produces more stable matches on images with significant scale changes. The RANSAC reprojection threshold of 5.0 pixels is a reasonable starting point — lower values are stricter, higher values tolerate more geometric distortion.

FLANN-Based Matching for Larger Image Sets

When you’re matching thousands of features, BFMatcher gets slow. FLANN (Fast Library for Approximate Nearest Neighbors) is significantly faster for large descriptor sets:

import cv2

def flann_match(des1, des2, feature_method="sift"):
    """Match descriptors using FLANN for speed on large feature sets."""
    if feature_method == "sift":
        index_params = dict(algorithm=1, trees=5)  # FLANN_INDEX_KDTREE
    else:
        index_params = dict(algorithm=6, table_number=6, key_size=12, multi_probe_level=1)  # FLANN_INDEX_LSH for binary descriptors

    search_params = dict(checks=50)
    flann = cv2.FlannBasedMatcher(index_params, search_params)
    matches = flann.knnMatch(des1, des2, k=2)

    # The LSH index can return fewer than k neighbors per query, so guard the unpacking
    good = [pair[0] for pair in matches
            if len(pair) == 2 and pair[0].distance < 0.75 * pair[1].distance]
    return good

Use FLANN with SIFT descriptors (float vectors) for best results. For ORB’s binary descriptors, the LSH index works but BFMatcher is often just as fast.

Handling Exposure Differences with Multi-Band Blending

The naive approach of just overwriting pixels in the overlap region creates a visible seam. Multi-band blending fixes this by blending low-frequency content (broad lighting) gradually while keeping high-frequency content (edges, texture) sharp.

import cv2
import numpy as np

def multi_band_blend(img1_warped, img2_warped, mask1, mask2, num_bands=5):
    """Blend two warped images using Laplacian pyramid multi-band blending."""
    # Build Gaussian pyramids for masks
    gp_mask1 = [mask1.astype(np.float32) / 255.0]
    gp_mask2 = [mask2.astype(np.float32) / 255.0]
    for i in range(num_bands):
        gp_mask1.append(cv2.pyrDown(gp_mask1[i]))
        gp_mask2.append(cv2.pyrDown(gp_mask2[i]))

    # Build Laplacian pyramids for images
    def laplacian_pyramid(img, levels):
        gp = [img.astype(np.float64)]
        for i in range(levels):
            gp.append(cv2.pyrDown(gp[i]))
        lp = [gp[levels]]
        for i in range(levels, 0, -1):
            expanded = cv2.pyrUp(gp[i], dstsize=(gp[i - 1].shape[1], gp[i - 1].shape[0]))
            lp.append(gp[i - 1] - expanded)
        return lp[::-1]

    lp1 = laplacian_pyramid(img1_warped, num_bands)
    lp2 = laplacian_pyramid(img2_warped, num_bands)

    # Blend each pyramid level
    blended_pyramid = []
    for l1, l2, m1, m2 in zip(lp1, lp2, gp_mask1, gp_mask2):
        m1_3ch = np.stack([m1] * 3, axis=-1) if m1.ndim == 2 else m1
        m2_3ch = np.stack([m2] * 3, axis=-1) if m2.ndim == 2 else m2
        weight_sum = m1_3ch + m2_3ch + 1e-8
        blended = (l1 * m1_3ch + l2 * m2_3ch) / weight_sum
        blended_pyramid.append(blended)

    # Reconstruct from the blended pyramid, starting at the coarsest level
    result = blended_pyramid[-1]
    for i in range(len(blended_pyramid) - 2, -1, -1):
        result = cv2.pyrUp(result, dstsize=(blended_pyramid[i].shape[1], blended_pyramid[i].shape[0]))
        result += blended_pyramid[i]

    return np.clip(result, 0, 255).astype(np.uint8)

Five bands is a good default. More bands give smoother blending but cost more memory. For drone imagery or outdoor shots where lighting varies a lot across frames, multi-band blending makes a night-and-day difference compared to simple averaging.

Processing Video Frames into Panoramas

Stitching panoramas from video is useful for surveillance, drone footage, and creating wide-angle views from a panning camera. The trick is selecting frames with enough overlap (30-50%) without too much redundancy.

import cv2

def extract_panorama_frames(video_path, frame_interval=15, max_frames=20):
    """Extract evenly-spaced frames from video for stitching."""
    cap = cv2.VideoCapture(video_path)
    if not cap.isOpened():
        raise FileNotFoundError(f"Cannot open video: {video_path}")

    total_frames = int(cap.get(cv2.CAP_PROP_FRAME_COUNT))
    fps = cap.get(cv2.CAP_PROP_FPS)
    print(f"Video: {total_frames} frames, {fps:.1f} FPS, {total_frames / fps:.1f}s duration")

    frames = []
    frame_idx = 0

    while cap.isOpened() and len(frames) < max_frames:
        cap.set(cv2.CAP_PROP_POS_FRAMES, frame_idx)
        ret, frame = cap.read()
        if not ret:
            break
        frames.append(frame)
        frame_idx += frame_interval

    cap.release()
    print(f"Extracted {len(frames)} frames")
    return frames


def stitch_video_panorama(video_path, frame_interval=15):
    """Build a panorama from video frames."""
    frames = extract_panorama_frames(video_path, frame_interval)

    # Downscale for speed if frames are large
    scale = 1.0
    if frames[0].shape[1] > 1920:
        scale = 1920 / frames[0].shape[1]
        frames = [cv2.resize(f, None, fx=scale, fy=scale) for f in frames]

    stitcher = cv2.Stitcher.create(cv2.Stitcher_PANORAMA)
    stitcher.setPanoConfidenceThresh(0.8)  # Lower threshold = more tolerant

    status, panorama = stitcher.stitch(frames)
    if status == cv2.Stitcher_OK:
        return panorama
    else:
        # Try with fewer frames — sometimes too many overlapping frames confuse the stitcher
        print(f"Failed with {len(frames)} frames, retrying with every other frame")
        status, panorama = stitcher.stitch(frames[::2])
        if status == cv2.Stitcher_OK:
            return panorama
        raise RuntimeError(f"Stitching failed with status {status}")


panorama = stitch_video_panorama("drone_pan.mp4", frame_interval=20)
cv2.imwrite("video_panorama.jpg", panorama)

The frame_interval parameter depends on camera speed. For a slow pan, every 15th-20th frame works. For fast motion, grab every 5th-8th frame. If the stitcher fails, try fewer frames first — too many near-duplicate frames actually hurt more than help.

Performance Tips for Large Image Sets

Stitching gets expensive fast when you’re working with 10+ high-resolution images. Here are concrete ways to speed things up:

Downscale for feature detection, full-res for final warp. Detect and match features on images scaled to 25-50% of original resolution. Compute the homography on the downscaled coordinates, then conjugate it with the scale matrix so it applies to the full-resolution images.

import cv2
import numpy as np

def fast_homography(img1, img2, work_scale=0.5):
    """Compute homography on downscaled images for speed."""
    small1 = cv2.resize(img1, None, fx=work_scale, fy=work_scale)
    small2 = cv2.resize(img2, None, fx=work_scale, fy=work_scale)

    detector = cv2.ORB_create(nfeatures=2000)
    kp1, des1 = detector.detectAndCompute(cv2.cvtColor(small1, cv2.COLOR_BGR2GRAY), None)
    kp2, des2 = detector.detectAndCompute(cv2.cvtColor(small2, cv2.COLOR_BGR2GRAY), None)

    bf = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=False)
    matches = bf.knnMatch(des1, des2, k=2)
    good = [m for m, n in matches if m.distance < 0.75 * n.distance]

    src_pts = np.float32([kp1[m.queryIdx].pt for m in good]).reshape(-1, 1, 2)
    dst_pts = np.float32([kp2[m.trainIdx].pt for m in good]).reshape(-1, 1, 2)

    H, mask = cv2.findHomography(src_pts, dst_pts, cv2.RANSAC, 5.0)

    # Scale homography to full resolution
    S = np.diag([1.0 / work_scale, 1.0 / work_scale, 1.0])
    S_inv = np.diag([work_scale, work_scale, 1.0])
    H_full = S @ H @ S_inv

    return H_full

Other things that matter:

  • Limit ORB features to 2000-3000. More features means more matching time with diminishing returns on accuracy.
  • Use cv2.UMat for GPU-accelerated operations if OpenCV was built with OpenCL support. Wrap images with cv2.UMat(img) and the same API calls run on the GPU.
  • Stitch incrementally. Don’t try to match all pairs at once. Stitch pairs into sub-panoramas, then stitch those together. This reduces accumulated warping distortion and memory usage.
  • Pre-crop black borders after each stitching step to avoid growing canvas sizes full of empty pixels.

Common Errors and Fixes

cv2.error: (-215:Assertion failed) !_src.empty() — One of your images failed to load. Check that the file paths are correct and the images are valid. cv2.imread returns None silently on failure, so always verify: assert img is not None, f"Failed to load {path}".

Stitcher_ERR_NEED_MORE_IMGS — The stitcher couldn’t find enough overlap between images. Make sure consecutive images overlap by at least 30%. If you’re passing images in random order, sort them spatially first.

Stitcher_ERR_HOMOGRAPHY_EST_FAIL — Not enough feature matches survived RANSAC. This happens with low-texture scenes (sky, water, white walls). Try using SIFT instead of ORB, or add more overlap between shots.

Visible seams in the panorama — Use multi-band blending instead of simple overwrite. Also check that exposure is consistent across input images. If shooting manually, lock the exposure before capturing.

Warped panorama looks curved or bent — This is expected for wide fields of view (over 120 degrees). Use cylindrical or spherical warping instead of planar: stitcher.setWarper(cv2.PyRotationWarper("cylindrical", focal_length)). Estimate focal length from EXIF data or calibrate the camera.

Out of memory with many large images — Downscale images before stitching. A 4000x3000 image uses ~36MB in memory as a BGR numpy array. Ten of those plus intermediate warped canvases can easily blow past available RAM. Work at 50% scale for feature detection and warping, then do a final full-resolution pass only for the output.