OpenCV ships with a high-level cv2.Stitcher class that handles panorama creation in about five lines. It also gives you every building block to do it manually — feature detection, matching, homography estimation, warping, and blending. Knowing both paths matters. The high-level API is great until it fails silently on tricky inputs, and then you need to understand what’s happening underneath.
Here’s the fastest way to stitch two or more images:
```python
import cv2
import glob

image_paths = sorted(glob.glob("images/pano_*.jpg"))
images = [cv2.imread(p) for p in image_paths]

stitcher = cv2.Stitcher.create(mode=cv2.Stitcher_PANORAMA)
status, panorama = stitcher.stitch(images)

if status == cv2.Stitcher_OK:
    cv2.imwrite("panorama_output.jpg", panorama)
    print(f"Panorama saved: {panorama.shape[1]}x{panorama.shape[0]} pixels")
else:
    error_map = {
        cv2.Stitcher_ERR_NEED_MORE_IMGS: "Need more images",
        cv2.Stitcher_ERR_HOMOGRAPHY_EST_FAIL: "Homography estimation failed",
        cv2.Stitcher_ERR_CAMERA_PARAMS_ADJUST_FAIL: "Camera params adjustment failed",
    }
    print(f"Stitching failed: {error_map.get(status, f'Unknown error {status}')}")
```
The mode parameter matters. cv2.Stitcher_PANORAMA assumes the camera rotates around a fixed point (like standing and turning). cv2.Stitcher_SCANS works for flat document scanning where images are taken from roughly the same plane.
Manual Feature-Based Stitching Pipeline
The high-level API hides a lot. When you need control over feature detection, matching thresholds, or blending strategy, build the pipeline yourself. This example stitches two images step by step.
```python
import cv2
import numpy as np

def stitch_pair(img1, img2, feature_method="orb"):
    """Stitch two images using feature matching and homography."""
    gray1 = cv2.cvtColor(img1, cv2.COLOR_BGR2GRAY)
    gray2 = cv2.cvtColor(img2, cv2.COLOR_BGR2GRAY)

    # Feature detection — ORB is free, SIFT needs opencv-contrib
    if feature_method == "sift":
        detector = cv2.SIFT_create(nfeatures=2000)
        norm_type = cv2.NORM_L2
    else:
        detector = cv2.ORB_create(nfeatures=3000)
        norm_type = cv2.NORM_HAMMING

    kp1, des1 = detector.detectAndCompute(gray1, None)
    kp2, des2 = detector.detectAndCompute(gray2, None)
    if des1 is None or des2 is None:
        raise ValueError("No descriptors found in one of the images")

    # Match features with BFMatcher + ratio test
    bf = cv2.BFMatcher(norm_type, crossCheck=False)
    raw_matches = bf.knnMatch(des1, des2, k=2)

    # Lowe's ratio test — 0.75 is a solid default
    # (knnMatch can return fewer than 2 neighbors, so guard the unpack)
    good_matches = []
    for pair in raw_matches:
        if len(pair) < 2:
            continue
        m, n = pair
        if m.distance < 0.75 * n.distance:
            good_matches.append(m)

    print(f"Features: {len(kp1)} / {len(kp2)} | Good matches: {len(good_matches)}")
    if len(good_matches) < 10:
        raise ValueError(f"Only {len(good_matches)} matches — need at least 10 for a reliable homography")

    # Extract matched point coordinates
    src_pts = np.float32([kp1[m.queryIdx].pt for m in good_matches]).reshape(-1, 1, 2)
    dst_pts = np.float32([kp2[m.trainIdx].pt for m in good_matches]).reshape(-1, 1, 2)

    # Estimate homography with RANSAC
    H, mask = cv2.findHomography(src_pts, dst_pts, cv2.RANSAC, ransacReprojThreshold=5.0)
    if H is None:
        raise ValueError("Homography estimation failed")
    inliers = int(mask.ravel().sum())
    print(f"RANSAC inliers: {inliers} / {len(good_matches)}")

    # Warp img1 into img2's coordinate frame
    h1, w1 = img1.shape[:2]
    h2, w2 = img2.shape[:2]

    # Calculate output canvas size
    corners_img1 = np.float32([[0, 0], [w1, 0], [w1, h1], [0, h1]]).reshape(-1, 1, 2)
    warped_corners = cv2.perspectiveTransform(corners_img1, H)
    all_corners = np.concatenate([
        warped_corners,
        np.float32([[0, 0], [w2, 0], [w2, h2], [0, h2]]).reshape(-1, 1, 2),
    ])
    x_min, y_min = np.int32(all_corners.min(axis=0).ravel())
    x_max, y_max = np.int32(all_corners.max(axis=0).ravel())

    # Translation matrix to shift everything into positive coordinates
    translation = np.array([[1, 0, -x_min], [0, 1, -y_min], [0, 0, 1]], dtype=np.float64)
    canvas_size = (x_max - x_min, y_max - y_min)
    warped1 = cv2.warpPerspective(img1, translation @ H, canvas_size)

    # Place img2 onto the canvas
    warped1[-y_min:-y_min + h2, -x_min:-x_min + w2] = img2
    return warped1

# Usage
img1 = cv2.imread("left.jpg")
img2 = cv2.imread("right.jpg")
result = stitch_pair(img1, img2, feature_method="orb")
cv2.imwrite("manual_panorama.jpg", result)
```
A few things worth noting: ORB is fast and patent-free, but SIFT produces more stable matches on images with significant scale changes. The RANSAC reprojection threshold of 5.0 pixels is a reasonable starting point — lower values are stricter, higher values tolerate more geometric distortion.
FLANN-Based Matching for Larger Image Sets
When you’re matching thousands of features, BFMatcher gets slow. FLANN (Fast Library for Approximate Nearest Neighbors) is significantly faster for large descriptor sets:
```python
import cv2

def flann_match(des1, des2, feature_method="sift"):
    """Match descriptors using FLANN for speed on large feature sets."""
    if feature_method == "sift":
        index_params = dict(algorithm=1, trees=5)  # FLANN_INDEX_KDTREE
    else:
        # FLANN_INDEX_LSH for binary descriptors
        index_params = dict(algorithm=6, table_number=6, key_size=12, multi_probe_level=1)
    search_params = dict(checks=50)

    flann = cv2.FlannBasedMatcher(index_params, search_params)
    matches = flann.knnMatch(des1, des2, k=2)
    # LSH in particular can return fewer than 2 neighbors, so guard the unpack
    good = [pair[0] for pair in matches
            if len(pair) == 2 and pair[0].distance < 0.75 * pair[1].distance]
    return good
```
Use FLANN with SIFT descriptors (float vectors) for best results. For ORB’s binary descriptors, the LSH index works but BFMatcher is often just as fast.
Handling Exposure Differences with Multi-Band Blending
The naive approach of just overwriting pixels in the overlap region creates a visible seam. Multi-band blending fixes this by blending low-frequency content (broad lighting) gradually while keeping high-frequency content (edges, texture) sharp.
```python
import cv2
import numpy as np

def multi_band_blend(img1_warped, img2_warped, mask1, mask2, num_bands=5):
    """Blend two warped images using Laplacian pyramid multi-band blending."""
    # Build Gaussian pyramids for masks
    gp_mask1 = [mask1.astype(np.float32) / 255.0]
    gp_mask2 = [mask2.astype(np.float32) / 255.0]
    for i in range(num_bands):
        gp_mask1.append(cv2.pyrDown(gp_mask1[i]))
        gp_mask2.append(cv2.pyrDown(gp_mask2[i]))

    # Build Laplacian pyramids for images
    def laplacian_pyramid(img, levels):
        gp = [img.astype(np.float64)]
        for i in range(levels):
            gp.append(cv2.pyrDown(gp[i]))
        lp = [gp[levels]]
        for i in range(levels, 0, -1):
            expanded = cv2.pyrUp(gp[i], dstsize=(gp[i - 1].shape[1], gp[i - 1].shape[0]))
            lp.append(gp[i - 1] - expanded)
        return lp[::-1]  # finest level first, coarsest last

    lp1 = laplacian_pyramid(img1_warped, num_bands)
    lp2 = laplacian_pyramid(img2_warped, num_bands)

    # Blend each pyramid level with the matching mask level
    blended_pyramid = []
    for l1, l2, m1, m2 in zip(lp1, lp2, gp_mask1, gp_mask2):
        m1_3ch = np.stack([m1] * 3, axis=-1) if m1.ndim == 2 else m1
        m2_3ch = np.stack([m2] * 3, axis=-1) if m2.ndim == 2 else m2
        weight_sum = m1_3ch + m2_3ch + 1e-8
        blended = (l1 * m1_3ch + l2 * m2_3ch) / weight_sum
        blended_pyramid.append(blended)

    # Reconstruct from the coarsest blended level back up to full resolution
    result = blended_pyramid[-1]
    for i in range(len(blended_pyramid) - 2, -1, -1):
        result = cv2.pyrUp(result, dstsize=(blended_pyramid[i].shape[1], blended_pyramid[i].shape[0]))
        result += blended_pyramid[i]
    return np.clip(result, 0, 255).astype(np.uint8)
```
Five bands is a good default. More bands give smoother blending but cost more memory. For drone imagery or outdoor shots where lighting varies a lot across frames, multi-band blending makes a night-and-day difference compared to simple averaging.
Processing Video Frames into Panoramas
Stitching panoramas from video is useful for surveillance, drone footage, and creating wide-angle views from a panning camera. The trick is selecting frames with enough overlap (30-50%) without too much redundancy.
```python
import cv2

def extract_panorama_frames(video_path, frame_interval=15, max_frames=20):
    """Extract evenly-spaced frames from video for stitching."""
    cap = cv2.VideoCapture(video_path)
    if not cap.isOpened():
        raise FileNotFoundError(f"Cannot open video: {video_path}")

    total_frames = int(cap.get(cv2.CAP_PROP_FRAME_COUNT))
    fps = cap.get(cv2.CAP_PROP_FPS)
    print(f"Video: {total_frames} frames, {fps:.1f} FPS, {total_frames / fps:.1f}s duration")

    frames = []
    frame_idx = 0
    while cap.isOpened() and len(frames) < max_frames:
        cap.set(cv2.CAP_PROP_POS_FRAMES, frame_idx)
        ret, frame = cap.read()
        if not ret:
            break
        frames.append(frame)
        frame_idx += frame_interval
    cap.release()
    print(f"Extracted {len(frames)} frames")
    return frames

def stitch_video_panorama(video_path, frame_interval=15):
    """Build a panorama from video frames."""
    frames = extract_panorama_frames(video_path, frame_interval)

    # Downscale for speed if frames are large
    if frames[0].shape[1] > 1920:
        scale = 1920 / frames[0].shape[1]
        frames = [cv2.resize(f, None, fx=scale, fy=scale) for f in frames]

    stitcher = cv2.Stitcher.create(cv2.Stitcher_PANORAMA)
    stitcher.setPanoConfidenceThresh(0.8)  # Lower threshold = more tolerant
    status, panorama = stitcher.stitch(frames)
    if status == cv2.Stitcher_OK:
        return panorama

    # Try with fewer frames — sometimes too many overlapping frames confuse the stitcher
    print(f"Failed with {len(frames)} frames, retrying with every other frame")
    status, panorama = stitcher.stitch(frames[::2])
    if status == cv2.Stitcher_OK:
        return panorama
    raise RuntimeError(f"Stitching failed with status {status}")

panorama = stitch_video_panorama("drone_pan.mp4", frame_interval=20)
cv2.imwrite("video_panorama.jpg", panorama)
```
The frame_interval parameter depends on camera speed. For a slow pan, every 15th-20th frame works. For fast motion, grab every 5th-8th frame. If the stitcher fails, try fewer frames first — too many near-duplicate frames actually hurt more than help.
Stitching gets expensive fast when you’re working with 10+ high-resolution images. Here are concrete ways to speed things up:
Downscale for feature detection, full-res for final warp. Detect and match features on images scaled to 25-50% of original resolution. Compute the homography on the downscaled coordinates, then conjugate it by the scaling matrices to apply it to the full-resolution images.
```python
import cv2
import numpy as np

def fast_homography(img1, img2, work_scale=0.5):
    """Compute homography on downscaled images for speed."""
    small1 = cv2.resize(img1, None, fx=work_scale, fy=work_scale)
    small2 = cv2.resize(img2, None, fx=work_scale, fy=work_scale)

    detector = cv2.ORB_create(nfeatures=2000)
    kp1, des1 = detector.detectAndCompute(cv2.cvtColor(small1, cv2.COLOR_BGR2GRAY), None)
    kp2, des2 = detector.detectAndCompute(cv2.cvtColor(small2, cv2.COLOR_BGR2GRAY), None)

    bf = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=False)
    matches = bf.knnMatch(des1, des2, k=2)
    good = [m for m, n in matches if m.distance < 0.75 * n.distance]

    src_pts = np.float32([kp1[m.queryIdx].pt for m in good]).reshape(-1, 1, 2)
    dst_pts = np.float32([kp2[m.trainIdx].pt for m in good]).reshape(-1, 1, 2)
    H, mask = cv2.findHomography(src_pts, dst_pts, cv2.RANSAC, 5.0)

    # Scale homography to full resolution: shrink coordinates into the
    # working scale, apply H, then grow the result back up
    S = np.diag([1.0 / work_scale, 1.0 / work_scale, 1.0])
    S_inv = np.diag([work_scale, work_scale, 1.0])
    H_full = S @ H @ S_inv
    return H_full
```
Other things that matter:
- Limit ORB features to 2000-3000. More features means more matching time with diminishing returns on accuracy.
- Use cv2.UMat for GPU-accelerated operations if OpenCV was built with OpenCL support. Wrap images with cv2.UMat(img) and the same API calls run on the GPU.
- Stitch incrementally. Don't try to match all pairs at once. Stitch pairs into sub-panoramas, then stitch those together. This reduces accumulated warping distortion and memory usage.
- Pre-crop black borders after each stitching step to avoid growing canvas sizes full of empty pixels.
Common Errors and Fixes
cv2.error: (-215:Assertion failed) !_src.empty() — One of your images failed to load. Check that the file paths are correct and the images are valid. cv2.imread returns None silently on failure, so always verify: assert img is not None, f"Failed to load {path}".
Stitcher_ERR_NEED_MORE_IMGS — The stitcher couldn’t find enough overlap between images. Make sure consecutive images overlap by at least 30%. If you’re passing images in random order, sort them spatially first.
Stitcher_ERR_HOMOGRAPHY_EST_FAIL — Not enough feature matches survived RANSAC. This happens with low-texture scenes (sky, water, white walls). Try using SIFT instead of ORB, or add more overlap between shots.
Visible seams in the panorama — Use multi-band blending instead of simple overwrite. Also check that exposure is consistent across input images. If shooting manually, lock the exposure before capturing.
Warped panorama looks curved or bent — This is expected for wide fields of view (over 120 degrees): planar projection stretches heavily toward the edges. Switch to cylindrical or spherical projection, for example by pre-warping each input with cv2.PyRotationWarper("cylindrical", focal_length) before stitching. Estimate the focal length from EXIF data or calibrate the camera.
Out of memory with many large images — Downscale images before stitching. A 4000x3000 image uses ~36MB in memory as a BGR numpy array. Ten of those plus intermediate warped canvases can easily blow past available RAM. Work at 50% scale for feature detection and warping, then do a final full-resolution pass only for the output.