You snap a photo of a receipt on a table. The angle is off, there’s a shadow in the corner, and the edges are skewed. A document scanner fixes all of that — it finds the document boundary, corrects the perspective, and gives you a flat, top-down view. The whole thing runs in about 30 lines of OpenCV code.

Install the dependencies first:

pip install opencv-python numpy

Preprocess the Image

The pipeline starts by converting to grayscale, blurring to reduce noise, and running Canny edge detection. Canny picks up the strong edges around the document boundary while ignoring texture inside the page.

import cv2
import numpy as np

image = cv2.imread("document_photo.jpg")
original = image.copy()

gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
blurred = cv2.GaussianBlur(gray, (5, 5), 0)
edges = cv2.Canny(blurred, 50, 150)

The two threshold values in cv2.Canny(blurred, 50, 150) control sensitivity. The first (50) is the low threshold — edges below this gradient magnitude are discarded. The second (150) is the high threshold — edges above this are kept immediately. Anything between the two is kept only if it connects to a strong edge. If your document edges aren’t showing up, lower both values. If you’re getting too much noise, raise them.

GaussianBlur with a (5, 5) kernel smooths out small texture details that would otherwise create false edges. Without it, Canny picks up text lines, paper grain, and table wood patterns.

Find the Document Contour

Once you have edges, find contours and look for the largest four-sided one. A document in a photo is almost always the biggest quadrilateral in the frame.

contours, _ = cv2.findContours(edges, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
contours = sorted(contours, key=cv2.contourArea, reverse=True)

doc_contour = None
for contour in contours[:5]:
    peri = cv2.arcLength(contour, True)
    approx = cv2.approxPolyDP(contour, 0.02 * peri, True)
    if len(approx) == 4:
        doc_contour = approx
        break

if doc_contour is None:
    raise ValueError("Could not find a four-sided contour. Try adjusting Canny thresholds.")

cv2.approxPolyDP simplifies the contour into fewer points. The epsilon value 0.02 * peri controls how aggressively it simplifies — 2% of the perimeter works well for documents. If you’re not getting a four-point match, try bumping it to 0.03 * peri or 0.04 * peri.

cv2.RETR_EXTERNAL tells OpenCV to only return outermost contours, skipping any nested shapes inside the document. cv2.CHAIN_APPROX_SIMPLE compresses straight-line segments into just their endpoints, which saves memory and speeds up processing.

We only check the top 5 contours by area. The document should be one of the largest shapes in the frame. Checking all contours wastes time on tiny noise blobs.

Order Corner Points and Apply Perspective Transform

The four points from approxPolyDP come in an arbitrary order. You need them in a consistent order — top-left, top-right, bottom-right, bottom-left — before computing the transform.

def order_points(pts):
    pts = pts.reshape(4, 2)
    ordered = np.zeros((4, 2), dtype="float32")

    s = pts.sum(axis=1)
    ordered[0] = pts[np.argmin(s)]   # top-left has smallest x+y
    ordered[2] = pts[np.argmax(s)]   # bottom-right has largest x+y

    d = np.diff(pts, axis=1)
    ordered[1] = pts[np.argmin(d)]   # top-right has smallest y-x
    ordered[3] = pts[np.argmax(d)]   # bottom-left has largest y-x

    return ordered

The logic is simple. The top-left corner has the smallest sum of coordinates (closest to origin). The bottom-right has the largest sum. For the other two, the difference between y and x disambiguates them.
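A quick sanity check with hypothetical corner coordinates fed in scrambled order (the function is repeated so the snippet runs standalone):

```python
import numpy as np

def order_points(pts):
    pts = pts.reshape(4, 2)
    ordered = np.zeros((4, 2), dtype="float32")
    s = pts.sum(axis=1)
    ordered[0] = pts[np.argmin(s)]   # top-left: smallest x+y
    ordered[2] = pts[np.argmax(s)]   # bottom-right: largest x+y
    d = np.diff(pts, axis=1)
    ordered[1] = pts[np.argmin(d)]   # top-right: smallest y-x
    ordered[3] = pts[np.argmax(d)]   # bottom-left: largest y-x
    return ordered

# Corners given as: bottom-right, top-left, bottom-left, top-right.
scrambled = np.array([[300, 400], [10, 20], [15, 390], [310, 25]], dtype="float32")
ordered = order_points(scrambled)
print(ordered)  # rows come out as TL, TR, BR, BL
```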

Now compute the output dimensions and apply the warp:

pts = order_points(doc_contour)
(tl, tr, br, bl) = pts

width_top = np.linalg.norm(tr - tl)
width_bottom = np.linalg.norm(br - bl)
max_width = int(max(width_top, width_bottom))

height_left = np.linalg.norm(bl - tl)
height_right = np.linalg.norm(br - tr)
max_height = int(max(height_left, height_right))

dst = np.array([
    [0, 0],
    [max_width - 1, 0],
    [max_width - 1, max_height - 1],
    [0, max_height - 1]
], dtype="float32")

matrix = cv2.getPerspectiveTransform(pts, dst)
scanned = cv2.warpPerspective(original, matrix, (max_width, max_height))

cv2.imwrite("scanned_output.jpg", scanned)

cv2.getPerspectiveTransform takes two sets of four points — the source quadrilateral and the destination rectangle — and computes a 3x3 transformation matrix. cv2.warpPerspective applies that matrix to remap every pixel. We use the original (unprocessed) image here so the output preserves full color and detail.

The output dimensions are computed from the actual edge lengths of the detected quadrilateral. This keeps the aspect ratio close to the real document instead of forcing a fixed size.

Clean Up with Adaptive Thresholding

For a crisp black-and-white scan (like a photocopier), apply adaptive thresholding to the warped result:

scanned_gray = cv2.cvtColor(scanned, cv2.COLOR_BGR2GRAY)
scanned_bw = cv2.adaptiveThreshold(
    scanned_gray, 255,
    cv2.ADAPTIVE_THRESH_GAUSSIAN_C,
    cv2.THRESH_BINARY, 11, 2
)

cv2.imwrite("scanned_bw.jpg", scanned_bw)

The block size (11) sets the neighborhood used to compute each pixel's local threshold. The constant (2) is subtracted from the Gaussian-weighted local mean; it controls how aggressive the binarization is. Smaller constants keep more detail, larger constants produce cleaner whites. For documents with heavy shadows, bump the block size to 21 or 31.

This step is optional. If you want a color scan, skip it and use the output from warpPerspective directly.

Full Pipeline

Here’s everything combined into a single reusable function:

import cv2
import numpy as np


def order_points(pts):
    pts = pts.reshape(4, 2)
    ordered = np.zeros((4, 2), dtype="float32")
    s = pts.sum(axis=1)
    ordered[0] = pts[np.argmin(s)]
    ordered[2] = pts[np.argmax(s)]
    d = np.diff(pts, axis=1)
    ordered[1] = pts[np.argmin(d)]
    ordered[3] = pts[np.argmax(d)]
    return ordered


def scan_document(image_path, output_path="scanned.jpg", bw=False):
    image = cv2.imread(image_path)
    if image is None:
        raise FileNotFoundError(f"Cannot read image: {image_path}")

    original = image.copy()

    gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
    blurred = cv2.GaussianBlur(gray, (5, 5), 0)
    edges = cv2.Canny(blurred, 50, 150)

    contours, _ = cv2.findContours(edges, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    contours = sorted(contours, key=cv2.contourArea, reverse=True)

    doc_contour = None
    for contour in contours[:5]:
        peri = cv2.arcLength(contour, True)
        approx = cv2.approxPolyDP(contour, 0.02 * peri, True)
        if len(approx) == 4:
            doc_contour = approx
            break

    if doc_contour is None:
        raise ValueError("No document contour found. Adjust Canny thresholds or check the image.")

    pts = order_points(doc_contour)
    (tl, tr, br, bl) = pts

    max_width = int(max(np.linalg.norm(tr - tl), np.linalg.norm(br - bl)))
    max_height = int(max(np.linalg.norm(bl - tl), np.linalg.norm(br - tr)))

    dst = np.array([
        [0, 0],
        [max_width - 1, 0],
        [max_width - 1, max_height - 1],
        [0, max_height - 1]
    ], dtype="float32")

    matrix = cv2.getPerspectiveTransform(pts, dst)
    scanned = cv2.warpPerspective(original, matrix, (max_width, max_height))

    if bw:
        scanned = cv2.cvtColor(scanned, cv2.COLOR_BGR2GRAY)
        scanned = cv2.adaptiveThreshold(
            scanned, 255,
            cv2.ADAPTIVE_THRESH_GAUSSIAN_C,
            cv2.THRESH_BINARY, 11, 2
        )

    cv2.imwrite(output_path, scanned)
    print(f"Saved scanned document to {output_path} ({max_width}x{max_height})")
    return scanned


# Color scan
scan_document("receipt.jpg", "receipt_scanned.jpg")

# Black-and-white scan
scan_document("receipt.jpg", "receipt_bw.jpg", bw=True)

Common Errors and Fixes

ValueError: No document contour found

The Canny edge detector didn’t pick up a clean four-sided boundary. Common causes:

  • Low contrast between document and background. Place the document on a dark surface.
  • Thresholds too high. Try cv2.Canny(blurred, 30, 100) or even cv2.Canny(blurred, 20, 80).
  • Document is partially outside the frame. The full boundary must be visible.
  • Try dilating edges before contour detection to close small gaps: edges = cv2.dilate(edges, None, iterations=1).

cv2.error: (-215:Assertion failed) src.checkVector(2, CV_32F) == 4

This means getPerspectiveTransform didn’t receive exactly 4 points as float32. Make sure your ordered points array has shape (4, 2) and dtype float32:

pts = order_points(doc_contour).astype("float32")

Output image is black or all white

  • Black output: the warp sampled coordinates outside the source image. Check that your corner points are in the correct order.
  • White output after adaptive thresholding: the block size is too small (every pixel sits close to its local mean) or the constant is too high. Try a larger block size with a small constant: cv2.adaptiveThreshold(gray, 255, cv2.ADAPTIVE_THRESH_GAUSSIAN_C, cv2.THRESH_BINARY, 21, 2).

ModuleNotFoundError: No module named 'cv2'

Install the headless version if you don’t need GUI windows:

pip install opencv-python-headless

Warped output is rotated or mirrored

The corner ordering is wrong. Print the ordered points and verify top-left actually has the smallest x+y sum. If your document is upside down in the photo, the sum/diff heuristic can assign corners incorrectly. Rotate the input image first with cv2.rotate(image, cv2.ROTATE_180).