You snap a photo of a receipt on a table. The angle is off, there’s a shadow in the corner, and the edges are skewed. A document scanner fixes all of that — it finds the document boundary, corrects the perspective, and gives you a flat, top-down view. The whole thing runs in about 30 lines of OpenCV code.
Install the dependencies first:
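A minimal install, assuming pip and the standard OpenCV wheel (NumPy is pulled in as a dependency, but it's listed explicitly since the code below uses it directly):

```shell
pip install opencv-python numpy
```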
Preprocess the Image
The pipeline starts by converting to grayscale, blurring to reduce noise, and running Canny edge detection. Canny picks up the strong edges around the document boundary while ignoring texture inside the page.
The two threshold values in cv2.Canny(blurred, 50, 150) control sensitivity. The first (50) is the low threshold — edges below this gradient magnitude are discarded. The second (150) is the high threshold — edges above this are kept immediately. Anything between the two is kept only if it connects to a strong edge. If your document edges aren’t showing up, lower both values. If you’re getting too much noise, raise them.
GaussianBlur with a (5, 5) kernel smooths out small texture details that would otherwise create false edges. Without it, Canny picks up text lines, paper grain, and table wood patterns.
Find the Document Contour
Once you have edges, find contours and look for the largest four-sided one. A document in a photo is almost always the biggest quadrilateral in the frame.
cv2.approxPolyDP simplifies the contour into fewer points. The epsilon value 0.02 * peri controls how aggressively it simplifies — 2% of the perimeter works well for documents. If you’re not getting a four-point match, try bumping it to 0.03 * peri or 0.04 * peri.
cv2.RETR_EXTERNAL tells OpenCV to only return outermost contours, skipping any nested shapes inside the document. cv2.CHAIN_APPROX_SIMPLE compresses straight-line segments into just their endpoints, which saves memory and speeds up processing.
We only check the top 5 contours by area. The document should be one of the largest shapes in the frame. Checking all contours wastes time on tiny noise blobs.
Order Corner Points and Apply Perspective Transform
The four points from approxPolyDP come in an arbitrary order. You need them in a consistent order — top-left, top-right, bottom-right, bottom-left — before computing the transform.
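A sketch of the ordering helper described below (order_points is a hypothetical name):

```python
import numpy as np

def order_points(pts):
    """pts: (4, 2) array of corners in arbitrary order.
    Returns them as TL, TR, BR, BL in float32."""
    rect = np.zeros((4, 2), dtype=np.float32)
    s = pts.sum(axis=1)
    rect[0] = pts[np.argmin(s)]   # top-left: smallest x + y
    rect[2] = pts[np.argmax(s)]   # bottom-right: largest x + y
    d = np.diff(pts, axis=1)      # y - x for each point
    rect[1] = pts[np.argmin(d)]   # top-right: smallest y - x
    rect[3] = pts[np.argmax(d)]   # bottom-left: largest y - x
    return rect
```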
The logic is simple: the top-left corner has the smallest coordinate sum (it's closest to the origin) and the bottom-right has the largest. For the other two, the difference y − x disambiguates them — the top-right has the smallest difference, the bottom-left the largest.
Now compute the output dimensions and apply the warp:
cv2.getPerspectiveTransform takes two sets of four points — the source quadrilateral and the destination rectangle — and computes a 3x3 transformation matrix. cv2.warpPerspective applies that matrix to remap every pixel. We use the original (unprocessed) image here so the output preserves full color and detail.
The output dimensions are computed from the actual edge lengths of the detected quadrilateral. This keeps the aspect ratio close to the real document instead of forcing a fixed size.
Clean Up with Adaptive Thresholding
For a crisp black-and-white scan (like a photocopier), apply adaptive thresholding to the warped result:
The block size (11) sets the neighborhood for computing the local threshold. The constant (2) is subtracted from the mean — it controls how aggressive the binarization is. Smaller constants keep more detail, larger constants produce cleaner whites. For documents with heavy shadows, bump the block size to 21 or 31.
This step is optional. If you want a color scan, skip it and use the output from warpPerspective directly.
Full Pipeline
Here’s everything combined into a single reusable function:
Common Errors and Fixes
ValueError: No document contour found
The Canny edge detector didn’t pick up a clean four-sided boundary. Common causes:
- Low contrast between document and background. Place the document on a dark surface.
- Thresholds too high. Try cv2.Canny(blurred, 30, 100) or even cv2.Canny(blurred, 20, 80).
- Document is partially outside the frame. The full boundary must be visible.
- Small gaps in the edge map. Try dilating edges before contour detection to close them: edges = cv2.dilate(edges, None, iterations=1).
cv2.error: (-215:Assertion failed) src.checkVector(2, CV_32F) == 4
This means getPerspectiveTransform didn’t receive exactly 4 points as float32. Make sure your ordered points array has shape (4, 2) and dtype float32:
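A minimal sketch of the fix, using made-up corner values in place of a real detection:

```python
import numpy as np

# cv2.approxPolyDP returns shape (4, 1, 2) with dtype int32;
# cv2.getPerspectiveTransform needs (4, 2) float32.
approx = np.array([[[100, 60]], [[500, 80]], [[520, 340]], [[80, 320]]],
                  dtype=np.int32)          # hypothetical detected corners
src = approx.reshape(4, 2).astype(np.float32)
```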
Output image is black or all white
- Black output: the warp mapped to coordinates outside the image. Check that your corner points are in the correct order.
- White output after adaptive thresholding: the block size is too small or the constant is too high. Try cv2.adaptiveThreshold(gray, 255, cv2.ADAPTIVE_THRESH_GAUSSIAN_C, cv2.THRESH_BINARY, 21, 5).
ModuleNotFoundError: No module named 'cv2'
Install the headless version if you don’t need GUI windows:
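Assuming pip:

```shell
pip install opencv-python-headless
```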
Warped output is rotated or mirrored
The corner ordering is wrong. Print the ordered points and verify top-left actually has the smallest x+y sum. If your document is upside down in the photo, the sum/diff heuristic can assign corners incorrectly. Rotate the input image first with cv2.rotate(image, cv2.ROTATE_180).
Related Guides
- How to Build a Barcode and QR Code Scanner with OpenCV and ZBar
- How to Build a Panoramic Image Stitching Pipeline with OpenCV
- How to Build a Document Table Extraction Pipeline with Vision Models
- How to Build a Color Detection and Extraction Pipeline with OpenCV
- How to Build a Document Comparison Pipeline with Vision Models
- How to Build a Lane Detection Pipeline with OpenCV and YOLO
- How to Build a Vehicle Counting Pipeline with YOLOv8 and OpenCV
- How to Build Video Analytics Pipelines with OpenCV and Deep Learning
- How to Build a Receipt Scanner with OCR and Structured Extraction
- How to Build a Product Defect Detector with YOLOv8 and OpenCV