The two-stage approach is the most reliable way to build license plate recognition. First, use YOLO to detect the plate region in the image. Second, crop that region and run OCR on it. Trying to OCR the whole image, or relying on a single-stage model, gives far worse accuracy.

Here’s the complete pipeline with real Python code you can run today.

The Two-Stage Pipeline

Stage 1 is detection. YOLO finds the bounding box coordinates of license plates in your image or video frame. This isolates the region of interest and eliminates background noise that confuses OCR.

Stage 2 is recognition. EasyOCR or PaddleOCR reads the text from the cropped plate region. You’ll get better results by preprocessing the crop first—convert to grayscale, apply contrast enhancement, and resize to a consistent height.

Use YOLOv8 for detection. It’s fast enough for real-time processing and you can fine-tune it on license plate datasets like CCPD or OpenALPR. For OCR, EasyOCR works out of the box for most plate formats, but PaddleOCR is faster if you need to process video streams at high FPS.

Install Dependencies

pip install ultralytics opencv-python easyocr numpy

If you want PaddleOCR instead (faster for video), install it separately:

pip install paddlepaddle paddleocr

Detection with YOLO

Train YOLOv8 on a license plate dataset or use a pretrained model. Here’s how to detect plates and crop them:

import cv2
import torch
from ultralytics import YOLO
import easyocr
import numpy as np

def detect_and_recognize_plates(image_path):
    # Load YOLO model (use your trained model or download pretrained)
    model = YOLO('yolov8n.pt')  # Replace with 'license_plate.pt' after training

    # Read image
    img = cv2.imread(image_path)

    # Run detection
    results = model(img, conf=0.5)

    # Initialize OCR reader (supports 80+ languages)
    reader = easyocr.Reader(['en'], gpu=torch.cuda.is_available())

    detected_plates = []

    for result in results:
        boxes = result.boxes.xyxy.cpu().numpy()
        confidences = result.boxes.conf.cpu().numpy()

        for box, conf in zip(boxes, confidences):
            x1, y1, x2, y2 = map(int, box)

            # Crop plate region
            plate_crop = img[y1:y2, x1:x2]

            # Preprocess for better OCR accuracy
            plate_gray = cv2.cvtColor(plate_crop, cv2.COLOR_BGR2GRAY)
            plate_enhanced = cv2.equalizeHist(plate_gray)

            # Resize to consistent height (improves OCR)
            scale_percent = 200
            width = int(plate_enhanced.shape[1] * scale_percent / 100)
            height = int(plate_enhanced.shape[0] * scale_percent / 100)
            plate_resized = cv2.resize(plate_enhanced, (width, height))

            # Run OCR
            ocr_results = reader.readtext(plate_resized, detail=0)

            if ocr_results:
                plate_text = ''.join(ocr_results).upper()
                # Clean up common OCR mistakes
                plate_text = plate_text.replace(' ', '').replace('-', '')

                detected_plates.append({
                    'text': plate_text,
                    'bbox': (x1, y1, x2, y2),
                    'confidence': float(conf)
                })

                # Draw bounding box and text
                cv2.rectangle(img, (x1, y1), (x2, y2), (0, 255, 0), 2)
                cv2.putText(img, plate_text, (x1, y1 - 10),
                           cv2.FONT_HERSHEY_SIMPLEX, 0.9, (0, 255, 0), 2)

    return img, detected_plates

# Run on image
result_img, plates = detect_and_recognize_plates('parking_lot.jpg')
cv2.imwrite('output.jpg', result_img)

for plate in plates:
    print(f"Detected: {plate['text']} (confidence: {plate['confidence']:.2f})")

The preprocessing steps matter more than you’d think. Histogram equalization fixes uneven lighting, and resizing to 2x the original height gives OCR models more pixels to work with. Skip this and you’ll get garbage like “4BC 7O3” instead of “ABC 703”.

Real-Time Video Processing

For video streams, you need to balance accuracy and FPS. Process every 3rd frame to maintain 10+ FPS on a decent GPU:

import cv2
from ultralytics import YOLO
import easyocr
from collections import defaultdict
import time

class LicensePlateTracker:
    def __init__(self, model_path='yolov8n.pt', process_every_n_frames=3):
        self.model = YOLO(model_path)
        self.reader = easyocr.Reader(['en'], gpu=True)
        self.process_every_n = process_every_n_frames
        self.frame_count = 0
        self.tracked_plates = defaultdict(list)

    def preprocess_plate(self, plate_crop):
        """Enhanced preprocessing for better OCR"""
        gray = cv2.cvtColor(plate_crop, cv2.COLOR_BGR2GRAY)

        # Apply bilateral filter to reduce noise while keeping edges
        denoised = cv2.bilateralFilter(gray, 11, 17, 17)

        # Adaptive thresholding works better than global threshold
        thresh = cv2.adaptiveThreshold(
            denoised, 255, cv2.ADAPTIVE_THRESH_GAUSSIAN_C,
            cv2.THRESH_BINARY, 11, 2
        )

        # Resize for OCR
        height = 100  # Standard height for most OCR models
        aspect_ratio = plate_crop.shape[1] / plate_crop.shape[0]
        width = int(height * aspect_ratio)
        resized = cv2.resize(thresh, (width, height))

        return resized

    def clean_plate_text(self, text):
        """Remove common OCR artifacts"""
        # Remove spaces, dashes, and common misread characters
        cleaned = text.replace(' ', '').replace('-', '').replace('|', 'I')
        cleaned = cleaned.upper()

        # Filter to alphanumeric only
        cleaned = ''.join(c for c in cleaned if c.isalnum())

        return cleaned

    def process_frame(self, frame):
        self.frame_count += 1

        # Skip frames for performance
        if self.frame_count % self.process_every_n != 0:
            return frame, []

        results = self.model(frame, conf=0.4, verbose=False)
        detected_plates = []

        for result in results:
            boxes = result.boxes.xyxy.cpu().numpy()
            confidences = result.boxes.conf.cpu().numpy()

            for box, conf in zip(boxes, confidences):
                x1, y1, x2, y2 = map(int, box)

                # Crop and preprocess
                plate_crop = frame[y1:y2, x1:x2]

                if plate_crop.size == 0:
                    continue

                processed = self.preprocess_plate(plate_crop)

                # OCR
                ocr_result = self.reader.readtext(processed, detail=0)

                if ocr_result:
                    plate_text = self.clean_plate_text(''.join(ocr_result))

                    # Filter out obvious garbage (too short or too long)
                    if 4 <= len(plate_text) <= 10:
                        detected_plates.append({
                            'text': plate_text,
                            'bbox': (x1, y1, x2, y2),
                            'confidence': float(conf),
                            'timestamp': time.time()
                        })

                        # Draw on frame
                        cv2.rectangle(frame, (x1, y1), (x2, y2), (0, 255, 0), 2)
                        label = f"{plate_text} ({conf:.2f})"
                        cv2.putText(frame, label, (x1, y1 - 10),
                                   cv2.FONT_HERSHEY_SIMPLEX, 0.7, (0, 255, 0), 2)

        return frame, detected_plates

# Run on video file or webcam
def process_video(video_path):
    tracker = LicensePlateTracker(process_every_n_frames=3)
    cap = cv2.VideoCapture(video_path)

    video_fps = cap.get(cv2.CAP_PROP_FPS)
    width = int(cap.get(cv2.CAP_PROP_FRAME_WIDTH))
    height = int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT))
    prev_time = time.time()

    # Optional: write output video
    fourcc = cv2.VideoWriter_fourcc(*'mp4v')
    out = cv2.VideoWriter('output.mp4', fourcc, video_fps, (width, height))

    while cap.isOpened():
        ret, frame = cap.read()
        if not ret:
            break

        processed_frame, plates = tracker.process_frame(frame)

        # Display actual processing FPS
        curr_time = time.time()
        actual_fps = 1.0 / (curr_time - prev_time) if curr_time != prev_time else 0
        prev_time = curr_time
        fps_text = f"FPS: {actual_fps:.1f}"
        cv2.putText(processed_frame, fps_text, (10, 30),
                   cv2.FONT_HERSHEY_SIMPLEX, 1, (255, 255, 255), 2)

        out.write(processed_frame)
        cv2.imshow('License Plate Recognition', processed_frame)

        if cv2.waitKey(1) & 0xFF == ord('q'):
            break

    cap.release()
    out.release()
    cv2.destroyAllWindows()

# Run on video file
process_video('traffic.mp4')

# Or use webcam (0 for default camera)
# process_video(0)

The process_every_n_frames parameter is critical. Processing every frame kills your FPS and doesn’t improve accuracy—plates don’t change between consecutive frames. Process every 3rd frame and you’ll hit 15-20 FPS on a GTX 1660 or better.

Handling Different Plate Formats

Different countries use different plate formats. US plates are typically 2-3 letters followed by 3-4 numbers (e.g., ABC 1234). European plates start with country codes. Chinese plates have province characters.

You need format validation to filter out false positives. Here’s a regex-based validator:

import re

def validate_plate_format(text, country='US'):
    """Validate plate text against common formats"""
    formats = {
        'US': r'^[A-Z]{2,3}\d{3,4}$',  # ABC1234
        'EU': r'^[A-Z]{1,3}\d{2,4}[A-Z]{1,3}$',  # AB12CD
        'CN': r'^[\u4e00-\u9fff][A-Z][A-Z0-9]{5}$',  # 京A12345
    }

    pattern = formats.get(country)
    if not pattern:
        return True  # No validation for unknown formats

    return bool(re.match(pattern, text))

# Use in the OCR pipeline (clean_plate_text and ocr_results come from the code above)
plate_text = clean_plate_text(''.join(ocr_results))
if validate_plate_format(plate_text, country='US'):
    print(f"Valid plate: {plate_text}")
else:
    print(f"Invalid format: {plate_text}")

Add multiple format patterns and try them all. If a plate matches any valid format, keep it. This simple check eliminates most OCR garbage.
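A minimal sketch of that try-them-all check, reusing the patterns from the validator above (the third pattern is an illustrative extra, not tied to a specific country), assuming the candidate text has already been cleaned to bare uppercase alphanumerics:

```python
import re

# Patterns from the validator above, plus one illustrative extra
PLATE_PATTERNS = [
    r'^[A-Z]{2,3}\d{3,4}$',            # US-style: ABC1234
    r'^[A-Z]{1,3}\d{2,4}[A-Z]{1,3}$',  # EU-style: AB12CD
    r'^\d{3}[A-Z]{3}$',                # illustrative: 123ABC
]

def matches_any_format(text):
    """Keep a candidate only if it matches at least one known pattern."""
    return any(re.match(p, text) for p in PLATE_PATTERNS)

candidates = ['ABC1234', 'AB12CD', 'XJ', 'L0REMIPSUM99']
valid = [c for c in candidates if matches_any_format(c)]
# valid == ['ABC1234', 'AB12CD']
```

Drop this into the pipeline right after clean_plate_text and discard anything that fails.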

Common Errors and Fixes

“RuntimeError: CUDA out of memory” Your GPU doesn’t have enough VRAM to run the YOLO model and OCR together. Pass gpu=False to easyocr.Reader(['en'], gpu=False) to run OCR on CPU, or switch to a smaller YOLO model like yolov8n.pt.

OCR returns empty results The plate crop is too small or too blurry. Check that your YOLO model is accurately detecting plates—visualize the bounding boxes. If crops are tiny (less than 50x20 pixels), increase the input image resolution or retrain YOLO on examples that contain small plates.
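One cheap guard is to skip OCR entirely on crops that are too small to read. This sketch uses the 50x20-pixel cutoff mentioned above; the exact threshold is a judgment call for your camera setup, not a hard rule:

```python
import numpy as np

def crop_is_usable(crop, min_w=50, min_h=20):
    """Return False for plate crops too small for OCR to read reliably."""
    h, w = crop.shape[:2]
    return w >= min_w and h >= min_h

# A 120x40 crop passes; a 30x12 crop gets skipped
big = np.zeros((40, 120, 3), dtype=np.uint8)
tiny = np.zeros((12, 30, 3), dtype=np.uint8)
print(crop_is_usable(big), crop_is_usable(tiny))  # True False
```

Call this right after cropping, before preprocessing, so blurry thumbnails never reach the OCR stage.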

Confusing O/0, I/1, S/5 This is the classic OCR problem. Use post-processing rules based on plate format. For example, US plates usually alternate letters and numbers (ABC 123), so you can infer whether a character should be O or 0 based on position.
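One way to apply those position rules, sketched for a US-style three-letters-then-digits layout. The confusion table below is a common starting set, not an exhaustive mapping:

```python
# Character pairs OCR commonly confuses, mapped both ways
DIGIT_TO_LETTER = {'0': 'O', '1': 'I', '5': 'S', '2': 'Z', '4': 'A', '8': 'B'}
LETTER_TO_DIGIT = {v: k for k, v in DIGIT_TO_LETTER.items()}

def fix_by_position(text, letter_positions):
    """Coerce each character toward letter or digit based on its position.

    letter_positions is the set of indices expected to hold letters; every
    other index is assumed to hold a digit.
    """
    out = []
    for i, ch in enumerate(text):
        if i in letter_positions and ch in DIGIT_TO_LETTER:
            out.append(DIGIT_TO_LETTER[ch])
        elif i not in letter_positions and ch in LETTER_TO_DIGIT:
            out.append(LETTER_TO_DIGIT[ch])
        else:
            out.append(ch)
    return ''.join(out)

# For a US-style ABC123 plate, positions 0-2 are letters, the rest digits
print(fix_by_position('4BC7O3', {0, 1, 2}))  # ABC703
```

This is exactly the "4BC 7O3" → "ABC 703" repair described above: the leading 4 becomes A because position 0 must be a letter, and the O becomes 0 because position 4 must be a digit.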

Low FPS on video You’re processing every frame or running on CPU. Raise process_every_n_frames to 5 or more, use a GPU, and switch from EasyOCR to PaddleOCR. PaddleOCR is 2-3x faster for real-time applications.

Plates not detected in low light YOLO struggles with underexposed images. Apply gamma correction to the input frame before detection:

import cv2
import numpy as np

def adjust_gamma(image, gamma=1.5):
    inv_gamma = 1.0 / gamma
    table = np.array([(i / 255.0) ** inv_gamma * 255
                     for i in range(256)]).astype("uint8")
    return cv2.LUT(image, table)

# Use before detection
frame = adjust_gamma(frame, gamma=1.5)
results = model(frame)

High false positive rate Raise the YOLO confidence threshold (conf=0.6 instead of 0.4) and add format validation. Most false positives are random text that doesn’t match plate patterns.