Shot boundary detection is the backbone of any automated video editing pipeline. You need it for generating highlight reels, building video search indexes, creating chapter markers, or just chopping a long recording into manageable clips. PySceneDetect handles this well – it ships a CLI for quick one-liners and a Python API for when you need programmatic control.
Here’s the fastest way to split a video into scenes from your terminal:
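A typical one-liner (the filename `video.mp4` is a placeholder for your own input) chains a detector with the list and split commands:

```shell
# Detect cuts, print/save a scene list, then split into clips
scenedetect -i video.mp4 detect-content list-scenes split-video
```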
That detects cuts, prints a scene list, and splits the file into separate clips. Three commands chained together, done in seconds. But the real power is in the Python API.
Quick Start with the CLI
The scenedetect CLI follows a pipeline pattern: you specify an input, pick a detection algorithm, then chain output commands. Each command has its own flags.
Detect scenes and save a CSV report:
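Assuming an input named `interview.mp4`, the command might look like this (27 is the detector's default threshold):

```shell
scenedetect -i interview.mp4 detect-content --threshold 27 list-scenes
```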
This creates an interview-Scenes.csv file with timecodes for every detected cut. The --threshold flag controls sensitivity – lower values catch more subtle changes, higher values only trigger on hard cuts.
Use adaptive detection for footage with lots of camera motion:
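For example, on a hypothetical handheld clip:

```shell
scenedetect -i handheld.mp4 detect-adaptive list-scenes
```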
detect-adaptive uses a rolling average of frame differences instead of a fixed threshold, which cuts down on false positives when the camera is panning or tracking a subject.
Split video into separate files without re-encoding (fast, lossless):
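A sketch of the lossless split (input filename is again a placeholder):

```shell
scenedetect -i video.mp4 detect-content split-video --copy
```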
The --copy flag tells ffmpeg to stream-copy rather than transcode. This is nearly instant but cut points may be slightly off since it snaps to the nearest keyframe.
You can also limit detection to a time range:
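The `time` command goes before the detector; the timecodes below are example values:

```shell
# Only analyze the span from 1:00 to 5:00
scenedetect -i video.mp4 time --start 00:01:00 --end 00:05:00 detect-content list-scenes
```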
Programmatic Scene Detection with Python
The CLI is great for quick jobs, but you’ll want the Python API when building pipelines. PySceneDetect gives you two ways to detect scenes: the simple detect() function and the full SceneManager class.
The simplest approach:
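A minimal sketch, assuming a local file named `video.mp4`:

```python
from scenedetect import detect, ContentDetector

# Run content-based detection over the whole file.
scene_list = detect('video.mp4', ContentDetector())

for i, (start, end) in enumerate(scene_list):
    print(f'Scene {i + 1}: {start.get_timecode()} - {end.get_timecode()}')
```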
detect() returns a list of (start_timecode, end_timecode) tuples. Each timecode is a FrameTimecode object that gives you frame numbers, seconds, or formatted timecodes.
For more control, use SceneManager directly:
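A sketch of the SceneManager workflow (threshold and filename are example values):

```python
from scenedetect import open_video, SceneManager, ContentDetector

video = open_video('video.mp4')
scene_manager = SceneManager()
scene_manager.add_detector(ContentDetector(threshold=27.0, min_scene_len=15))

# Decode every frame and record detected cuts.
scene_manager.detect_scenes(video)
scene_list = scene_manager.get_scene_list()
```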
min_scene_len=15 means at least 15 frames must pass between detected cuts. This prevents rapid-fire false detections in noisy footage. At 30fps, that’s a half-second minimum scene length.
For detecting fade-ins and fade-outs instead of hard cuts, use ThresholdDetector:
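For example, on a hypothetical lecture recording (12 is the detector's default intensity threshold):

```python
from scenedetect import detect, ThresholdDetector

# threshold is average pixel intensity on a 0-255 scale.
scene_list = detect('lecture.mp4', ThresholdDetector(threshold=12))
```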
ThresholdDetector monitors average pixel intensity. When a frame drops below the threshold (fade to black) and then rises back above it, that triggers a scene boundary. Works well for presentations, lectures, and anything with deliberate fade transitions.
Extracting Scene Thumbnails and Metadata
Once you have a scene list, you’ll often want thumbnails for each scene and a structured export for downstream processing. PySceneDetect has built-in functions for both.
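A sketch combining both, with example filenames and output paths:

```python
from scenedetect import detect, open_video, ContentDetector
from scenedetect.scene_manager import save_images, write_scene_list

scene_list = detect('video.mp4', ContentDetector())

# Save three JPEG thumbnails per scene.
video = open_video('video.mp4')
save_images(scene_list, video, num_images=3, output_dir='thumbnails')

# Export timecodes and frame numbers to CSV.
with open('video-Scenes.csv', 'w') as f:
    write_scene_list(f, scene_list)
```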
save_images returns a dictionary mapping each scene number to a list of image paths. The num_images parameter controls how many frames get extracted per scene – 3 gives you the first frame, a middle frame, and the last frame. Set encoder_param to control JPEG quality (0-100, default 95).
The CSV output includes start/end timecodes, frame numbers, and duration for every scene. You can feed this directly into a video editor’s EDL importer or parse it in a downstream pipeline.
To split the video into separate files programmatically:
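A minimal example, assuming `video.mp4` as the input:

```python
from scenedetect import detect, ContentDetector
from scenedetect.video_splitter import split_video_ffmpeg

scene_list = detect('video.mp4', ContentDetector())

# Writes one output file per scene via ffmpeg.
split_video_ffmpeg('video.mp4', scene_list)
```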
This calls ffmpeg under the hood. Make sure ffmpeg is on your PATH.
Fine-Tuning Detection Parameters
Picking the right detector and threshold depends on your footage. Here’s a practical breakdown.
ContentDetector compares consecutive frames in HSV color space, measuring the average change in hue, saturation, and luma between them. It works best on footage with distinct hard cuts – interviews, vlogs, edited content.
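For instance, with the default threshold made explicit (filename is an example):

```python
from scenedetect import detect, ContentDetector

# 27.0 is the library default; lower it to catch softer cuts.
scene_list = detect('interview.mp4', ContentDetector(threshold=27.0))
```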
AdaptiveDetector uses a rolling average of frame-to-frame differences. The threshold is relative to the local average, not absolute. This makes it much better for footage with camera movement, varying lighting, or handheld shots.
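A sketch with the defaults spelled out (the filename is a placeholder):

```python
from scenedetect import detect, AdaptiveDetector

scene_list = detect('documentary.mp4', AdaptiveDetector(
    adaptive_threshold=3.0,  # cut when the ratio to the rolling average exceeds this
    window_width=2,          # frames before/after each frame used in the average
    min_content_val=15.0,    # ignore frame differences below this floor
))
```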
window_width controls how many frames contribute to the rolling average. A wider window smooths out more noise but reacts slower to real cuts. min_content_val sets a floor – any frame-to-frame difference below this value is ignored entirely, even if it’s above the adaptive threshold.
A quick comparison to help you choose:
| Scenario | Detector | Threshold |
|---|---|---|
| Edited videos, hard cuts | ContentDetector | 25-30 |
| Camera motion, documentaries | AdaptiveDetector | 2.5-3.5 |
| Presentations with fades | ThresholdDetector | 10-15 |
| Music videos, fast edits | ContentDetector | 35-45 |
| Surveillance footage | AdaptiveDetector | 4.0-5.0 |
When in doubt, start with AdaptiveDetector at its default settings. It handles the widest range of footage without tuning.
Common Errors and Fixes
ffmpeg not found when splitting video
PySceneDetect uses ffmpeg for split-video and split_video_ffmpeg(). Install it:
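On the common platforms:

```shell
# macOS (Homebrew)
brew install ffmpeg

# Debian/Ubuntu
sudo apt install ffmpeg
```

On Windows, download a build from ffmpeg.org and add it to your PATH.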
Scene detection itself only needs OpenCV, not ffmpeg. So detect() and list-scenes work without it.
Video codec not supported
OpenCV can't decode the video. This usually happens with HEVC/H.265 or AV1 files. Fix it by installing an OpenCV build with full codec support, or by transcoding the file first:
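A standard H.264 transcode that leaves the audio untouched (filenames and quality settings are example values):

```shell
ffmpeg -i input.mp4 -c:v libx264 -crf 20 -preset fast -c:a copy output.mp4
```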
Memory issues with large files
PySceneDetect processes frames sequentially, so memory usage stays flat regardless of video length. But if you’re also loading all scene images into memory, that adds up fast. Process scenes in batches:
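One way to sketch this, with a hypothetical `batched` helper and a batch size of 50:

```python
from scenedetect import detect, open_video, ContentDetector
from scenedetect.scene_manager import save_images

def batched(items, batch_size):
    # Yield successive slices of at most batch_size scenes.
    for i in range(0, len(items), batch_size):
        yield items[i:i + batch_size]

scene_list = detect('video.mp4', ContentDetector())
video = open_video('video.mp4')

for n, batch in enumerate(batched(scene_list, 50)):
    # Only one batch of thumbnails is in flight at a time.
    save_images(batch, video, num_images=1, output_dir=f'thumbs/batch_{n}')
```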
Scenes too short or too many false detections
Increase min_scene_len or raise the threshold. A min_scene_len of 30 at 30fps means no scene can be shorter than 1 second. Combine this with a higher threshold to filter out noise:
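For example (the filename and the threshold of 32.0 are illustrative – anything above the default of 27.0 filters more aggressively):

```python
from scenedetect import detect, ContentDetector

# 30 frames ≈ 1 second at 30 fps; raise threshold to suppress noise triggers.
scene_list = detect('noisy.mp4', ContentDetector(threshold=32.0, min_scene_len=30))
```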
Related Guides
- How to Build a Document Comparison Pipeline with Vision Models
- How to Build a Lane Detection Pipeline with OpenCV and YOLO
- How to Build a Real-Time Pose Estimation Pipeline with MediaPipe
- How to Build a Vehicle Counting Pipeline with YOLOv8 and OpenCV
- How to Build Video Analytics Pipelines with OpenCV and Deep Learning
- How to Build a Scene Text Recognition Pipeline with PaddleOCR
- How to Build a Video Surveillance Analytics Pipeline with YOLOv8
- How to Build a Receipt Scanner with OCR and Structured Extraction
- How to Build Hand Gesture Recognition with MediaPipe and Python
- How to Build an OCR Pipeline with PaddleOCR and Tesseract