The Quick Version
Install Ultralytics, organize your images and labels in YOLO format, point a data.yaml at them, and run one command. Here’s the entire training loop in under 10 lines:
```python
from ultralytics import YOLO

# Start from pretrained COCO weights and fine-tune on your dataset
model = YOLO("yolo11s.pt")

model.train(
    data="data.yaml",
    epochs=100,
    imgsz=640,
    batch=16,
    name="safety_detector_v1",  # results land in runs/detect/safety_detector_v1/
)
```
That’s it. Ultralytics handles augmentation, learning rate scheduling, mixed precision, and checkpointing automatically. Your best weights land in runs/detect/safety_detector_v1/weights/best.pt.
Preparing Your Dataset
YOLO format is dead simple. Each image gets a matching .txt file with one bounding box per line. The format is:
```
class_id x_center y_center width height
```
All coordinates are normalized to [0, 1] relative to image dimensions. A bounding box for class 0 (say, “hardhat”) centered at 60% across and 30% down the image, spanning 20% width and 15% height looks like:
```
0 0.60 0.30 0.20 0.15
```
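The pixel-to-normalized conversion is where most hand-rolled labeling scripts go wrong. Here's a minimal helper (hypothetical, not part of Ultralytics) that emits a YOLO label line from pixel-space corners:

```python
def to_yolo_line(cls_id, x_min, y_min, x_max, y_max, img_w, img_h):
    """Convert a pixel-space box to a normalized YOLO label line."""
    cx = (x_min + x_max) / 2 / img_w   # box center x, normalized
    cy = (y_min + y_max) / 2 / img_h   # box center y, normalized
    w = (x_max - x_min) / img_w        # box width, normalized
    h = (y_max - y_min) / img_h        # box height, normalized
    return f"{cls_id} {cx:.4f} {cy:.4f} {w:.4f} {h:.4f}"

# The hardhat example above, on a 1000x800 image:
print(to_yolo_line(0, 500, 180, 700, 300, 1000, 800))
# → 0 0.6000 0.3000 0.2000 0.1500
```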
Directory Structure
Organize your data like this:
```
dataset/
├── images/
│   ├── train/
│   │   ├── img_001.jpg
│   │   └── ...
│   └── val/
│       └── ...
└── labels/
    ├── train/
    │   ├── img_001.txt
    │   └── ...
    └── val/
        └── ...
```
The key rule: label filenames must match image filenames exactly (minus the extension). img_001.jpg pairs with img_001.txt. Get this wrong and Ultralytics silently treats images as having no annotations — you’ll see training “work” but metrics stay near zero.
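Because the failure mode is silent, it's worth checking the pairing yourself before training. A small sketch (a hypothetical helper, not part of Ultralytics):

```python
from pathlib import Path

def unmatched_stems(images_dir, labels_dir, img_exts=(".jpg", ".jpeg", ".png")):
    """Return (image stems missing labels, label stems missing images)."""
    imgs = {p.stem for p in Path(images_dir).iterdir() if p.suffix.lower() in img_exts}
    lbls = {p.stem for p in Path(labels_dir).glob("*.txt")}
    return sorted(imgs - lbls), sorted(lbls - imgs)
```

Run it on images/train against labels/train (and again for val); anything in the first list will silently train as a background image.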
The data.yaml File
```yaml
path: /absolute/path/to/dataset
train: images/train
val: images/val

names:
  0: hardhat
  1: vest   # example classes; order must match your label files
```
Use absolute paths for path to avoid headaches. The names dict maps class IDs to human-readable labels. Order matters — it must match the class IDs in your .txt label files.
Choosing a Model Size
Ultralytics offers five sizes for both YOLOv8 and YOLO11:
| Model | Params | Speed (T4 GPU) | mAP (COCO) | Use Case |
|---|---|---|---|---|
| yolo11n | 2.6M | ~1.5ms | 39.5 | Edge devices, real-time |
| yolo11s | 9.4M | ~2.5ms | 47.0 | Balanced speed/accuracy |
| yolo11m | 20.1M | ~4.7ms | 51.5 | General production |
| yolo11l | 25.3M | ~6.2ms | 53.4 | High accuracy needed |
| yolo11x | 56.9M | ~11.3ms | 54.7 | Maximum accuracy |
My recommendation: start with yolo11s or yolo11m. The nano model loses too much accuracy for most real-world tasks, and the large/xlarge models rarely justify the extra compute unless you have a genuinely hard detection problem with small or overlapping objects.
Training with the CLI
If you prefer the command line, the yolo CLI works great:
```bash
yolo detect train data=data.yaml model=yolo11s.pt epochs=100 imgsz=640 batch=16 name=safety_detector_v1
```
This is functionally identical to the Python API. Every argument you pass here maps to a parameter in model.train().
Hyperparameter Tuning
Don’t guess hyperparameters — use Ultralytics’ built-in tuner. It runs a genetic algorithm over your hyperparameter space:
```python
from ultralytics import YOLO

model = YOLO("yolo11s.pt")

# Genetic-algorithm search: short training runs, many candidates
model.tune(
    data="data.yaml",
    epochs=30,        # epochs per candidate run
    iterations=300,   # number of candidates to evaluate
    plots=False,
    save=False,
    val=False,
)
```
The tuner adjusts learning rate, momentum, augmentation intensity, and more. It’s expensive — budget 30x your normal training time — but it regularly finds 2-5% mAP improvements that manual tuning misses.
Parameters Worth Tweaking Manually
A few knobs have outsized impact:
- imgsz: Bigger images catch smaller objects. Try 1280 if you're missing detections on small targets.
- batch: As large as your GPU allows. Larger batches give more stable gradients.
- lr0: Starting learning rate. Lower it (0.001) if training is unstable. Raise it (0.02) with large batch sizes.
- mosaic: Mosaic augmentation intensity (default 1.0). Disabling it for the final epochs is a known trick that stabilizes final metrics.
- close_mosaic: Number of final epochs without mosaic augmentation. Default is 10, which is usually fine.
Evaluating Your Model
After training, check your metrics:
```python
from ultralytics import YOLO

model = YOLO("runs/detect/safety_detector_v1/weights/best.pt")
metrics = model.val()  # evaluates on the val split from data.yaml

print(metrics.box.map50)  # mAP at IoU 0.50
print(metrics.box.map)    # mAP averaged over IoU 0.50-0.95
print(metrics.box.mp)     # mean precision
print(metrics.box.mr)     # mean recall
```
What the numbers mean:
- mAP50: Mean average precision at IoU 0.50. Your primary metric for most applications. Aim for 0.85+ on well-labeled data.
- mAP50-95: Averaged across IoU thresholds from 0.50 to 0.95. Much stricter — 0.55+ is solid.
- Precision: Of all detections, how many were correct. High precision = few false positives.
- Recall: Of all ground truth objects, how many did the model find. High recall = few missed objects.
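All four metrics hinge on IoU, the overlap test that decides whether a detection counts as a match. A minimal sketch for axis-aligned boxes in (x_min, y_min, x_max, y_max) form:

```python
def iou(a, b):
    """Intersection-over-union of two (x_min, y_min, x_max, y_max) boxes."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))  # intersection width
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))  # intersection height
    inter = ix * iy
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

# Two boxes offset by half their width overlap with IoU = 1/3,
# which fails the 0.50 threshold behind mAP50
print(iou((0, 0, 10, 10), (5, 0, 15, 10)))
```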
If precision is high but recall is low, your model is conservative — lower the confidence threshold. If recall is high but precision is low, you’re getting too many false positives — raise the threshold or add more negative examples to training.
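The trade-off is easy to see with a toy threshold sweep (fabricated detections for illustration, not output from a real model):

```python
def precision_recall(dets, n_gt, conf):
    """dets: (confidence, is_true_positive) pairs; n_gt: ground-truth count."""
    kept = [tp for c, tp in dets if c >= conf]
    tp = sum(kept)
    precision = tp / len(kept) if kept else 1.0
    recall = tp / n_gt
    return precision, recall

# Fabricated detections: the high-confidence ones are mostly correct
dets = [(0.9, True), (0.8, True), (0.7, True), (0.6, False),
        (0.4, True), (0.3, False), (0.2, False)]
for conf in (0.25, 0.5, 0.75):
    p, r = precision_recall(dets, n_gt=5, conf=conf)
    print(f"conf={conf}: precision={p:.2f} recall={r:.2f}")
```

Raising the threshold keeps only high-confidence detections (precision up, recall down); lowering it admits more hits and more mistakes.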
Exporting Your Model
Training produces a PyTorch .pt file. For production, export to a runtime-optimized format:
```python
from ultralytics import YOLO

model = YOLO("runs/detect/safety_detector_v1/weights/best.pt")

model.export(format="onnx", dynamic=True)  # dynamic batch size and image dims
model.export(format="engine")              # TensorRT (requires an NVIDIA GPU)
model.export(format="coreml")              # Apple devices
```
For server-side inference, TensorRT is the clear winner — expect 2-4x speedup over PyTorch on the same GPU. For edge and mobile, ONNX Runtime or CoreML depending on your target hardware. The dynamic=True flag on ONNX export lets you vary batch size and image dimensions at inference time, which is critical for production flexibility.
Common Errors
No labels found during training
Your label .txt files aren’t where Ultralytics expects them. Double-check that labels/ is a sibling to images/ and filenames match exactly. Also verify data.yaml paths are correct — use absolute paths.
Training loss goes to NaN
Usually caused by a learning rate that's too high or corrupted label files. Check for label lines with coordinates outside [0, 1] or negative values. A quick validation pass (path assumed to match the layout above):

```python
from pathlib import Path

for txt in Path("dataset/labels").rglob("*.txt"):
    for i, line in enumerate(txt.read_text().splitlines(), 1):
        parts = line.split()
        if len(parts) != 5:
            print(f"{txt}:{i} malformed line: {line!r}")
        elif any(not 0.0 <= float(v) <= 1.0 for v in parts[1:]):
            print(f"{txt}:{i} coordinate out of range: {line!r}")
```
Low mAP despite good training loss
This almost always means label quality issues. Common causes: inconsistent labeling (sometimes you label an object, sometimes you don’t), bounding boxes that are too tight or too loose, or class confusion between similar categories. Review a sample of your annotations in a tool like Label Studio or CVAT before blaming the model.
CUDA out of memory
Reduce batch size or imgsz. If you're on a 12GB GPU, batch=8 with imgsz=640 using yolo11m is a safe starting point. You can also pass batch=-1 to let Ultralytics auto-select the largest batch that fits in memory (AutoBatch). Note that the trainer already accumulates gradients internally toward a nominal batch size, so a small per-step batch still trains stably, just a bit slower.
Export fails with opset version error
Bump the opset parameter. ONNX opset 17 works for most YOLO models. If you’re targeting an older inference runtime, try opset 13 or 14, but some operations may not be supported.
Tips from Production
A few things I’ve learned shipping YOLO models to production:
- Start with fewer classes. A 3-class model will outperform a 20-class model on the same data budget. Merge similar classes early, split them later when you have enough examples.
- 500 images per class is the minimum. Below that, you’re fighting data scarcity. Above 2000 per class, you’ll see diminishing returns unless the class has high visual variance.
- Freeze the backbone for small datasets. If you have under 1000 images total, freeze the first 10 layers with freeze=10. This prevents overfitting the feature extractor and typically adds 3-5 mAP points on small datasets.
- Always validate on a separate set. Never train and validate on the same images. Sounds obvious, but I've seen teams accidentally leak validation images into training through augmentation pipelines.
- Version your datasets. When you retrain with more data, keep the old dataset around. If the new model regresses on a specific class, you need to compare annotations.
Related Guides
- How to Build Multi-Object Tracking with DeepSORT and YOLOv8
- How to Build Video Analytics Pipelines with OpenCV and Deep Learning
- How to Build a Product Defect Detector with YOLOv8 and OpenCV
- How to Detect Objects in Images with YOLOv8
- How to Build a Wildlife Camera Trap Classifier with YOLOv8 and FastAPI
- How to Build a Lane Detection Pipeline with OpenCV and YOLO
- How to Build a Visual Grounding Pipeline with Grounding DINO
- How to Build Real-Time Object Segmentation with SAM 2 and WebSocket
- How to Build a Video Shot Boundary Detection Pipeline with PySceneDetect
- How to Build a Visual Inspection Pipeline with Anomaly Detection