
SAM2 to YOLOv9t Pipeline

Automated video annotation pipeline using SAM2 (Segment Anything Model 2) to create YOLO format datasets for YOLOv9t training on Kaggle.

Overview

Video → SAM2 (auto-segment) → Bounding Boxes → YOLO Dataset → Train YOLOv9t

Features

  • SAM2 Auto-Annotation: Automatically segment any object in video frames
  • YOLO Format Export: Convert masks to YOLO bounding box format
  • Kaggle-Ready: All notebooks optimized for Kaggle GPU environment
  • YOLOv9t Training: Train efficient tiny YOLO model on custom data

Project Structure

sam2-yolo-pipeline/
├── notebooks/
│   ├── 01_sam2_video_annotation.ipynb  # SAM2 setup + video annotation
│   ├── 02_create_yolo_dataset.ipynb    # Convert to YOLO format
│   └── 03_train_yolov9t.ipynb          # Train YOLOv9t model
├── utils/
│   ├── __init__.py
│   ├── video_utils.py                  # Video processing utilities
│   ├── sam2_utils.py                   # SAM2 annotation utilities
│   └── yolo_utils.py                   # YOLO dataset utilities
└── README.md

Quick Start

py scripts/frigate_mini.py --config configs/frigate_mini_cpu_pt.yaml
py scripts/frigate_mini.py --model models/krg_masuk_yolov9t_best.pt --video input/karung_masuk.mp4

On Kaggle

  1. Upload Video

    • Create a new Kaggle Dataset with your video file(s)
  2. Run Notebook 1: SAM2 Annotation

    • Upload 01_sam2_video_annotation.ipynb to Kaggle
    • Enable GPU (Settings → Accelerator → GPU)
    • Update VIDEO_PATH to your video
    • Run all cells
  3. Run Notebook 2: Create YOLO Dataset

    • Upload 02_create_yolo_dataset.ipynb
    • Point to annotations from step 2
    • Run all cells
    • Download yolo_dataset.zip or create Kaggle Dataset
  4. Run Notebook 3: Train YOLOv9t

    • Upload 03_train_yolov9t.ipynb
    • Enable GPU
    • Point to your YOLO dataset
    • Run training
    • Download trained weights

Configuration

SAM2 Model Variants

Model      Params  Speed    Accuracy
tiny       39M     Fastest  Good
small      46M     Fast     Better
base_plus  81M     Medium   High
large      224M    Slow     Best
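
To load a variant, the SAM2 repo exposes build_sam2 plus an automatic mask generator. A minimal loading sketch, assuming the checkpoint/config file names from the facebookresearch/segment-anything-2 release (verify against the files you actually download):

import torch
from sam2.build_sam import build_sam2
from sam2.automatic_mask_generator import SAM2AutomaticMaskGenerator

CHECKPOINT = "checkpoints/sam2_hiera_tiny.pt"  # assumed repo naming; swap for small/base_plus/large
MODEL_CFG = "sam2_hiera_t.yaml"

device = "cuda" if torch.cuda.is_available() else "cpu"
sam2 = build_sam2(MODEL_CFG, CHECKPOINT, device=device)
mask_generator = SAM2AutomaticMaskGenerator(sam2)

# Each generated mask is a dict with "segmentation", "area", and "bbox" (x, y, w, h):
# masks = mask_generator.generate(frame_rgb)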

Frame Extraction Settings

SAMPLE_FPS = 2          # Frames per second to extract
MAX_FRAMES = 500        # Maximum frames (None for all)
MIN_MASK_AREA = 500     # Minimum object area in pixels
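
A minimal sketch of how these settings could drive frame extraction with OpenCV; the extract_frames helper below is illustrative, not the actual utils/video_utils.py API:

import os
import cv2

def extract_frames(video_path, out_dir, sample_fps=2, max_frames=500):
    os.makedirs(out_dir, exist_ok=True)
    cap = cv2.VideoCapture(video_path)
    native_fps = cap.get(cv2.CAP_PROP_FPS) or 30.0
    step = max(1, round(native_fps / sample_fps))  # keep every Nth frame
    saved = idx = 0
    while cap.isOpened():
        ok, frame = cap.read()
        if not ok or (max_frames is not None and saved >= max_frames):
            break
        if idx % step == 0:
            cv2.imwrite(os.path.join(out_dir, f"frame_{saved:05d}.jpg"), frame)
            saved += 1
        idx += 1
    cap.release()
    return saved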

Training Settings

CONFIG = {
    'epochs': 100,
    'batch': 16,
    'imgsz': 640,
    'patience': 20,
    'lr0': 0.001,
}
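
These keys map directly onto Ultralytics train() arguments, so the notebook can simply unpack them. A sketch (the 'yolov9t.pt' starting weights are an assumption; use whatever checkpoint Notebook 3 downloads):

from ultralytics import YOLO

model = YOLO('yolov9t.pt')  # assumed pretrained starting weights
model.train(data='yolo_dataset/data.yaml', **CONFIG)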

YOLO Dataset Format

yolo_dataset/
├── data.yaml           # Dataset configuration
├── images/
│   ├── train/          # Training images
│   └── val/            # Validation images
└── labels/
    ├── train/          # Training labels (.txt)
    └── val/            # Validation labels (.txt)
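
A minimal data.yaml for a single-class dataset (the class name is a placeholder; match it to whatever you annotated):

path: yolo_dataset      # dataset root
train: images/train
val: images/val
nc: 1                   # number of classes
names: ['object']       # placeholder class name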

Label Format (YOLO)

# class x_center y_center width height (normalized 0-1)
0 0.45 0.32 0.12 0.25
0 0.78 0.61 0.08 0.15
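
Each pixel-space box (x_min, y_min, width, height), such as SAM2's bbox field, becomes one such line after normalizing by the image size. A minimal sketch:

def to_yolo_line(class_id, box_xywh, img_w, img_h):
    # box_xywh: pixel-space (x_min, y_min, width, height)
    x, y, w, h = box_xywh
    xc = (x + w / 2) / img_w   # normalized box center
    yc = (y + h / 2) / img_h
    return f"{class_id} {xc:.6f} {yc:.6f} {w / img_w:.6f} {h / img_h:.6f}"

print(to_yolo_line(0, (100, 50, 80, 120), 640, 480))
# -> "0 0.218750 0.229167 0.125000 0.250000"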

Requirements

Automatically installed in notebooks:

torch>=2.0
ultralytics
segment-anything-2
opencv-python
supervision
tqdm
pyyaml

Usage Examples

After Training

from ultralytics import YOLO

# Load trained model
model = YOLO('best.pt')

# Inference on image
results = model.predict('image.jpg', conf=0.25)

# Inference on video
results = model.predict('video.mp4', conf=0.25, save=True)

# Access detections
for result in results:
    for box in result.boxes:
        x1, y1, x2, y2 = box.xyxy[0].tolist()
        confidence = box.conf[0].item()
        class_id = int(box.cls[0].item())
        print(f"Class {class_id}: {confidence:.2f} at [{x1:.0f}, {y1:.0f}, {x2:.0f}, {y2:.0f}]")

Export Formats

# ONNX
model.export(format='onnx')

# TensorRT
model.export(format='engine')

# OpenVINO
model.export(format='openvino')

# CoreML
model.export(format='coreml')

Tips

Improve Annotation Quality

  • Adjust MIN_MASK_AREA to filter small/noisy detections
  • Use MAX_MASK_AREA to exclude large background regions (see the filtering sketch after this list)
  • Lower SAMPLE_FPS if video frames are very similar
  • Use SAM2 large model for better segmentation accuracy
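
A sketch of the area filter, assuming the list-of-dicts output of SAM2's automatic mask generator (each dict carries an "area" field):

def filter_masks(masks, min_area=500, max_area=None):
    # masks: SAM2 auto-mask dicts; keep only objects in the wanted size range
    return [m for m in masks
            if m["area"] >= min_area
            and (max_area is None or m["area"] <= max_area)]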

Improve Training Results

  • More diverse training data
  • Experiment with augmentation settings
  • Try different learning rates
  • Use larger image size (imgsz=1280)
  • Train for more epochs with patience

Kaggle GPU Tips

  • P100 GPU: ~16GB VRAM, use batch=16-32
  • T4 GPU: ~16GB VRAM, use batch=16-32
  • Enable mixed precision for faster training (see the sketch below)
  • Save checkpoints to avoid losing progress
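
Ultralytics enables mixed precision by default; the sketch below sets it explicitly and uses save_period to write intermediate checkpoints (argument values are illustrative):

from ultralytics import YOLO

model = YOLO('yolov9t.pt')
model.train(
    data='yolo_dataset/data.yaml',
    epochs=100,
    batch=16,
    amp=True,        # mixed precision (on by default)
    save_period=10,  # checkpoint every 10 epochs under runs/.../weights/
)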

Troubleshooting

CUDA Out of Memory

# Reduce batch size
CONFIG['batch'] = 8

# Or reduce image size
CONFIG['imgsz'] = 416

SAM2 Installation Issues

# Install from source
pip install git+https://github.com/facebookresearch/segment-anything-2.git

Missing Labels Warning

This is normal - some frames may have no detectable objects. The dataset is still valid.

License

MIT License - Feel free to use and modify for your projects.

Acknowledgments