# SAM2 to YOLOv9t Pipeline

Automated video annotation pipeline using SAM2 (Segment Anything Model 2) to create YOLO-format datasets for YOLOv9t training on Kaggle.
## Overview

```
Video → SAM2 (auto-segment) → Bounding Boxes → YOLO Dataset → Train YOLOv9t
```
## Features

- **SAM2 Auto-Annotation**: Automatically segment any object in video frames
- **YOLO Format Export**: Convert masks to YOLO bounding-box format
- **Kaggle-Ready**: All notebooks are optimized for the Kaggle GPU environment
- **YOLOv9t Training**: Train the efficient tiny YOLO model on custom data
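The mask-to-box conversion in the export step can be sketched in plain Python. This is a minimal illustration on a nested-list binary mask, not the pipeline's actual utility (which presumably lives in `utils/yolo_utils.py`):

```python
def mask_to_bbox(mask):
    """Return (x_min, y_min, x_max, y_max) for a binary mask given as
    a list of rows, or None if the mask contains no foreground pixels."""
    rows = [r for r, row in enumerate(mask) if any(row)]
    cols = [c for c in range(len(mask[0])) if any(row[c] for row in mask)]
    if not rows:
        return None
    return (cols[0], rows[0], cols[-1], rows[-1])

mask = [
    [0, 0, 0, 0],
    [0, 1, 1, 0],
    [0, 1, 1, 0],
    [0, 0, 0, 0],
]
print(mask_to_bbox(mask))  # (1, 1, 2, 2)
```

In practice the real pipeline works on NumPy masks returned by SAM2, where the same idea is a couple of `np.any`/`np.where` calls.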
## Project Structure

```
sam2-yolo-pipeline/
├── notebooks/
│   ├── 01_sam2_video_annotation.ipynb  # SAM2 setup + video annotation
│   ├── 02_create_yolo_dataset.ipynb    # Convert to YOLO format
│   └── 03_train_yolov9t.ipynb          # Train YOLOv9t model
├── utils/
│   ├── __init__.py
│   ├── video_utils.py                  # Video processing utilities
│   ├── sam2_utils.py                   # SAM2 annotation utilities
│   └── yolo_utils.py                   # YOLO dataset utilities
└── README.md
```
## Quick Start

```shell
py scripts/frigate_mini.py --config configs/frigate_mini_cpu_pt.yaml
py scripts/frigate_mini.py --model models/krg_masuk_yolov9t_best.pt --video input/karung_masuk.mp4
```
### On Kaggle

1. **Upload Video**
   - Create a new Kaggle Dataset with your video file(s)
2. **Run Notebook 1: SAM2 Annotation**
   - Upload `01_sam2_video_annotation.ipynb` to Kaggle
   - Enable GPU (Settings → Accelerator → GPU)
   - Update `VIDEO_PATH` to point to your video
   - Run all cells
3. **Run Notebook 2: Create YOLO Dataset**
   - Upload `02_create_yolo_dataset.ipynb`
   - Point it to the annotations from step 2
   - Run all cells
   - Download `yolo_dataset.zip` or create a Kaggle Dataset from it
4. **Run Notebook 3: Train YOLOv9t**
   - Upload `03_train_yolov9t.ipynb`
   - Enable GPU
   - Point it to your YOLO dataset
   - Run training
   - Download the trained weights
## Configuration

### SAM2 Model Variants

| Model | Size | Speed | Accuracy |
|---|---|---|---|
| `tiny` | 39MB | Fastest | Good |
| `small` | 46MB | Fast | Better |
| `base_plus` | 81MB | Medium | High |
| `large` | 224MB | Slow | Best |
### Frame Extraction Settings

```python
SAMPLE_FPS = 2       # Frames per second to extract
MAX_FRAMES = 500     # Maximum frames (None for all)
MIN_MASK_AREA = 500  # Minimum object area in pixels
```
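To make the sampling behavior concrete, here is a sketch (assumed behavior, not the notebook's exact code) of how `SAMPLE_FPS` and `MAX_FRAMES` might select which frame indices to extract from the source video:

```python
def frame_indices(video_fps, total_frames, sample_fps=2, max_frames=500):
    """Evenly spaced frame indices approximating sample_fps extraction."""
    step = max(1, round(video_fps / sample_fps))
    indices = list(range(0, total_frames, step))
    return indices[:max_frames] if max_frames else indices

# A 10-second, 30 FPS clip sampled at 2 FPS yields ~20 frames
print(len(frame_indices(30, 300)))  # 20
```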
### Training Settings

```python
CONFIG = {
    'epochs': 100,
    'batch': 16,
    'imgsz': 640,
    'patience': 20,
    'lr0': 0.001,
}
```
## YOLO Dataset Format

```
yolo_dataset/
├── data.yaml        # Dataset configuration
├── images/
│   ├── train/       # Training images
│   └── val/         # Validation images
└── labels/
    ├── train/       # Training labels (.txt)
    └── val/         # Validation labels (.txt)
```
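The `data.yaml` referenced above typically looks like the following (the class count and names here are placeholders; adjust them to your dataset):

```yaml
path: yolo_dataset   # dataset root
train: images/train  # relative to path
val: images/val
nc: 1                # number of classes (placeholder)
names: ['object']    # class names (placeholder)
```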
### Label Format (YOLO)

```
# class x_center y_center width height (normalized 0-1)
0 0.45 0.32 0.12 0.25
0 0.78 0.61 0.08 0.15
```
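Converting a pixel-space box to one of these normalized label lines is straightforward; a minimal sketch (the helper name is illustrative, not from the repo):

```python
def to_yolo_line(class_id, x1, y1, x2, y2, img_w, img_h):
    """Format a pixel-space box (x1, y1, x2, y2) as a YOLO label line."""
    xc = (x1 + x2) / 2 / img_w   # normalized box center x
    yc = (y1 + y2) / 2 / img_h   # normalized box center y
    w = (x2 - x1) / img_w        # normalized width
    h = (y2 - y1) / img_h        # normalized height
    return f"{class_id} {xc:.6f} {yc:.6f} {w:.6f} {h:.6f}"

print(to_yolo_line(0, 0, 0, 64, 32, 128, 128))
# 0 0.250000 0.125000 0.500000 0.250000
```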
## Requirements

Automatically installed in the notebooks:

```
torch>=2.0
ultralytics
segment-anything-2
opencv-python
supervision
tqdm
pyyaml
```
## Usage Examples

### After Training

```python
from ultralytics import YOLO

# Load trained model
model = YOLO('best.pt')

# Inference on an image
results = model.predict('image.jpg', conf=0.25)

# Inference on a video
results = model.predict('video.mp4', conf=0.25, save=True)

# Access detections
for result in results:
    for box in result.boxes:
        x1, y1, x2, y2 = box.xyxy[0].tolist()
        confidence = box.conf[0].item()
        class_id = int(box.cls[0].item())
        print(f"Class {class_id}: {confidence:.2f} at [{x1:.0f}, {y1:.0f}, {x2:.0f}, {y2:.0f}]")
```
### Export Formats

```python
# ONNX
model.export(format='onnx')

# TensorRT
model.export(format='engine')

# OpenVINO
model.export(format='openvino')

# CoreML
model.export(format='coreml')
```
## Tips

### Improve Annotation Quality

- Adjust `MIN_MASK_AREA` to filter out small/noisy detections
- Use `MAX_MASK_AREA` to exclude large background regions
- Lower `SAMPLE_FPS` if consecutive video frames are very similar
- Use the SAM2 `large` model for better segmentation accuracy

### Improve Training Results

- Collect more diverse training data
- Experiment with augmentation settings
- Try different learning rates
- Use a larger image size (`imgsz=1280`)
- Train for more epochs with patience enabled
### Kaggle GPU Tips

- P100 GPU: ~16GB VRAM, use `batch=16-32`
- T4 GPU: ~16GB VRAM, use `batch=16-32`
- Enable mixed precision for faster training
- Save checkpoints regularly to avoid losing progress
## Troubleshooting

### CUDA Out of Memory

```python
# Reduce batch size
CONFIG['batch'] = 8

# Or reduce image size
CONFIG['imgsz'] = 416
```
### SAM2 Installation Issues

```shell
# Install from source
pip install git+https://github.com/facebookresearch/segment-anything-2.git
```
### Missing Labels Warning

This is normal: some frames may have no detectable objects. The dataset is still valid.
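If you want to see which frames lack labels, a quick check on the two file lists suffices; a sketch (hypothetical helper, matching images to labels by file stem):

```python
def unlabeled_frames(image_files, label_files):
    """Return image filenames with no matching label file (same stem)."""
    labeled = {name.rsplit('.', 1)[0] for name in label_files}
    return sorted(f for f in image_files if f.rsplit('.', 1)[0] not in labeled)

print(unlabeled_frames(['f001.jpg', 'f002.jpg'], ['f001.txt']))  # ['f002.jpg']
```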
## License

MIT License. Feel free to use and modify this project.

## Acknowledgments

- SAM2 by Meta AI
- YOLOv9 by WongKinYiu
- Ultralytics for the YOLO implementation