
SAM2 to YOLOv9t Pipeline

Automated video annotation pipeline using SAM2 (Segment Anything Model 2) to create YOLO format datasets for YOLOv9t training on Kaggle.

Overview

Video → SAM2 (auto-segment) → Bounding Boxes → YOLO Dataset → Train YOLOv9t

Features

  • SAM2 Auto-Annotation: Automatically segment any object in video frames
  • YOLO Format Export: Convert masks to YOLO bounding box format
  • Kaggle-Ready: All notebooks optimized for Kaggle GPU environment
  • YOLOv9t Training: Train efficient tiny YOLO model on custom data

Project Structure

sam2-yolo-pipeline/
├── notebooks/
│   ├── 01_sam2_video_annotation.ipynb  # SAM2 setup + video annotation
│   ├── 02_create_yolo_dataset.ipynb    # Convert to YOLO format
│   └── 03_train_yolov9t.ipynb          # Train YOLOv9t model
├── utils/
│   ├── __init__.py
│   ├── video_utils.py                  # Video processing utilities
│   ├── sam2_utils.py                   # SAM2 annotation utilities
│   └── yolo_utils.py                   # YOLO dataset utilities
└── README.md

Quick Start

py scripts/frigate_mini.py --config configs/frigate_mini_cpu_pt.yaml
py scripts/frigate_mini.py --model models/krg_masuk_yolov9t_best.pt --video input/karung_masuk.mp4

On Kaggle

  1. Upload Video

    • Create a new Kaggle Dataset with your video file(s)
  2. Run Notebook 1: SAM2 Annotation

    • Upload 01_sam2_video_annotation.ipynb to Kaggle
    • Enable GPU (Settings → Accelerator → GPU)
    • Update VIDEO_PATH to your video
    • Run all cells
  3. Run Notebook 2: Create YOLO Dataset

    • Upload 02_create_yolo_dataset.ipynb
    • Point to annotations from step 2
    • Run all cells
    • Download yolo_dataset.zip or create Kaggle Dataset
  4. Run Notebook 3: Train YOLOv9t

    • Upload 03_train_yolov9t.ipynb
    • Enable GPU
    • Point to your YOLO dataset
    • Run training
    • Download trained weights

Configuration

SAM2 Model Variants

Model      Params  Speed    Accuracy
tiny       39M     Fastest  Good
small      46M     Fast     Better
base_plus  81M     Medium   High
large      224M    Slow     Best
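
To load a variant, the SAM2 repo exposes build_sam2 plus an automatic mask generator. A minimal loading sketch, assuming the checkpoint/config file names from the facebookresearch/segment-anything-2 release (verify against the files you actually download):

import torch
from sam2.build_sam import build_sam2
from sam2.automatic_mask_generator import SAM2AutomaticMaskGenerator

CHECKPOINT = "checkpoints/sam2_hiera_tiny.pt"  # assumed repo naming; swap for small/base_plus/large
MODEL_CFG = "sam2_hiera_t.yaml"

device = "cuda" if torch.cuda.is_available() else "cpu"
sam2 = build_sam2(MODEL_CFG, CHECKPOINT, device=device)
mask_generator = SAM2AutomaticMaskGenerator(sam2)

# Each generated mask is a dict with "segmentation", "area", and "bbox" (x, y, w, h):
# masks = mask_generator.generate(frame_rgb)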

Frame Extraction Settings

SAMPLE_FPS = 2          # Frames per second to extract
MAX_FRAMES = 500        # Maximum frames (None for all)
MIN_MASK_AREA = 500     # Minimum object area in pixels
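
A minimal sketch of how these settings could drive frame extraction with OpenCV; the extract_frames helper below is illustrative, not the actual utils/video_utils.py API:

import os
import cv2

def extract_frames(video_path, out_dir, sample_fps=2, max_frames=500):
    os.makedirs(out_dir, exist_ok=True)
    cap = cv2.VideoCapture(video_path)
    native_fps = cap.get(cv2.CAP_PROP_FPS) or 30.0
    step = max(1, round(native_fps / sample_fps))  # keep every Nth frame
    saved = idx = 0
    while cap.isOpened():
        ok, frame = cap.read()
        if not ok or (max_frames is not None and saved >= max_frames):
            break
        if idx % step == 0:
            cv2.imwrite(os.path.join(out_dir, f"frame_{saved:05d}.jpg"), frame)
            saved += 1
        idx += 1
    cap.release()
    return saved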

Training Settings

CONFIG = {
    'epochs': 100,
    'batch': 16,
    'imgsz': 640,
    'patience': 20,
    'lr0': 0.001,
}
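
These keys map directly onto Ultralytics train() arguments, so the notebook can simply unpack them. A sketch (the 'yolov9t.pt' starting weights are an assumption; use whatever checkpoint Notebook 3 downloads):

from ultralytics import YOLO

model = YOLO('yolov9t.pt')  # assumed pretrained starting weights
model.train(data='yolo_dataset/data.yaml', **CONFIG)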

YOLO Dataset Format

yolo_dataset/
├── data.yaml           # Dataset configuration
├── images/
│   ├── train/          # Training images
│   └── val/            # Validation images
└── labels/
    ├── train/          # Training labels (.txt)
    └── val/            # Validation labels (.txt)
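
A minimal data.yaml for a single-class dataset (the class name is a placeholder; match it to whatever you annotated):

path: yolo_dataset      # dataset root
train: images/train
val: images/val
nc: 1                   # number of classes
names: ['object']       # placeholder class name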

Label Format (YOLO)

# class x_center y_center width height (normalized 0-1)
0 0.45 0.32 0.12 0.25
0 0.78 0.61 0.08 0.15
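
Each pixel-space box (x_min, y_min, width, height), such as SAM2's bbox field, becomes one such line after normalizing by the image size. A minimal sketch:

def to_yolo_line(class_id, box_xywh, img_w, img_h):
    # box_xywh: pixel-space (x_min, y_min, width, height)
    x, y, w, h = box_xywh
    xc = (x + w / 2) / img_w   # normalized box center
    yc = (y + h / 2) / img_h
    return f"{class_id} {xc:.6f} {yc:.6f} {w / img_w:.6f} {h / img_h:.6f}"

print(to_yolo_line(0, (100, 50, 80, 120), 640, 480))
# -> "0 0.218750 0.229167 0.125000 0.250000"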

Requirements

Automatically installed in notebooks:

torch>=2.0
ultralytics
segment-anything-2
opencv-python
supervision
tqdm
pyyaml

Usage Examples

After Training

from ultralytics import YOLO

# Load trained model
model = YOLO('best.pt')

# Inference on image
results = model.predict('image.jpg', conf=0.25)

# Inference on video
results = model.predict('video.mp4', conf=0.25, save=True)

# Access detections
for result in results:
    for box in result.boxes:
        x1, y1, x2, y2 = box.xyxy[0].tolist()
        confidence = box.conf[0].item()
        class_id = int(box.cls[0].item())
        print(f"Class {class_id}: {confidence:.2f} at [{x1:.0f}, {y1:.0f}, {x2:.0f}, {y2:.0f}]")

Export Formats

# ONNX
model.export(format='onnx')

# TensorRT
model.export(format='engine')

# OpenVINO
model.export(format='openvino')

# CoreML
model.export(format='coreml')

Tips

Improve Annotation Quality

  • Adjust MIN_MASK_AREA to filter small/noisy detections
  • Use MAX_MASK_AREA to exclude large background regions (see the filtering sketch after this list)
  • Lower SAMPLE_FPS if video frames are very similar
  • Use SAM2 large model for better segmentation accuracy
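
A sketch of the area filter, assuming the list-of-dicts output of SAM2's automatic mask generator (each dict carries an "area" field):

def filter_masks(masks, min_area=500, max_area=None):
    # masks: SAM2 auto-mask dicts; keep only objects in the wanted size range
    return [m for m in masks
            if m["area"] >= min_area
            and (max_area is None or m["area"] <= max_area)]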

Improve Training Results

  • More diverse training data
  • Experiment with augmentation settings
  • Try different learning rates
  • Use larger image size (imgsz=1280)
  • Train for more epochs with patience

Kaggle GPU Tips

  • P100 GPU: ~16GB VRAM, use batch=16-32
  • T4 GPU: ~16GB VRAM, use batch=16-32
  • Enable mixed precision for faster training (see the sketch below)
  • Save checkpoints to avoid losing progress
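
Ultralytics enables mixed precision by default; the sketch below sets it explicitly and uses save_period to write intermediate checkpoints (argument values are illustrative):

from ultralytics import YOLO

model = YOLO('yolov9t.pt')
model.train(
    data='yolo_dataset/data.yaml',
    epochs=100,
    batch=16,
    amp=True,        # mixed precision (on by default)
    save_period=10,  # checkpoint every 10 epochs under runs/.../weights/
)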

Troubleshooting

CUDA Out of Memory

# Reduce batch size
CONFIG['batch'] = 8

# Or reduce image size
CONFIG['imgsz'] = 416

SAM2 Installation Issues

# Install from source
pip install git+https://github.com/facebookresearch/segment-anything-2.git

Missing Labels Warning

This is normal - some frames may have no detectable objects. The dataset is still valid.

License

MIT License - Feel free to use and modify for your projects.

Acknowledgments