dataset-yolo-script/sam2-cpu/PLAN.md (2026-02-04)
Feature Plan: YOLO-Assisted Auto-Annotation + Mini Frigate RKNN

Overview

Create two integrated components:

  1. YOLO-Assisted Annotator - Use pretrained YOLOv9t to auto-annotate video frames
  2. Frigate-Mini-RKNN - Standalone mini fork of Frigate for RKNN inference with MP4 input

Goals

  • Auto-annotate videos using YOLOv9t pretrained model (replaces manual SAM2 prompts)
  • Minimal Frigate fork with multiple detector backends:
    • RKNN - Rockchip NPU acceleration (RK3588, RK3568, etc.)
    • ONNX - CPU-only inference (cross-platform, no special hardware)
    • YOLO - Ultralytics backend (CPU/CUDA)
  • MP4 file as camera feed source
  • Output: Clean snapshot + YOLO format label pairs
  • Simple text-based configuration
  • Debug mode with object list visualization

Detector Backends Comparison

| Backend | Hardware | Performance | Platform | Use Case |
|---------|----------|-------------|----------|----------|
| RKNN | Rockchip NPU | Fast (30+ FPS) | ARM (RK3588/3568) | Production on Rockchip SBC |
| ONNX | CPU | Medium (5-15 FPS) | Any (x86/ARM) | Development, testing, no GPU |
| YOLO | CPU/CUDA | Fast with GPU | Any | Development, CUDA systems |

Recommended usage:

  1. Development/Testing: Use ONNX backend on any CPU
  2. Production on Rockchip: Convert to RKNN, deploy on NPU
  3. Production on x86/CUDA: Use YOLO backend with GPU
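
The backend choice and fallback behavior above can be sketched as a small resolution step at startup. This is a hedged sketch: only the RKNN-to-ONNX fallback is specified later in this plan; the other chains and the `available` set are illustrative assumptions (real code would probe the rknnlite / onnxruntime / ultralytics imports instead).

```python
# Sketch of detector backend resolution. Only the rknn -> onnx fallback is
# specified by this plan; the other chains are illustrative assumptions.
FALLBACK_ORDER = {
    "rknn": ["rknn", "onnx"],   # per the plan: fall back to ONNX/CPU
    "yolo": ["yolo", "onnx"],   # assumption: degrade to CPU ONNX
    "onnx": ["onnx"],
}

def resolve_backend(requested: str, available: set) -> str:
    """Return the first usable backend from the configured fallback chain."""
    for candidate in FALLBACK_ORDER.get(requested, [requested]):
        if candidate in available:  # real code would probe imports instead
            return candidate
    raise RuntimeError(f"No usable detector backend for {requested!r}")
```

For example, a config requesting `rknn` on a machine with only ONNX Runtime installed resolves to the ONNX backend instead of failing outright.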

Project Structure

sam2-yolo-pipeline/
├── notebooks/                      # Existing Kaggle notebooks
├── utils/                          # Existing utilities
├── yolo_annotator/                 # NEW: YOLO-assisted annotation
│   ├── __init__.py
│   ├── annotator.py               # Core YOLOv9t annotator
│   ├── video_source.py            # MP4/RTSP video source handler
│   ├── export.py                  # Snapshot + label export
│   └── visualizer.py              # Debug visualization
├── frigate_mini/                   # NEW: Mini Frigate fork
│   ├── __init__.py
│   ├── app.py                     # Main application entry
│   ├── config/
│   │   ├── __init__.py
│   │   ├── schema.py              # Config validation
│   │   └── loader.py              # YAML config loader
│   ├── detector/
│   │   ├── __init__.py
│   │   ├── base.py                # Base detector interface
│   │   ├── rknn_detector.py       # RKNN backend
│   │   ├── onnx_detector.py       # ONNX fallback
│   │   └── yolo_detector.py       # Ultralytics YOLO fallback
│   ├── video/
│   │   ├── __init__.py
│   │   ├── mp4_source.py          # MP4 file source
│   │   └── frame_processor.py     # Frame processing pipeline
│   ├── output/
│   │   ├── __init__.py
│   │   ├── snapshot.py            # Snapshot capture
│   │   └── annotation.py          # YOLO label writer
│   └── debug/
│       ├── __init__.py
│       ├── object_list.py         # Detected objects display
│       └── visualizer.py          # Bounding box overlay
├── configs/                        # NEW: Configuration files
│   ├── annotator.yaml             # Annotator settings
│   └── frigate_mini.yaml          # Frigate-mini settings
├── models/                         # NEW: Model weights storage
│   └── .gitkeep
├── output/                         # NEW: Default output directory
│   ├── snapshots/
│   ├── labels/
│   └── debug/
├── scripts/                        # NEW: CLI scripts
│   ├── annotate.py                # Run annotation pipeline
│   ├── frigate_mini.py            # Run mini frigate
│   ├── convert_to_onnx.py         # Export .pt model to ONNX
│   └── convert_to_rknn.py         # Convert ONNX to RKNN
└── requirements.txt               # Updated dependencies

Component 1: YOLO-Assisted Annotator

Purpose

Replace SAM2 auto-annotation with faster YOLOv9t-based detection for creating training datasets.

Workflow

MP4 Video → Frame Extraction → YOLOv9t Detection → Filter/NMS → YOLO Labels + Snapshots

Features

  1. Model Loading

    • Load pretrained YOLOv9t (.pt file)
    • Support custom trained models
    • Configurable confidence threshold
    • Configurable NMS threshold
  2. Video Processing

    • MP4 file input
    • Configurable FPS sampling
    • Frame skip / time range selection
    • Resolution scaling
  3. Detection Filtering

    • Filter by class IDs
    • Filter by confidence score
    • Filter by bbox size (min/max area)
    • Filter by aspect ratio
  4. Output Generation

    • Clean snapshot images (no annotations drawn)
    • YOLO format label files (.txt)
    • Optional debug images with boxes drawn
    • JSON manifest of all detections
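
The filtering rules in item 3 can be sketched as a single predicate. This is a minimal sketch over plain dicts; the field names mirror the Detection dataclass defined later in this plan, and the default thresholds mirror annotator.yaml.

```python
# Sketch of the Detection Filtering rules: class, confidence, and bbox-area
# checks. `det` is a plain dict here; real code would use the Detection
# dataclass from this plan.
def keep_detection(det, classes=None, min_conf=0.3, min_area=100, max_area=None):
    x1, y1, x2, y2 = det["bbox"]
    area = (x2 - x1) * (y2 - y1)
    if classes is not None and det["class_id"] not in classes:
        return False
    if det["confidence"] < min_conf:
        return False
    if area < min_area:
        return False
    if max_area is not None and area > max_area:
        return False
    return True
```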

Configuration (annotator.yaml)

# YOLO-Assisted Annotator Configuration

model:
  path: "models/yolov9t.pt"          # Path to YOLO model
  device: "cuda"                      # cuda, cpu, or rknn
  conf_threshold: 0.25                # Confidence threshold
  iou_threshold: 0.45                 # NMS IoU threshold

video:
  source: "input/video.mp4"           # Video file path
  sample_fps: 2                       # Frames per second to extract
  max_frames: null                    # Max frames (null = all)
  start_time: 0                       # Start time in seconds
  end_time: null                      # End time (null = end of video)
  resize: null                        # [width, height] or null

detection:
  classes: null                       # Class IDs to keep (null = all)
  min_confidence: 0.3                 # Minimum confidence to save
  min_area: 100                       # Minimum bbox area in pixels
  max_area: null                      # Maximum bbox area (null = no limit)
  min_size: 0.01                      # Minimum bbox dimension (normalized)

output:
  directory: "output/annotations"     # Output directory
  save_snapshots: true                # Save clean images
  save_labels: true                   # Save YOLO labels
  save_debug: true                    # Save debug visualizations
  save_manifest: true                 # Save JSON manifest
  image_format: "jpg"                 # jpg or png
  image_quality: 95                   # JPEG quality (1-100)

classes:
  # Class name mapping (for display/filtering)
  0: "person"
  1: "bicycle"
  2: "car"
  # ... etc
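
The video: settings above imply the following frame-sampling arithmetic. A sketch only: real code would read the fps and frame count from the container, and `round()` vs. `floor()` for the step is an implementation choice.

```python
# Which frame indices to keep, given the video: settings above.
def sampled_frames(video_fps, total_frames, sample_fps=2,
                   start_time=0.0, end_time=None, max_frames=None):
    step = max(1, round(video_fps / sample_fps))
    start = int(start_time * video_fps)
    stop = total_frames if end_time is None else min(total_frames, int(end_time * video_fps))
    frames = list(range(start, stop, step))
    return frames if max_frames is None else frames[:max_frames]
```

For a 30 FPS clip sampled at 2 FPS, every 15th frame is kept.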

Component 2: Frigate-Mini-RKNN

Purpose

Minimal standalone Frigate-like system for RKNN inference on Rockchip devices, outputting annotation pairs.

Workflow

MP4 Feed → Frame Decode → RKNN Inference → Object Tracking → Snapshot + Label Export

Features

  1. Video Input

    • MP4 file as "camera" source
    • Loop playback option
    • Configurable FPS limit
    • Multiple video sources support
  2. RKNN Detector

    • Load RKNN model (.rknn file)
    • NPU acceleration on Rockchip SoCs
    • Fallback to ONNX/CPU if RKNN unavailable
    • Batch inference support
  3. Object Detection

    • YOLOv9t architecture support
    • Configurable input resolution
    • Post-processing (NMS, filtering)
    • Class filtering
  4. Snapshot System

    • Capture on detection trigger
    • Configurable cooldown period
    • Clean snapshots (no overlays)
    • Crop to detected object (optional)
  5. Annotation Export

    • YOLO format labels
    • Synchronized snapshot-label pairs
    • Auto-naming with timestamps
    • Dataset structure output
  6. Debug Mode

    • Real-time object list display
    • Bounding box visualization
    • FPS counter
    • Detection statistics
    • Save debug frames
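
The trigger-and-cooldown behavior in items 4-5 can be sketched as a small stateful class. Per-class cooldown is an assumption here (the plan says "per object"); a per-track cooldown would instead key on the tracker's IDs.

```python
# Sketch of the snapshot trigger: fire at most once per tracked class
# within the cooldown window, and only above the minimum score.
class SnapshotTrigger:
    def __init__(self, objects, min_score=0.5, cooldown=2.0):
        self.objects = set(objects)
        self.min_score = min_score
        self.cooldown = cooldown
        self._last = {}  # class name -> timestamp of last snapshot

    def should_fire(self, class_name, score, now):
        if class_name not in self.objects or score < self.min_score:
            return False
        if now - self._last.get(class_name, float("-inf")) < self.cooldown:
            return False
        self._last[class_name] = now
        return True
```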

Configuration (frigate_mini.yaml)

Option A: ONNX CPU (For development and CPU-only systems)

# Frigate-Mini Configuration - ONNX CPU Mode
# Works on any system without special hardware

debug: true
log_level: "info"

detector:
  type: "onnx"                        # Use ONNX Runtime
  model_path: "models/yolov9t.onnx"   # ONNX model file
  input_size: [640, 640]              # Model input resolution
  conf_threshold: 0.25                # Detection confidence
  nms_threshold: 0.45                 # NMS threshold
  
  # ONNX specific settings
  onnx:
    device: "cpu"                     # cpu or cuda
    num_threads: 4                    # CPU threads (0 = auto)
    optimization_level: "all"         # none, basic, extended, all

Option B: RKNN NPU (For Rockchip devices)

# Frigate-Mini Configuration - RKNN NPU Mode
# For Rockchip SBCs (RK3588, RK3568, etc.)

debug: true
log_level: "info"

detector:
  type: "rknn"                        # Use RKNN Runtime
  model_path: "models/yolov9t.rknn"   # RKNN model file
  input_size: [640, 640]              # Model input resolution
  conf_threshold: 0.25                # Detection confidence
  nms_threshold: 0.45                 # NMS threshold
  
  # RKNN specific
  rknn:
    target_platform: "rk3588"         # rk3588, rk3568, rk3566, etc.
    core_mask: 7                      # NPU core mask (7 = all 3 cores on RK3588)
  
  # Fallback to ONNX if RKNN fails
  fallback:
    enabled: true
    type: "onnx"
    device: "cpu"

Option C: Ultralytics YOLO (For CUDA systems)

# Frigate-Mini Configuration - Ultralytics YOLO Mode
# For systems with NVIDIA GPU

debug: true
log_level: "info"

detector:
  type: "yolo"                        # Use Ultralytics
  model_path: "models/yolov9t.pt"     # PyTorch model file
  conf_threshold: 0.25
  nms_threshold: 0.45
  
  # YOLO specific
  yolo:
    device: "cuda"                    # cpu, cuda, cuda:0, etc.
    half: true                        # FP16 inference (faster on GPU)

Full Configuration Example (with all options)

# Video sources (cameras)
cameras:
  front_door:
    enabled: true
    source: "input/front_door.mp4"  # MP4 file path
    fps: 5                          # Processing FPS limit
    loop: true                      # Loop video playback

    # Detection zones (optional)
    detect:
      enabled: true
      width: 1280                   # Detection resolution
      height: 720

    # Object filtering
    objects:
      track:
        - person
        - car
        - dog
      filters:
        person:
          min_area: 1000            # Minimum area in pixels
          max_area: 500000
          min_score: 0.4

  backyard:
    enabled: true
    source: "input/backyard.mp4"
    fps: 5
    loop: true

# Snapshot settings
snapshots:
  enabled: true
  output_dir: "output/snapshots"

  # Trigger settings
  trigger:
    objects:                        # Objects that trigger snapshot
      - person
      - car
    min_score: 0.5                  # Minimum score to trigger
    cooldown: 2.0                   # Seconds between snapshots per object

  # Output settings
  format: "jpg"                     # jpg or png
  quality: 95                       # JPEG quality
  clean: true                       # No annotations on snapshot
  crop: false                       # Crop to object bbox
  retain_days: 7                    # Days to keep snapshots

# Annotation export
annotations:
  enabled: true
  output_dir: "output/labels"
  format: "yolo"                    # YOLO format

  # Pairing
  pair_with_snapshots: true         # Create snapshot-label pairs

  # Filtering
  min_score: 0.3
  classes: null                     # null = all classes

# Debug settings
debug_output:
  enabled: true
  output_dir: "output/debug"

  # Object list display
  object_list:
    enabled: true
    show_confidence: true
    show_class: true
    show_bbox: true

  # Visualization
  visualization:
    enabled: true
    draw_boxes: true
    draw_labels: true
    draw_confidence: true
    box_thickness: 2
    font_scale: 0.5

  # Statistics
  stats:
    show_fps: true
    show_detection_count: true
    log_interval: 100               # Log stats every N frames

# Class definitions
class_names:
  0: person
  1: bicycle
  2: car
  3: motorcycle
  4: airplane
  5: bus
  6: train
  7: truck
  8: boat
  9: traffic light
  10: fire hydrant
  # ... COCO classes continue


---

## Module Specifications

### 1. yolo_annotator/annotator.py

```python
class YOLOAnnotator:
    """YOLO-based automatic video annotator."""
    
    def __init__(self, config_path: str):
        """Load configuration and initialize model."""
        
    def load_model(self, model_path: str, device: str) -> None:
        """Load YOLOv9t model."""
        
    def process_video(self, video_path: str) -> AnnotationResult:
        """Process entire video and generate annotations."""
        
    def process_frame(self, frame: np.ndarray) -> List[Detection]:
        """Process single frame and return detections."""
        
    def filter_detections(self, detections: List[Detection]) -> List[Detection]:
        """Apply filtering rules to detections."""
        
    def export_annotations(self, output_dir: str) -> None:
        """Export all annotations to YOLO format."""
```

### 2. frigate_mini/detector/rknn_detector.py

```python
class RKNNDetector:
    """RKNN-based YOLO detector for Rockchip NPU."""

    def __init__(self, model_path: str, target_platform: str):
        """Initialize RKNN runtime."""

    def load_model(self) -> bool:
        """Load RKNN model to NPU."""

    def preprocess(self, frame: np.ndarray) -> np.ndarray:
        """Preprocess frame for inference."""

    def inference(self, input_data: np.ndarray) -> np.ndarray:
        """Run inference on NPU."""

    def postprocess(self, outputs: np.ndarray) -> List[Detection]:
        """Parse YOLO outputs and apply NMS."""

    def detect(self, frame: np.ndarray) -> List[Detection]:
        """Full detection pipeline."""

    def release(self) -> None:
        """Release RKNN resources."""
```

### 3. frigate_mini/output/annotation.py

```python
class AnnotationWriter:
    """Write YOLO format annotation files."""

    def __init__(self, output_dir: str, class_names: Dict[int, str]):
        """Initialize annotation writer."""

    def write_label(self,
                    image_name: str,
                    detections: List[Detection],
                    image_size: Tuple[int, int]) -> str:
        """Write YOLO label file for image."""

    def detection_to_yolo(self,
                          detection: Detection,
                          image_width: int,
                          image_height: int) -> str:
        """Convert detection to YOLO format string."""

    def create_dataset_structure(self) -> None:
        """Create YOLO dataset directory structure."""

    def write_data_yaml(self, train_path: str, val_path: str) -> str:
        """Generate data.yaml for training."""
```

### 4. frigate_mini/debug/object_list.py

```python
class ObjectListDisplay:
    """Display detected objects in debug mode."""

    def __init__(self, config: Dict):
        """Initialize display settings."""

    def update(self, detections: List[Detection]) -> None:
        """Update object list with new detections."""

    def format_detection(self, detection: Detection) -> str:
        """Format single detection for display."""

    def print_list(self) -> None:
        """Print current object list to console."""

    def save_snapshot_with_labels(self,
                                   frame: np.ndarray,
                                   detections: List[Detection],
                                   output_path: str) -> None:
        """Save debug image with annotations."""
```

Data Structures

Detection

```python
@dataclass
class Detection:
    class_id: int           # Class index
    class_name: str         # Class name
    confidence: float       # Detection confidence (0-1)
    bbox: BBox              # Bounding box
    track_id: Optional[int] # Tracking ID (if tracked)
    timestamp: float        # Frame timestamp
    frame_id: int           # Frame number

@dataclass
class BBox:
    x1: float               # Top-left x (pixels)
    y1: float               # Top-left y (pixels)
    x2: float               # Bottom-right x (pixels)
    y2: float               # Bottom-right y (pixels)

    def to_yolo(self, img_w: int, img_h: int) -> Tuple[float, float, float, float]:
        """Convert to YOLO format (x_center, y_center, width, height) normalized."""

    def area(self) -> float:
        """Calculate bbox area in pixels."""
```

AnnotationPair

```python
@dataclass
class AnnotationPair:
    image_path: str          # Path to snapshot image
    label_path: str          # Path to YOLO label file
    detections: List[Detection]
    timestamp: datetime
    camera_name: str
    frame_id: int
```

Output Format

Directory Structure

output/
├── snapshots/
│   ├── front_door/
│   │   ├── 20240115_143022_001.jpg
│   │   ├── 20240115_143025_002.jpg
│   │   └── ...
│   └── backyard/
│       └── ...
├── labels/
│   ├── front_door/
│   │   ├── 20240115_143022_001.txt
│   │   ├── 20240115_143025_002.txt
│   │   └── ...
│   └── backyard/
│       └── ...
├── debug/
│   ├── front_door/
│   │   ├── 20240115_143022_001_debug.jpg
│   │   └── ...
│   └── object_log.txt
└── manifest.json
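
The timestamped naming in the tree above pairs each snapshot with its label file by sharing a stem. A sketch, assuming the YYYYMMDD_HHMMSS plus zero-padded sequence pattern shown:

```python
# Build the matching snapshot/label paths for one capture:
# <root>/snapshots/<camera>/<YYYYMMDD_HHMMSS>_<seq>.jpg and the .txt twin.
from datetime import datetime

def pair_paths(camera, seq, when, root="output"):
    stem = f"{when:%Y%m%d_%H%M%S}_{seq:03d}"
    return (f"{root}/snapshots/{camera}/{stem}.jpg",
            f"{root}/labels/{camera}/{stem}.txt")
```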

YOLO Label Format

# {class_id} {x_center} {y_center} {width} {height}
0 0.456789 0.321456 0.123456 0.234567
2 0.789012 0.654321 0.098765 0.176543
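
Emitting one such line per detection is a one-liner; six decimal places matches the example above.

```python
# Format one YOLO label line: "class x_center y_center width height",
# with all but the class id normalized to [0, 1].
def yolo_line(class_id, xc, yc, w, h):
    return f"{class_id} {xc:.6f} {yc:.6f} {w:.6f} {h:.6f}"
```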

Manifest JSON

{
  "created": "2024-01-15T14:30:22",
  "model": "yolov9t.rknn",
  "total_frames": 1500,
  "total_detections": 3420,
  "pairs": [
    {
      "image": "snapshots/front_door/20240115_143022_001.jpg",
      "label": "labels/front_door/20240115_143022_001.txt",
      "camera": "front_door",
      "frame_id": 150,
      "timestamp": "2024-01-15T14:30:22.500",
      "detections": [
        {"class": "person", "confidence": 0.87},
        {"class": "car", "confidence": 0.92}
      ]
    }
  ]
}

Implementation Phases

Phase 1: Core YOLO Annotator (Week 1)

  • Create yolo_annotator/ module structure
  • Implement YOLOAnnotator class with Ultralytics backend
  • Implement video source handling
  • Implement YOLO label export
  • Create annotator.yaml config loader
  • Add CLI script scripts/annotate.py
  • Test with sample video

Phase 2: Frigate-Mini Base (Week 2)

  • Create frigate_mini/ module structure
  • Implement config schema and loader
  • Implement base detector interface
  • Implement ONNX detector (for testing)
  • Implement MP4 video source
  • Implement basic frame processing loop
  • Test basic detection pipeline

Phase 3: RKNN Integration (Week 3)

  • Implement RKNN detector backend
  • Create ONNX to RKNN conversion script
  • Test on Rockchip hardware (RK3588/RK3568)
  • Optimize for NPU performance
  • Add fallback mechanism

Phase 4: Snapshot & Annotation System (Week 4)

  • Implement snapshot capture system
  • Implement annotation writer
  • Implement snapshot-label pairing
  • Add trigger-based capture logic
  • Create manifest generator

Phase 5: Debug System (Week 5)

  • Implement object list display
  • Implement debug visualization
  • Add statistics tracking
  • Create debug frame saver
  • Add console and file logging

Phase 6: Integration & Testing (Week 6)

  • Integration testing
  • Performance optimization
  • Documentation
  • Example configs for common use cases
  • Package for distribution

Dependencies

New Requirements

# requirements.txt additions

# YOLO
ultralytics>=8.0.0

# RKNN (install separately based on platform)
# rknn-toolkit2  # For conversion (x86)
# rknn-toolkit-lite2  # For inference (ARM)

# Video processing
opencv-python>=4.8.0
av>=10.0.0                  # PyAV for efficient video decoding

# Configuration
pyyaml>=6.0
pydantic>=2.0               # Config validation

# Utilities
tqdm>=4.65.0
numpy>=1.24.0

RKNN Installation Notes

# On x86 host (for model conversion):
pip install rknn-toolkit2

# On Rockchip device (for inference):
pip install rknn-toolkit-lite2

# Or install from Rockchip GitHub releases

Usage Examples

1. ONNX Workflow (CPU)

# Step 1: Download pretrained YOLOv9t
wget https://github.com/ultralytics/assets/releases/download/v8.1.0/yolov9t.pt -O models/yolov9t.pt

# Step 2: Convert to ONNX
python scripts/convert_to_onnx.py --input models/yolov9t.pt --output models/yolov9t.onnx

# Step 3a: Auto-annotate video (CPU)
python scripts/annotate.py --config configs/annotator_cpu.yaml
# Or with CLI args:
python scripts/annotate.py \
    --model models/yolov9t.onnx \
    --video input/video.mp4 \
    --device cpu

# Step 3b: Run Frigate-Mini (CPU)
python scripts/frigate_mini.py --config configs/frigate_mini_cpu.yaml
# Or with CLI args:
python scripts/frigate_mini.py \
    --model models/yolov9t.onnx \
    --video input/video.mp4 \
    --output output/ \
    --debug

2. RKNN Workflow (Rockchip NPU)

# Step 1: Convert ONNX to RKNN (on x86 host)
python scripts/convert_to_rknn.py \
    --input models/yolov9t.onnx \
    --output models/yolov9t.rknn \
    --platform rk3588

# Step 2: Copy to Rockchip device and run
python scripts/frigate_mini.py --config configs/frigate_mini.yaml
# Or:
python scripts/frigate_mini.py \
    --model models/yolov9t.rknn \
    --video input/video.mp4 \
    --platform rk3588

3. GPU Workflow (CUDA)

# Using Ultralytics directly with GPU
python scripts/annotate.py \
    --model models/yolov9t.pt \
    --video input/video.mp4 \
    --device cuda

Quick Reference

| Task | CPU (ONNX) | RKNN (NPU) | GPU (CUDA) |
|------|------------|------------|------------|
| Model file | .onnx | .rknn | .pt |
| Config | *_cpu.yaml | frigate_mini.yaml | Use --device cuda |
| Speed | 5-15 FPS | 30+ FPS | 50+ FPS |
| Hardware | Any CPU | Rockchip SBC | NVIDIA GPU |

Future Enhancements

  1. RTSP Support - Add real camera stream input
  2. Object Tracking - Add ByteTrack/BoT-SORT for consistent IDs
  3. Web UI - Simple web interface for monitoring
  4. Multi-model - Support different models per camera
  5. Event System - Webhooks for detection events
  6. Auto-labeling Refinement - Use SAM2 to refine YOLO boxes
  7. Active Learning - Flag low-confidence detections for review

References