# Feature Plan: YOLO-Assisted Auto-Annotation + Mini Frigate RKNN

## Overview

Create two integrated components:

- **YOLO-Assisted Annotator** - Use pretrained YOLOv9t to auto-annotate video frames
- **Frigate-Mini-RKNN** - Standalone mini fork of Frigate for RKNN inference with MP4 input
## Goals

- Auto-annotate videos using a pretrained YOLOv9t model (replaces manual SAM2 prompts)
- Minimal Frigate fork with multiple detector backends:
  - **RKNN** - Rockchip NPU acceleration (RK3588, RK3568, etc.)
  - **ONNX** - CPU-only inference (cross-platform, no special hardware)
  - **YOLO** - Ultralytics backend (CPU/CUDA)
- MP4 file as camera feed source
- Output: clean snapshot + YOLO-format label pairs
- Simple text-based configuration
- Debug mode with object list visualization
## Detector Backends Comparison
| Backend | Hardware | Performance | Platform | Use Case |
|---|---|---|---|---|
| RKNN | Rockchip NPU | Fast (30+ FPS) | ARM (RK3588/3568) | Production on Rockchip SBC |
| ONNX | CPU | Medium (5-15 FPS) | Any (x86/ARM) | Development, testing, no GPU |
| YOLO | CPU/CUDA | Fast with GPU | Any | Development, CUDA systems |
## Recommended Workflow

- **Development/Testing**: Use the ONNX backend on any CPU
- **Production on Rockchip**: Convert to RKNN, deploy on the NPU
- **Production on x86/CUDA**: Use the YOLO backend with a GPU
Project Structure
sam2-yolo-pipeline/
├── notebooks/ # Existing Kaggle notebooks
├── utils/ # Existing utilities
├── yolo_annotator/ # NEW: YOLO-assisted annotation
│ ├── __init__.py
│ ├── annotator.py # Core YOLOv9t annotator
│ ├── video_source.py # MP4/RTSP video source handler
│ ├── export.py # Snapshot + label export
│ └── visualizer.py # Debug visualization
├── frigate_mini/ # NEW: Mini Frigate fork
│ ├── __init__.py
│ ├── app.py # Main application entry
│ ├── config/
│ │ ├── __init__.py
│ │ ├── schema.py # Config validation
│ │ └── loader.py # YAML config loader
│ ├── detector/
│ │ ├── __init__.py
│ │ ├── base.py # Base detector interface
│ │ ├── rknn_detector.py # RKNN backend
│ │ ├── onnx_detector.py # ONNX fallback
│ │ └── yolo_detector.py # Ultralytics YOLO fallback
│ ├── video/
│ │ ├── __init__.py
│ │ ├── mp4_source.py # MP4 file source
│ │ └── frame_processor.py # Frame processing pipeline
│ ├── output/
│ │ ├── __init__.py
│ │ ├── snapshot.py # Snapshot capture
│ │ └── annotation.py # YOLO label writer
│ └── debug/
│ ├── __init__.py
│ ├── object_list.py # Detected objects display
│ └── visualizer.py # Bounding box overlay
├── configs/ # NEW: Configuration files
│ ├── annotator.yaml # Annotator settings
│ └── frigate_mini.yaml # Frigate-mini settings
├── models/ # NEW: Model weights storage
│ └── .gitkeep
├── output/ # NEW: Default output directory
│ ├── snapshots/
│ ├── labels/
│ └── debug/
├── scripts/ # NEW: CLI scripts
│ ├── annotate.py # Run annotation pipeline
│ ├── frigate_mini.py # Run mini frigate
│ └── convert_to_rknn.py # Convert ONNX to RKNN
└── requirements.txt # Updated dependencies
## Component 1: YOLO-Assisted Annotator

### Purpose

Replace SAM2 auto-annotation with faster YOLOv9t-based detection for creating training datasets.

### Workflow

```
MP4 Video → Frame Extraction → YOLOv9t Detection → Filter/NMS → YOLO Labels + Snapshots
```
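
A minimal sketch of the frame-extraction step, assuming OpenCV (`cv2`) for decoding; `sample_frames` is an illustrative helper, with `sample_fps` mirroring the config key below.

```python
import cv2

def sample_frames(video_path: str, sample_fps: float):
    """Yield (frame_id, timestamp, frame) at roughly sample_fps frames per second."""
    cap = cv2.VideoCapture(video_path)
    native_fps = cap.get(cv2.CAP_PROP_FPS) or 30.0
    step = max(1, round(native_fps / sample_fps))  # keep every Nth frame
    frame_id = 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        if frame_id % step == 0:
            yield frame_id, frame_id / native_fps, frame
        frame_id += 1
    cap.release()
```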
### Features

- **Model Loading**
  - Load pretrained YOLOv9t (`.pt` file)
  - Support custom trained models
  - Configurable confidence threshold
  - Configurable NMS threshold
- **Video Processing**
  - MP4 file input
  - Configurable FPS sampling
  - Frame skip / time range selection
  - Resolution scaling
- **Detection Filtering** (see the sketch after this list)
  - Filter by class IDs
  - Filter by confidence score
  - Filter by bbox size (min/max area)
  - Filter by aspect ratio
- **Output Generation**
  - Clean snapshot images (no annotations drawn)
  - YOLO format label files (`.txt`)
  - Optional debug images with boxes drawn
  - JSON manifest of all detections
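
A minimal sketch of the filtering rules above, assuming the `Detection` and `BBox` dataclasses specified under Data Structures; parameter names mirror the `detection:` keys in the config below.

```python
from typing import List, Optional

def filter_detections(
    detections: List["Detection"],
    classes: Optional[set] = None,   # class IDs to keep (None = all)
    min_confidence: float = 0.3,
    min_area: float = 100,
    max_area: Optional[float] = None,
) -> List["Detection"]:
    """Apply class, confidence, and bbox-size filters to raw detections."""
    kept = []
    for det in detections:
        if classes is not None and det.class_id not in classes:
            continue
        if det.confidence < min_confidence:
            continue
        area = det.bbox.area()
        if area < min_area or (max_area is not None and area > max_area):
            continue
        kept.append(det)
    return kept
```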
### Configuration (annotator.yaml)

```yaml
# YOLO-Assisted Annotator Configuration
model:
  path: "models/yolov9t.pt"        # Path to YOLO model
  device: "cuda"                   # cuda, cpu, or rknn
  conf_threshold: 0.25             # Confidence threshold
  iou_threshold: 0.45              # NMS IoU threshold

video:
  source: "input/video.mp4"        # Video file path
  sample_fps: 2                    # Frames per second to extract
  max_frames: null                 # Max frames (null = all)
  start_time: 0                    # Start time in seconds
  end_time: null                   # End time (null = end of video)
  resize: null                     # [width, height] or null

detection:
  classes: null                    # Class IDs to keep (null = all)
  min_confidence: 0.3              # Minimum confidence to save
  min_area: 100                    # Minimum bbox area in pixels
  max_area: null                   # Maximum bbox area (null = no limit)
  min_size: 0.01                   # Minimum bbox dimension (normalized)

output:
  directory: "output/annotations"  # Output directory
  save_snapshots: true             # Save clean images
  save_labels: true                # Save YOLO labels
  save_debug: true                 # Save debug visualizations
  save_manifest: true              # Save JSON manifest
  image_format: "jpg"              # jpg or png
  image_quality: 95                # JPEG quality (1-100)

classes:
  # Class name mapping (for display/filtering)
  0: "person"
  1: "bicycle"
  2: "car"
  # ... etc
```
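
A minimal sketch of the config loader, assuming the `pyyaml` and `pydantic>=2.0` dependencies listed later in this plan; only the `model:` block is modeled here, and the class names are illustrative, not a finished API.

```python
from pathlib import Path

import yaml
from pydantic import BaseModel

class ModelConfig(BaseModel):
    path: str
    device: str = "cpu"
    conf_threshold: float = 0.25
    iou_threshold: float = 0.45

class AnnotatorConfig(BaseModel):
    model: ModelConfig
    # video/detection/output sections would be modeled the same way

def load_config(path: str) -> AnnotatorConfig:
    """Parse annotator.yaml and validate it against the schema."""
    data = yaml.safe_load(Path(path).read_text())
    return AnnotatorConfig(**data)
```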
## Component 2: Frigate-Mini-RKNN

### Purpose

Minimal standalone Frigate-like system for RKNN inference on Rockchip devices, outputting annotation pairs.

### Workflow

```
MP4 Feed → Frame Decode → RKNN Inference → Object Tracking → Snapshot + Label Export
```
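
A minimal sketch of how this pipeline could compose, with hypothetical `source`, `detector`, and writer objects that follow the module layout above (object tracking omitted):

```python
def run_pipeline(source, detector, snapshot_writer, annotation_writer):
    """Decode frames, detect objects, and export snapshot/label pairs."""
    for frame_id, timestamp, frame in source:   # e.g. video/mp4_source.py
        detections = detector.detect(frame)     # e.g. detector/rknn_detector.py
        if not detections:
            continue
        image_path = snapshot_writer.save(frame, frame_id, timestamp)
        annotation_writer.write_label(image_path, detections,
                                      image_size=frame.shape[1::-1])  # (w, h)
```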
### Features

- **Video Input**
  - MP4 file as "camera" source
  - Loop playback option
  - Configurable FPS limit
  - Multiple video sources support
- **RKNN Detector**
  - Load RKNN model (`.rknn` file)
  - NPU acceleration on Rockchip SoCs
  - Fallback to ONNX/CPU if RKNN unavailable
  - Batch inference support
- **Object Detection**
  - YOLOv9t architecture support
  - Configurable input resolution
  - Post-processing (NMS, filtering)
  - Class filtering
- **Snapshot System**
  - Capture on detection trigger
  - Configurable cooldown period (see the sketch after this list)
  - Clean snapshots (no overlays)
  - Crop to detected object (optional)
- **Annotation Export**
  - YOLO format labels
  - Synchronized snapshot-label pairs
  - Auto-naming with timestamps
  - Dataset structure output
- **Debug Mode**
  - Real-time object list display
  - Bounding box visualization
  - FPS counter
  - Detection statistics
  - Save debug frames
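
A minimal sketch of the trigger cooldown logic referenced above; `SnapshotTrigger` is an illustrative name, and the parameters mirror the `trigger:` keys in the full configuration example below.

```python
import time
from typing import Dict

class SnapshotTrigger:
    """Rate-limit snapshot capture per object class."""

    def __init__(self, trigger_classes: set, min_score: float, cooldown: float):
        self.trigger_classes = trigger_classes
        self.min_score = min_score
        self.cooldown = cooldown
        self._last_fired: Dict[str, float] = {}  # class_name -> last trigger time

    def should_fire(self, class_name: str, score: float) -> bool:
        if class_name not in self.trigger_classes or score < self.min_score:
            return False
        now = time.monotonic()
        if now - self._last_fired.get(class_name, 0.0) < self.cooldown:
            return False
        self._last_fired[class_name] = now
        return True
```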
### Configuration (frigate_mini.yaml)

#### Option A: ONNX CPU-Only (Recommended for development/testing)

```yaml
# Frigate-Mini Configuration - ONNX CPU Mode
# Works on any system without special hardware
debug: true
log_level: "info"

detector:
  type: "onnx"                       # Use ONNX Runtime
  model_path: "models/yolov9t.onnx"  # ONNX model file
  input_size: [640, 640]             # Model input resolution
  conf_threshold: 0.25               # Detection confidence
  nms_threshold: 0.45                # NMS threshold

  # ONNX specific settings
  onnx:
    device: "cpu"                    # cpu or cuda
    num_threads: 4                   # CPU threads (0 = auto)
    optimization_level: "all"        # none, basic, extended, all
```

#### Option B: RKNN NPU (For Rockchip devices)

```yaml
# Frigate-Mini Configuration - RKNN NPU Mode
# For Rockchip SBCs (RK3588, RK3568, etc.)
debug: true
log_level: "info"

detector:
  type: "rknn"                       # Use RKNN Runtime
  model_path: "models/yolov9t.rknn"  # RKNN model file
  input_size: [640, 640]             # Model input resolution
  conf_threshold: 0.25               # Detection confidence
  nms_threshold: 0.45                # NMS threshold

  # RKNN specific
  rknn:
    target_platform: "rk3588"        # rk3588, rk3568, rk3566, etc.
    core_mask: 7                     # NPU core mask (7 = all 3 cores on RK3588)

  # Fallback to ONNX if RKNN fails
  fallback:
    enabled: true
    type: "onnx"
    device: "cpu"
```

#### Option C: Ultralytics YOLO (For CUDA systems)

```yaml
# Frigate-Mini Configuration - Ultralytics YOLO Mode
# For systems with an NVIDIA GPU
debug: true
log_level: "info"

detector:
  type: "yolo"                       # Use Ultralytics
  model_path: "models/yolov9t.pt"    # PyTorch model file
  conf_threshold: 0.25
  nms_threshold: 0.45

  # YOLO specific
  yolo:
    device: "cuda"                   # cpu, cuda, cuda:0, etc.
    half: true                       # FP16 inference (faster on GPU)
```
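
A minimal sketch of backend selection driven by `detector.type`, including the fallback block from Option B; the detector class and module names follow the project structure above but are assumptions, not a finished API.

```python
def create_detector(cfg: dict):
    """Instantiate the detector backend named by cfg['type'], honoring the fallback block."""
    backend = cfg["type"]
    try:
        if backend == "rknn":
            from frigate_mini.detector.rknn_detector import RKNNDetector
            return RKNNDetector(cfg["model_path"], cfg["rknn"]["target_platform"])
        if backend == "onnx":
            from frigate_mini.detector.onnx_detector import ONNXDetector
            return ONNXDetector(cfg["model_path"], **cfg.get("onnx", {}))
        if backend == "yolo":
            from frigate_mini.detector.yolo_detector import YOLODetector
            return YOLODetector(cfg["model_path"], **cfg.get("yolo", {}))
        raise ValueError(f"unknown detector type: {backend}")
    except Exception:
        fb = cfg.get("fallback", {})
        if fb.get("enabled") and fb.get("type") != backend:
            # Retry once with the fallback backend (e.g. RKNN -> ONNX CPU)
            return create_detector({**cfg, "type": fb["type"], "fallback": {}})
        raise
```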
#### Full Configuration Example (with all options)

```yaml
# Video sources (cameras)
cameras:
  front_door:
    enabled: true
    source: "input/front_door.mp4"  # MP4 file path
    fps: 5                          # Processing FPS limit
    loop: true                      # Loop video playback

    # Detection zones (optional)
    detect:
      enabled: true
      width: 1280                   # Detection resolution
      height: 720

    # Object filtering
    objects:
      track:
        - person
        - car
        - dog
      filters:
        person:
          min_area: 1000            # Minimum area in pixels
          max_area: 500000
          min_score: 0.4

  backyard:
    enabled: true
    source: "input/backyard.mp4"
    fps: 5
    loop: true

# Snapshot settings
snapshots:
  enabled: true
  output_dir: "output/snapshots"

  # Trigger settings
  trigger:
    objects:                        # Objects that trigger snapshot
      - person
      - car
    min_score: 0.5                  # Minimum score to trigger
    cooldown: 2.0                   # Seconds between snapshots per object

  # Output settings
  format: "jpg"                     # jpg or png
  quality: 95                       # JPEG quality
  clean: true                       # No annotations on snapshot
  crop: false                       # Crop to object bbox
  retain_days: 7                    # Days to keep snapshots

# Annotation export
annotations:
  enabled: true
  output_dir: "output/labels"
  format: "yolo"                    # YOLO format

  # Pairing
  pair_with_snapshots: true         # Create snapshot-label pairs

  # Filtering
  min_score: 0.3
  classes: null                     # null = all classes

# Debug settings
debug_output:
  enabled: true
  output_dir: "output/debug"

  # Object list display
  object_list:
    enabled: true
    show_confidence: true
    show_class: true
    show_bbox: true

  # Visualization
  visualization:
    enabled: true
    draw_boxes: true
    draw_labels: true
    draw_confidence: true
    box_thickness: 2
    font_scale: 0.5

  # Statistics
  stats:
    show_fps: true
    show_detection_count: true
    log_interval: 100               # Log stats every N frames

# Class definitions
class_names:
  0: person
  1: bicycle
  2: car
  3: motorcycle
  4: airplane
  5: bus
  6: train
  7: truck
  8: boat
  9: traffic light
  10: fire hydrant
  # ... COCO classes continue
```
---
## Module Specifications
### 1. yolo_annotator/annotator.py
```python
class YOLOAnnotator:
    """YOLO-based automatic video annotator."""

    def __init__(self, config_path: str):
        """Load configuration and initialize model."""

    def load_model(self, model_path: str, device: str) -> None:
        """Load YOLOv9t model."""

    def process_video(self, video_path: str) -> AnnotationResult:
        """Process entire video and generate annotations."""

    def process_frame(self, frame: np.ndarray) -> List[Detection]:
        """Process single frame and return detections."""

    def filter_detections(self, detections: List[Detection]) -> List[Detection]:
        """Apply filtering rules to detections."""

    def export_annotations(self, output_dir: str) -> None:
        """Export all annotations to YOLO format."""
```
### 2. frigate_mini/detector/rknn_detector.py

```python
class RKNNDetector:
    """RKNN-based YOLO detector for Rockchip NPU."""

    def __init__(self, model_path: str, target_platform: str):
        """Initialize RKNN runtime."""

    def load_model(self) -> bool:
        """Load RKNN model to NPU."""

    def preprocess(self, frame: np.ndarray) -> np.ndarray:
        """Preprocess frame for inference."""

    def inference(self, input_data: np.ndarray) -> np.ndarray:
        """Run inference on NPU."""

    def postprocess(self, outputs: np.ndarray) -> List[Detection]:
        """Parse YOLO outputs and apply NMS."""

    def detect(self, frame: np.ndarray) -> List[Detection]:
        """Full detection pipeline."""

    def release(self) -> None:
        """Release RKNN resources."""
```
### 3. frigate_mini/output/annotation.py

```python
class AnnotationWriter:
    """Write YOLO format annotation files."""

    def __init__(self, output_dir: str, class_names: Dict[int, str]):
        """Initialize annotation writer."""

    def write_label(self,
                    image_name: str,
                    detections: List[Detection],
                    image_size: Tuple[int, int]) -> str:
        """Write YOLO label file for image."""

    def detection_to_yolo(self,
                          detection: Detection,
                          image_width: int,
                          image_height: int) -> str:
        """Convert detection to YOLO format string."""

    def create_dataset_structure(self) -> None:
        """Create YOLO dataset directory structure."""

    def write_data_yaml(self, train_path: str, val_path: str) -> str:
        """Generate data.yaml for training."""
```
### 4. frigate_mini/debug/object_list.py

```python
class ObjectListDisplay:
    """Display detected objects in debug mode."""

    def __init__(self, config: Dict):
        """Initialize display settings."""

    def update(self, detections: List[Detection]) -> None:
        """Update object list with new detections."""

    def format_detection(self, detection: Detection) -> str:
        """Format single detection for display."""

    def print_list(self) -> None:
        """Print current object list to console."""

    def save_snapshot_with_labels(self,
                                  frame: np.ndarray,
                                  detections: List[Detection],
                                  output_path: str) -> None:
        """Save debug image with annotations."""
```
## Data Structures

### Detection

```python
@dataclass
class Detection:
    class_id: int             # Class index
    class_name: str           # Class name
    confidence: float         # Detection confidence (0-1)
    bbox: BBox                # Bounding box
    track_id: Optional[int]   # Tracking ID (if tracked)
    timestamp: float          # Frame timestamp
    frame_id: int             # Frame number

@dataclass
class BBox:
    x1: float                 # Top-left x (pixels)
    y1: float                 # Top-left y (pixels)
    x2: float                 # Bottom-right x (pixels)
    y2: float                 # Bottom-right y (pixels)

    def to_yolo(self, img_w: int, img_h: int) -> Tuple[float, float, float, float]:
        """Convert to YOLO format (x_center, y_center, width, height) normalized."""

    def area(self) -> float:
        """Calculate bbox area in pixels."""
```
### AnnotationPair

```python
@dataclass
class AnnotationPair:
    image_path: str           # Path to snapshot image
    label_path: str           # Path to YOLO label file
    detections: List[Detection]
    timestamp: datetime
    camera_name: str
    frame_id: int
```
## Output Format

### Directory Structure

```
output/
├── snapshots/
│   ├── front_door/
│   │   ├── 20240115_143022_001.jpg
│   │   ├── 20240115_143025_002.jpg
│   │   └── ...
│   └── backyard/
│       └── ...
├── labels/
│   ├── front_door/
│   │   ├── 20240115_143022_001.txt
│   │   ├── 20240115_143025_002.txt
│   │   └── ...
│   └── backyard/
│       └── ...
├── debug/
│   ├── front_door/
│   │   ├── 20240115_143022_001_debug.jpg
│   │   └── ...
│   └── object_log.txt
└── manifest.json
```
### YOLO Label Format

```
# {class_id} {x_center} {y_center} {width} {height}
0 0.456789 0.321456 0.123456 0.234567
2 0.789012 0.654321 0.098765 0.176543
```
### Manifest JSON

```json
{
  "created": "2024-01-15T14:30:22",
  "model": "yolov9t.rknn",
  "total_frames": 1500,
  "total_detections": 3420,
  "pairs": [
    {
      "image": "snapshots/front_door/20240115_143022_001.jpg",
      "label": "labels/front_door/20240115_143022_001.txt",
      "camera": "front_door",
      "frame_id": 150,
      "timestamp": "2024-01-15T14:30:22.500",
      "detections": [
        {"class": "person", "confidence": 0.87},
        {"class": "car", "confidence": 0.92}
      ]
    }
  ]
}
```
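
A minimal sketch of the manifest generator behind this format, assuming the `AnnotationPair` dataclass above; `total_frames` is omitted since it is not derivable from the pairs alone, and `model_name` is a hypothetical parameter.

```python
import json
from datetime import datetime
from typing import List

def write_manifest(path: str, pairs: List["AnnotationPair"], model_name: str) -> None:
    """Serialize all snapshot/label pairs into manifest.json."""
    manifest = {
        "created": datetime.now().isoformat(timespec="seconds"),
        "model": model_name,
        "total_detections": sum(len(p.detections) for p in pairs),
        "pairs": [
            {
                "image": p.image_path,
                "label": p.label_path,
                "camera": p.camera_name,
                "frame_id": p.frame_id,
                "timestamp": p.timestamp.isoformat(),
                "detections": [
                    {"class": d.class_name, "confidence": round(d.confidence, 2)}
                    for d in p.detections
                ],
            }
            for p in pairs
        ],
    }
    with open(path, "w") as f:
        json.dump(manifest, f, indent=2)
```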
## Implementation Phases

### Phase 1: Core YOLO Annotator (Week 1)

- Create `yolo_annotator/` module structure
- Implement `YOLOAnnotator` class with Ultralytics backend
- Implement video source handling
- Implement YOLO label export
- Create `annotator.yaml` config loader
- Add CLI script `scripts/annotate.py`
- Test with sample video

### Phase 2: Frigate-Mini Base (Week 2)

- Create `frigate_mini/` module structure
- Implement config schema and loader
- Implement base detector interface
- Implement ONNX detector (for testing)
- Implement MP4 video source
- Implement basic frame processing loop
- Test basic detection pipeline

### Phase 3: RKNN Integration (Week 3)

- Implement RKNN detector backend
- Create ONNX to RKNN conversion script
- Test on Rockchip hardware (RK3588/RK3568)
- Optimize for NPU performance
- Add fallback mechanism

### Phase 4: Snapshot & Annotation System (Week 4)

- Implement snapshot capture system
- Implement annotation writer
- Implement snapshot-label pairing
- Add trigger-based capture logic
- Create manifest generator

### Phase 5: Debug System (Week 5)

- Implement object list display
- Implement debug visualization
- Add statistics tracking
- Create debug frame saver
- Add console and file logging

### Phase 6: Integration & Testing (Week 6)

- Integration testing
- Performance optimization
- Documentation
- Example configs for common use cases
- Package for distribution
## Dependencies

### New Requirements

```
# requirements.txt additions

# YOLO
ultralytics>=8.0.0

# RKNN (install separately based on platform)
# rknn-toolkit2       # For conversion (x86)
# rknn-toolkit-lite2  # For inference (ARM)

# Video processing
opencv-python>=4.8.0
av>=10.0.0            # PyAV for efficient video decoding

# Configuration
pyyaml>=6.0
pydantic>=2.0         # Config validation

# Utilities
tqdm>=4.65.0
numpy>=1.24.0
```
### RKNN Installation Notes

```bash
# On x86 host (for model conversion):
pip install rknn-toolkit2

# On Rockchip device (for inference):
pip install rknn-toolkit-lite2
# Or install the wheel from Rockchip's GitHub releases
```
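
A minimal sketch of the core of `scripts/convert_to_rknn.py`, following the usual rknn-toolkit2 call sequence; mean/std values and INT8 quantization (which needs a calibration dataset) are omitted and should be set per model, so treat this as an assumption to verify against the installed toolkit version.

```python
from rknn.api import RKNN

def convert(onnx_path: str, rknn_path: str, platform: str = "rk3588") -> None:
    """Convert an ONNX YOLO model to RKNN format (no quantization)."""
    rknn = RKNN()
    rknn.config(target_platform=platform)       # also accepts mean_values/std_values
    if rknn.load_onnx(model=onnx_path) != 0:
        raise RuntimeError("load_onnx failed")
    if rknn.build(do_quantization=False) != 0:  # INT8 would need a calibration dataset
        raise RuntimeError("build failed")
    if rknn.export_rknn(rknn_path) != 0:
        raise RuntimeError("export_rknn failed")
    rknn.release()

if __name__ == "__main__":
    convert("models/yolov9t.onnx", "models/yolov9t.rknn")
```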
## Usage Examples

### 1. CPU-Only Workflow (ONNX) - Recommended for Development

```bash
# Step 1: Download pretrained YOLOv9t
wget https://github.com/ultralytics/assets/releases/download/v8.1.0/yolov9t.pt -O models/yolov9t.pt

# Step 2: Convert to ONNX
python scripts/convert_to_onnx.py --input models/yolov9t.pt --output models/yolov9t.onnx

# Step 3a: Auto-annotate video (CPU)
python scripts/annotate.py --config configs/annotator_cpu.yaml

# Or with CLI args:
python scripts/annotate.py \
    --model models/yolov9t.onnx \
    --video input/video.mp4 \
    --device cpu

# Step 3b: Run Frigate-Mini (CPU)
python scripts/frigate_mini.py --config configs/frigate_mini_cpu.yaml

# Or with CLI args:
python scripts/frigate_mini.py \
    --model models/yolov9t.onnx \
    --video input/video.mp4 \
    --output output/ \
    --debug
```

### 2. RKNN Workflow (Rockchip NPU)

```bash
# Step 1: Convert ONNX to RKNN (on x86 host)
python scripts/convert_to_rknn.py \
    --input models/yolov9t.onnx \
    --output models/yolov9t.rknn \
    --platform rk3588

# Step 2: Copy to Rockchip device and run
python scripts/frigate_mini.py --config configs/frigate_mini.yaml

# Or:
python scripts/frigate_mini.py \
    --model models/yolov9t.rknn \
    --video input/video.mp4 \
    --platform rk3588
```

### 3. GPU Workflow (CUDA)

```bash
# Using Ultralytics directly with GPU
python scripts/annotate.py \
    --model models/yolov9t.pt \
    --video input/video.mp4 \
    --device cuda
```
## Quick Reference

| Task | CPU (ONNX) | RKNN (NPU) | GPU (CUDA) |
|---|---|---|---|
| Model file | `.onnx` | `.rknn` | `.pt` |
| Config | `*_cpu.yaml` | `frigate_mini.yaml` | Use `--device cuda` |
| Speed | 5-15 FPS | 30+ FPS | 50+ FPS |
| Hardware | Any CPU | Rockchip SBC | NVIDIA GPU |
## Future Enhancements

- **RTSP Support** - Add real camera stream input
- **Object Tracking** - Add ByteTrack/BoT-SORT for consistent IDs
- **Web UI** - Simple web interface for monitoring
- **Multi-model** - Support different models per camera
- **Event System** - Webhooks for detection events
- **Auto-labeling Refinement** - Use SAM2 to refine YOLO boxes
- **Active Learning** - Flag low-confidence detections for review