# Feature Plan: YOLO-Assisted Auto-Annotation + Mini Frigate RKNN

## Overview

Create two integrated components:

1. **YOLO-Assisted Annotator** - Use a pretrained YOLOv9t model to auto-annotate video frames
2. **Frigate-Mini-RKNN** - Standalone mini fork of Frigate for RKNN inference with MP4 input

## Goals

- Auto-annotate videos using the pretrained YOLOv9t model (replaces manual SAM2 prompts)
- Minimal Frigate fork with multiple detector backends:
  - **RKNN** - Rockchip NPU acceleration (RK3588, RK3568, etc.)
  - **ONNX** - CPU-only inference (cross-platform, no special hardware)
  - **YOLO** - Ultralytics backend (CPU/CUDA)
- MP4 file as camera feed source
- Output: clean snapshot + YOLO-format label pairs
- Simple text-based configuration
- Debug mode with object list visualization

## Detector Backends Comparison

| Backend | Hardware | Performance | Platform | Use Case |
|---------|----------|-------------|----------|----------|
| **RKNN** | Rockchip NPU | Fast (30+ FPS) | ARM (RK3588/3568) | Production on Rockchip SBCs |
| **ONNX** | CPU | Medium (5-15 FPS) | Any (x86/ARM) | Development, testing, no GPU |
| **YOLO** | CPU/CUDA | Fast with GPU | Any | Development, CUDA systems |

### Recommended Workflow

1. **Development/Testing**: Use the ONNX backend on any CPU
2. **Production on Rockchip**: Convert to RKNN, deploy on the NPU
3. **Production on x86/CUDA**: Use the YOLO backend with GPU

---

## Project Structure

```
sam2-yolo-pipeline/
├── notebooks/                 # Existing Kaggle notebooks
├── utils/                     # Existing utilities
├── yolo_annotator/            # NEW: YOLO-assisted annotation
│   ├── __init__.py
│   ├── annotator.py           # Core YOLOv9t annotator
│   ├── video_source.py        # MP4/RTSP video source handler
│   ├── export.py              # Snapshot + label export
│   └── visualizer.py          # Debug visualization
├── frigate_mini/              # NEW: Mini Frigate fork
│   ├── __init__.py
│   ├── app.py                 # Main application entry
│   ├── config/
│   │   ├── __init__.py
│   │   ├── schema.py          # Config validation
│   │   └── loader.py          # YAML config loader
│   ├── detector/
│   │   ├── __init__.py
│   │   ├── base.py            # Base detector interface
│   │   ├── rknn_detector.py   # RKNN backend
│   │   ├── onnx_detector.py   # ONNX fallback
│   │   └── yolo_detector.py   # Ultralytics YOLO fallback
│   ├── video/
│   │   ├── __init__.py
│   │   ├── mp4_source.py      # MP4 file source
│   │   └── frame_processor.py # Frame processing pipeline
│   ├── output/
│   │   ├── __init__.py
│   │   ├── snapshot.py        # Snapshot capture
│   │   └── annotation.py      # YOLO label writer
│   └── debug/
│       ├── __init__.py
│       ├── object_list.py     # Detected objects display
│       └── visualizer.py      # Bounding box overlay
├── configs/                   # NEW: Configuration files
│   ├── annotator.yaml         # Annotator settings
│   └── frigate_mini.yaml      # Frigate-mini settings
├── models/                    # NEW: Model weights storage
│   └── .gitkeep
├── output/                    # NEW: Default output directory
│   ├── snapshots/
│   ├── labels/
│   └── debug/
├── scripts/                   # NEW: CLI scripts
│   ├── annotate.py            # Run annotation pipeline
│   ├── frigate_mini.py        # Run mini frigate
│   └── convert_to_rknn.py     # Convert ONNX to RKNN
└── requirements.txt           # Updated dependencies
```

---

## Component 1: YOLO-Assisted Annotator

### Purpose

Replace SAM2 auto-annotation with faster YOLOv9t-based detection for creating training datasets.

### Workflow

```
MP4 Video → Frame Extraction → YOLOv9t Detection → Filter/NMS → YOLO Labels + Snapshots
```

### Features

1. **Model Loading**
   - Load pretrained YOLOv9t (.pt file)
   - Support custom trained models
   - Configurable confidence threshold
   - Configurable NMS threshold
2. **Video Processing**
   - MP4 file input
   - Configurable FPS sampling
   - Frame skip / time range selection
   - Resolution scaling
3. **Detection Filtering**
   - Filter by class IDs
   - Filter by confidence score
   - Filter by bbox size (min/max area)
   - Filter by aspect ratio
4. **Output Generation**
   - Clean snapshot images (no annotations drawn)
   - YOLO-format label files (.txt)
   - Optional debug images with boxes drawn
   - JSON manifest of all detections

### Configuration (annotator.yaml)

```yaml
# YOLO-Assisted Annotator Configuration

model:
  path: "models/yolov9t.pt"        # Path to YOLO model
  device: "cuda"                   # cuda, cpu, or rknn
  conf_threshold: 0.25             # Confidence threshold
  iou_threshold: 0.45              # NMS IoU threshold

video:
  source: "input/video.mp4"        # Video file path
  sample_fps: 2                    # Frames per second to extract
  max_frames: null                 # Max frames (null = all)
  start_time: 0                    # Start time in seconds
  end_time: null                   # End time (null = end of video)
  resize: null                     # [width, height] or null

detection:
  classes: null                    # Class IDs to keep (null = all)
  min_confidence: 0.3              # Minimum confidence to save
  min_area: 100                    # Minimum bbox area in pixels
  max_area: null                   # Maximum bbox area (null = no limit)
  min_size: 0.01                   # Minimum bbox dimension (normalized)

output:
  directory: "output/annotations"  # Output directory
  save_snapshots: true             # Save clean images
  save_labels: true                # Save YOLO labels
  save_debug: true                 # Save debug visualizations
  save_manifest: true              # Save JSON manifest
  image_format: "jpg"              # jpg or png
  image_quality: 95                # JPEG quality (1-100)

classes:                           # Class name mapping (for display/filtering)
  0: "person"
  1: "bicycle"
  2: "car"
  # ... etc
```

---

## Component 2: Frigate-Mini-RKNN

### Purpose

Minimal standalone Frigate-like system for RKNN inference on Rockchip devices, outputting annotation pairs.
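Both components ultimately write labels in the same normalized YOLO format, so the pixel-to-YOLO conversion sits at the heart of the export path. A minimal sketch follows; the `Detection` shape and `to_yolo_line` helper here are illustrative names, not the final API:

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class Detection:
    """Illustrative detection record: class index, score, pixel-space box."""
    class_id: int
    confidence: float
    bbox: Tuple[float, float, float, float]  # x1, y1, x2, y2 in pixels


def to_yolo_line(det: Detection, img_w: int, img_h: int) -> str:
    """Convert a pixel-space box to a 'class xc yc w h' line, normalized to [0, 1]."""
    x1, y1, x2, y2 = det.bbox
    xc = (x1 + x2) / 2 / img_w
    yc = (y1 + y2) / 2 / img_h
    w = (x2 - x1) / img_w
    h = (y2 - y1) / img_h
    return f"{det.class_id} {xc:.6f} {yc:.6f} {w:.6f} {h:.6f}"


# A 320x320 box in the top-left quadrant of a 640x640 frame:
print(to_yolo_line(Detection(0, 0.9, (0, 0, 320, 320)), 640, 640))
# → 0 0.250000 0.250000 0.500000 0.500000
```

One line per detection, joined with newlines, is exactly the `.txt` sidecar both the annotator and Frigate-Mini emit next to each snapshot.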
### Workflow

```
MP4 Feed → Frame Decode → RKNN Inference → Object Tracking → Snapshot + Label Export
```

### Features

1. **Video Input**
   - MP4 file as "camera" source
   - Loop playback option
   - Configurable FPS limit
   - Multiple video sources support
2. **RKNN Detector**
   - Load RKNN model (.rknn file)
   - NPU acceleration on Rockchip SoCs
   - Fallback to ONNX/CPU if RKNN is unavailable
   - Batch inference support
3. **Object Detection**
   - YOLOv9t architecture support
   - Configurable input resolution
   - Post-processing (NMS, filtering)
   - Class filtering
4. **Snapshot System**
   - Capture on detection trigger
   - Configurable cooldown period
   - Clean snapshots (no overlays)
   - Crop to detected object (optional)
5. **Annotation Export**
   - YOLO-format labels
   - Synchronized snapshot-label pairs
   - Auto-naming with timestamps
   - Dataset structure output
6. **Debug Mode**
   - Real-time object list display
   - Bounding box visualization
   - FPS counter
   - Detection statistics
   - Save debug frames

### Configuration (frigate_mini.yaml)

#### Option A: ONNX CPU-Only (Recommended for development/testing)

```yaml
# Frigate-Mini Configuration - ONNX CPU Mode
# Works on any system without special hardware

debug: true
log_level: "info"

detector:
  type: "onnx"                      # Use ONNX Runtime
  model_path: "models/yolov9t.onnx" # ONNX model file
  input_size: [640, 640]            # Model input resolution
  conf_threshold: 0.25              # Detection confidence
  nms_threshold: 0.45               # NMS threshold

  # ONNX specific settings
  onnx:
    device: "cpu"                   # cpu or cuda
    num_threads: 4                  # CPU threads (0 = auto)
    optimization_level: "all"       # none, basic, extended, all
```

#### Option B: RKNN NPU (For Rockchip devices)

```yaml
# Frigate-Mini Configuration - RKNN NPU Mode
# For Rockchip SBCs (RK3588, RK3568, etc.)

debug: true
log_level: "info"

detector:
  type: "rknn"                      # Use RKNN Runtime
  model_path: "models/yolov9t.rknn" # RKNN model file
  input_size: [640, 640]            # Model input resolution
  conf_threshold: 0.25              # Detection confidence
  nms_threshold: 0.45               # NMS threshold

  # RKNN specific
  rknn:
    target_platform: "rk3588"       # rk3588, rk3568, rk3566, etc.
    core_mask: 7                    # NPU core mask (7 = all 3 cores on RK3588)

  # Fallback to ONNX if RKNN fails
  fallback:
    enabled: true
    type: "onnx"
    device: "cpu"
```

#### Option C: Ultralytics YOLO (For CUDA systems)

```yaml
# Frigate-Mini Configuration - Ultralytics YOLO Mode
# For systems with an NVIDIA GPU

debug: true
log_level: "info"

detector:
  type: "yolo"                      # Use Ultralytics
  model_path: "models/yolov9t.pt"   # PyTorch model file
  conf_threshold: 0.25
  nms_threshold: 0.45

  # YOLO specific
  yolo:
    device: "cuda"                  # cpu, cuda, cuda:0, etc.
    half: true                      # FP16 inference (faster on GPU)
```

#### Full Configuration Example (with all options)

```yaml
# Video sources (cameras)
cameras:
  front_door:
    enabled: true
    source: "input/front_door.mp4"  # MP4 file path
    fps: 5                          # Processing FPS limit
    loop: true                      # Loop video playback

    # Detection zones (optional)
    detect:
      enabled: true
      width: 1280                   # Detection resolution
      height: 720

    # Object filtering
    objects:
      track:
        - person
        - car
        - dog
      filters:
        person:
          min_area: 1000            # Minimum area in pixels
          max_area: 500000
          min_score: 0.4

  backyard:
    enabled: true
    source: "input/backyard.mp4"
    fps: 5
    loop: true

# Snapshot settings
snapshots:
  enabled: true
  output_dir: "output/snapshots"

  # Trigger settings
  trigger:
    objects:                        # Objects that trigger a snapshot
      - person
      - car
    min_score: 0.5                  # Minimum score to trigger
    cooldown: 2.0                   # Seconds between snapshots per object

  # Output settings
  format: "jpg"                     # jpg or png
  quality: 95                       # JPEG quality
  clean: true                       # No annotations on snapshot
  crop: false                       # Crop to object bbox
  retain_days: 7                    # Days to keep snapshots

# Annotation export
annotations:
  enabled: true
  output_dir: "output/labels"
  format: "yolo"                    # YOLO format

  # Pairing
  pair_with_snapshots: true         # Create snapshot-label pairs

  # Filtering
  min_score: 0.3
  classes: null                     # null = all classes

# Debug settings
debug_output:
  enabled: true
  output_dir: "output/debug"

  # Object list display
  object_list:
    enabled: true
    show_confidence: true
    show_class: true
    show_bbox: true

  # Visualization
  visualization:
    enabled: true
    draw_boxes: true
    draw_labels: true
    draw_confidence: true
    box_thickness: 2
    font_scale: 0.5

  # Statistics
  stats:
    show_fps: true
    show_detection_count: true
    log_interval: 100               # Log stats every N frames

# Class definitions
class_names:
  0: person
  1: bicycle
  2: car
  3: motorcycle
  4: airplane
  5: bus
  6: train
  7: truck
  8: boat
  9: traffic light
  10: fire hydrant
  # ... COCO classes continue
```

---

## Module Specifications

### 1. yolo_annotator/annotator.py

```python
class YOLOAnnotator:
    """YOLO-based automatic video annotator."""

    def __init__(self, config_path: str):
        """Load configuration and initialize the model."""

    def load_model(self, model_path: str, device: str) -> None:
        """Load the YOLOv9t model."""

    def process_video(self, video_path: str) -> AnnotationResult:
        """Process the entire video and generate annotations."""

    def process_frame(self, frame: np.ndarray) -> List[Detection]:
        """Process a single frame and return detections."""

    def filter_detections(self, detections: List[Detection]) -> List[Detection]:
        """Apply filtering rules to detections."""

    def export_annotations(self, output_dir: str) -> None:
        """Export all annotations in YOLO format."""
```

### 2. frigate_mini/detector/rknn_detector.py

```python
class RKNNDetector:
    """RKNN-based YOLO detector for the Rockchip NPU."""

    def __init__(self, model_path: str, target_platform: str):
        """Initialize the RKNN runtime."""

    def load_model(self) -> bool:
        """Load the RKNN model onto the NPU."""

    def preprocess(self, frame: np.ndarray) -> np.ndarray:
        """Preprocess a frame for inference."""

    def inference(self, input_data: np.ndarray) -> np.ndarray:
        """Run inference on the NPU."""

    def postprocess(self, outputs: np.ndarray) -> List[Detection]:
        """Parse YOLO outputs and apply NMS."""

    def detect(self, frame: np.ndarray) -> List[Detection]:
        """Full detection pipeline."""

    def release(self) -> None:
        """Release RKNN resources."""
```

### 3. frigate_mini/output/annotation.py

```python
class AnnotationWriter:
    """Write YOLO-format annotation files."""

    def __init__(self, output_dir: str, class_names: Dict[int, str]):
        """Initialize the annotation writer."""

    def write_label(self, image_name: str, detections: List[Detection],
                    image_size: Tuple[int, int]) -> str:
        """Write a YOLO label file for an image."""

    def detection_to_yolo(self, detection: Detection,
                          image_width: int, image_height: int) -> str:
        """Convert a detection to a YOLO-format string."""

    def create_dataset_structure(self) -> None:
        """Create the YOLO dataset directory structure."""

    def write_data_yaml(self, train_path: str, val_path: str) -> str:
        """Generate data.yaml for training."""
```

### 4. frigate_mini/debug/object_list.py

```python
class ObjectListDisplay:
    """Display detected objects in debug mode."""

    def __init__(self, config: Dict):
        """Initialize display settings."""

    def update(self, detections: List[Detection]) -> None:
        """Update the object list with new detections."""

    def format_detection(self, detection: Detection) -> str:
        """Format a single detection for display."""

    def print_list(self) -> None:
        """Print the current object list to the console."""

    def save_snapshot_with_labels(self, frame: np.ndarray,
                                  detections: List[Detection],
                                  output_path: str) -> None:
        """Save a debug image with annotations."""
```

---

## Data Structures

### Detection

```python
@dataclass
class Detection:
    class_id: int            # Class index
    class_name: str          # Class name
    confidence: float        # Detection confidence (0-1)
    bbox: BBox               # Bounding box
    track_id: Optional[int]  # Tracking ID (if tracked)
    timestamp: float         # Frame timestamp
    frame_id: int            # Frame number


@dataclass
class BBox:
    x1: float                # Top-left x (pixels)
    y1: float                # Top-left y (pixels)
    x2: float                # Bottom-right x (pixels)
    y2: float                # Bottom-right y (pixels)

    def to_yolo(self, img_w: int, img_h: int) -> Tuple[float, float, float, float]:
        """Convert to normalized YOLO format (x_center, y_center, width, height)."""

    def area(self) -> float:
        """Calculate bbox area in pixels."""
```

### AnnotationPair

```python
@dataclass
class AnnotationPair:
    image_path: str          # Path to snapshot image
    label_path: str          # Path to YOLO label file
    detections: List[Detection]
    timestamp: datetime
    camera_name: str
    frame_id: int
```

---

## Output Format

### Directory Structure

```
output/
├── snapshots/
│   ├── front_door/
│   │   ├── 20240115_143022_001.jpg
│   │   ├── 20240115_143025_002.jpg
│   │   └── ...
│   └── backyard/
│       └── ...
├── labels/
│   ├── front_door/
│   │   ├── 20240115_143022_001.txt
│   │   ├── 20240115_143025_002.txt
│   │   └── ...
│   └── backyard/
│       └── ...
├── debug/
│   ├── front_door/
│   │   ├── 20240115_143022_001_debug.jpg
│   │   └── ...
│   └── object_log.txt
└── manifest.json
```

### YOLO Label Format

```
# {class_id} {x_center} {y_center} {width} {height}
0 0.456789 0.321456 0.123456 0.234567
2 0.789012 0.654321 0.098765 0.176543
```

### Manifest JSON

```json
{
  "created": "2024-01-15T14:30:22",
  "model": "yolov9t.rknn",
  "total_frames": 1500,
  "total_detections": 3420,
  "pairs": [
    {
      "image": "snapshots/front_door/20240115_143022_001.jpg",
      "label": "labels/front_door/20240115_143022_001.txt",
      "camera": "front_door",
      "frame_id": 150,
      "timestamp": "2024-01-15T14:30:22.500",
      "detections": [
        {"class": "person", "confidence": 0.87},
        {"class": "car", "confidence": 0.92}
      ]
    }
  ]
}
```

---

## Implementation Phases

### Phase 1: Core YOLO Annotator (Week 1)

- [ ] Create `yolo_annotator/` module structure
- [ ] Implement `YOLOAnnotator` class with Ultralytics backend
- [ ] Implement video source handling
- [ ] Implement YOLO label export
- [ ] Create `annotator.yaml` config loader
- [ ] Add CLI script `scripts/annotate.py`
- [ ] Test with a sample video

### Phase 2: Frigate-Mini Base (Week 2)

- [ ] Create `frigate_mini/` module structure
- [ ] Implement config schema and loader
- [ ] Implement base detector interface
- [ ] Implement ONNX detector (for testing)
- [ ] Implement MP4 video source
- [ ] Implement basic frame processing loop
- [ ] Test basic detection pipeline

### Phase 3: RKNN Integration (Week 3)

- [ ] Implement RKNN detector backend
- [ ] Create ONNX-to-RKNN conversion script
- [ ] Test on Rockchip hardware (RK3588/RK3568)
- [ ] Optimize for NPU performance
- [ ] Add fallback mechanism

### Phase 4: Snapshot & Annotation System (Week 4)

- [ ] Implement snapshot capture system
- [ ] Implement annotation writer
- [ ] Implement snapshot-label pairing
- [ ] Add trigger-based capture logic
- [ ] Create manifest generator

### Phase 5: Debug System (Week 5)

- [ ] Implement object list display
- [ ] Implement debug visualization
- [ ] Add statistics tracking
- [ ] Create debug frame saver
- [ ] Add console and file logging

### Phase 6: Integration & Testing (Week 6)

- [ ] Integration testing
- [ ] Performance optimization
- [ ] Documentation
- [ ] Example configs for common use cases
- [ ] Package for distribution

---

## Dependencies

### New Requirements

```
# requirements.txt additions

# YOLO
ultralytics>=8.0.0

# RKNN (install separately based on platform)
# rknn-toolkit2        # For conversion (x86)
# rknn-toolkit-lite2   # For inference (ARM)

# Video processing
opencv-python>=4.8.0
av>=10.0.0             # PyAV for efficient video decoding

# Configuration
pyyaml>=6.0
pydantic>=2.0          # Config validation

# Utilities
tqdm>=4.65.0
numpy>=1.24.0
```

### RKNN Installation Notes

```bash
# On the x86 host (for model conversion):
pip install rknn-toolkit2

# On the Rockchip device (for inference):
pip install rknn-toolkit-lite2
# Or install from Rockchip GitHub releases
```

---

## Usage Examples

### 1. CPU-Only Workflow (ONNX) - Recommended for Development

```bash
# Step 1: Download pretrained YOLOv9t
wget https://github.com/ultralytics/assets/releases/download/v8.1.0/yolov9t.pt -O models/yolov9t.pt

# Step 2: Convert to ONNX
python scripts/convert_to_onnx.py --input models/yolov9t.pt --output models/yolov9t.onnx

# Step 3a: Auto-annotate video (CPU)
python scripts/annotate.py --config configs/annotator_cpu.yaml

# Or with CLI args:
python scripts/annotate.py \
    --model models/yolov9t.onnx \
    --video input/video.mp4 \
    --device cpu

# Step 3b: Run Frigate-Mini (CPU)
python scripts/frigate_mini.py --config configs/frigate_mini_cpu.yaml

# Or with CLI args:
python scripts/frigate_mini.py \
    --model models/yolov9t.onnx \
    --video input/video.mp4 \
    --output output/ \
    --debug
```

### 2. RKNN Workflow (Rockchip NPU)

```bash
# Step 1: Convert ONNX to RKNN (on the x86 host)
python scripts/convert_to_rknn.py \
    --input models/yolov9t.onnx \
    --output models/yolov9t.rknn \
    --platform rk3588

# Step 2: Copy to the Rockchip device and run
python scripts/frigate_mini.py --config configs/frigate_mini.yaml

# Or:
python scripts/frigate_mini.py \
    --model models/yolov9t.rknn \
    --video input/video.mp4 \
    --platform rk3588
```

### 3. GPU Workflow (CUDA)

```bash
# Using Ultralytics directly with a GPU
python scripts/annotate.py \
    --model models/yolov9t.pt \
    --video input/video.mp4 \
    --device cuda
```

### Quick Reference

| Task | CPU (ONNX) | RKNN (NPU) | GPU (CUDA) |
|------|------------|------------|------------|
| Model file | `.onnx` | `.rknn` | `.pt` |
| Config | `*_cpu.yaml` | `frigate_mini.yaml` | Use `--device cuda` |
| Speed | 5-15 FPS | 30+ FPS | 50+ FPS |
| Hardware | Any CPU | Rockchip SBC | NVIDIA GPU |

---

## Future Enhancements

1. **RTSP Support** - Add real camera stream input
2. **Object Tracking** - Add ByteTrack/BoT-SORT for consistent IDs
3. **Web UI** - Simple web interface for monitoring
4. **Multi-model** - Support different models per camera
5. **Event System** - Webhooks for detection events
6. **Auto-labeling Refinement** - Use SAM2 to refine YOLO boxes
7. **Active Learning** - Flag low-confidence detections for review

---

## References

- [Ultralytics YOLOv9](https://github.com/ultralytics/ultralytics)
- [RKNN-Toolkit2](https://github.com/rockchip-linux/rknn-toolkit2)
- [Frigate NVR](https://github.com/blakeblackshear/frigate)
- [YOLO Label Format](https://docs.ultralytics.com/datasets/detect/)
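As a closing sketch, the per-object snapshot trigger configured under `snapshots.trigger` (tracked objects, `min_score`, `cooldown`) reduces to a small piece of state. The names below are illustrative, not the final `frigate_mini` API:

```python
import time


class SnapshotTrigger:
    """Decide when a detection should produce a snapshot-label pair.

    Hypothetical sketch: fires for tracked classes above min_score, then
    suppresses further snapshots of that class for `cooldown` seconds.
    """

    def __init__(self, objects, min_score=0.5, cooldown=2.0):
        self.objects = set(objects)       # classes that may trigger
        self.min_score = min_score        # minimum confidence to trigger
        self.cooldown = cooldown          # seconds between snapshots per class
        self._last_fired = {}             # class_name -> last trigger time

    def should_fire(self, class_name, score, now=None):
        """Return True if this detection should trigger a snapshot now."""
        if class_name not in self.objects or score < self.min_score:
            return False
        now = time.monotonic() if now is None else now
        last = self._last_fired.get(class_name)
        if last is not None and now - last < self.cooldown:
            return False                  # still inside the cooldown window
        self._last_fired[class_name] = now
        return True
```

With `cooldown: 2.0`, a person standing in frame for ten seconds yields roughly five snapshot-label pairs instead of one per processed frame, which keeps the exported dataset diverse rather than saturated with near-duplicates.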