# Feature Plan: YOLO-Assisted Auto-Annotation + Mini Frigate RKNN

## Overview

Create two integrated components:

1. **YOLO-Assisted Annotator** - Use pretrained YOLOv9t to auto-annotate video frames
2. **Frigate-Mini-RKNN** - Standalone mini fork of Frigate for RKNN inference with MP4 input

## Goals

- Auto-annotate videos using YOLOv9t pretrained model (replaces manual SAM2 prompts)
- Minimal Frigate fork with multiple detector backends:
  - **RKNN** - Rockchip NPU acceleration (RK3588, RK3568, etc.)
  - **ONNX** - CPU-only inference (cross-platform, no special hardware)
  - **YOLO** - Ultralytics backend (CPU/CUDA)
- MP4 file as camera feed source
- Output: Clean snapshot + YOLO format label pairs
- Simple text-based configuration
- Debug mode with object list visualization

## Detector Backends Comparison

| Backend | Hardware | Performance | Platform | Use Case |
|---------|----------|-------------|----------|----------|
| **RKNN** | Rockchip NPU | Fast (30+ FPS) | ARM (RK3588/3568) | Production on Rockchip SBC |
| **ONNX** | CPU | Medium (5-15 FPS) | Any (x86/ARM) | Development, testing, no GPU |
| **YOLO** | CPU/CUDA | Fast with GPU | Any | Development, CUDA systems |

### Recommended Workflow

1. **Development/Testing**: Use the ONNX backend on any CPU
2. **Production on Rockchip**: Convert the model to RKNN, deploy on the NPU
3. **Production on x86/CUDA**: Use the YOLO backend with a GPU
---

## Project Structure

```
sam2-yolo-pipeline/
├── notebooks/                 # Existing Kaggle notebooks
├── utils/                     # Existing utilities
├── yolo_annotator/            # NEW: YOLO-assisted annotation
│   ├── __init__.py
│   ├── annotator.py           # Core YOLOv9t annotator
│   ├── video_source.py        # MP4/RTSP video source handler
│   ├── export.py              # Snapshot + label export
│   └── visualizer.py          # Debug visualization
├── frigate_mini/              # NEW: Mini Frigate fork
│   ├── __init__.py
│   ├── app.py                 # Main application entry
│   ├── config/
│   │   ├── __init__.py
│   │   ├── schema.py          # Config validation
│   │   └── loader.py          # YAML config loader
│   ├── detector/
│   │   ├── __init__.py
│   │   ├── base.py            # Base detector interface
│   │   ├── rknn_detector.py   # RKNN backend
│   │   ├── onnx_detector.py   # ONNX fallback
│   │   └── yolo_detector.py   # Ultralytics YOLO fallback
│   ├── video/
│   │   ├── __init__.py
│   │   ├── mp4_source.py      # MP4 file source
│   │   └── frame_processor.py # Frame processing pipeline
│   ├── output/
│   │   ├── __init__.py
│   │   ├── snapshot.py        # Snapshot capture
│   │   └── annotation.py      # YOLO label writer
│   └── debug/
│       ├── __init__.py
│       ├── object_list.py     # Detected objects display
│       └── visualizer.py      # Bounding box overlay
├── configs/                   # NEW: Configuration files
│   ├── annotator.yaml         # Annotator settings
│   └── frigate_mini.yaml      # Frigate-mini settings
├── models/                    # NEW: Model weights storage
│   └── .gitkeep
├── output/                    # NEW: Default output directory
│   ├── snapshots/
│   ├── labels/
│   └── debug/
├── scripts/                   # NEW: CLI scripts
│   ├── annotate.py            # Run annotation pipeline
│   ├── frigate_mini.py        # Run mini frigate
│   └── convert_to_rknn.py     # Convert ONNX to RKNN
└── requirements.txt           # Updated dependencies
```

---

## Component 1: YOLO-Assisted Annotator

### Purpose

Replace SAM2 auto-annotation with faster YOLOv9t-based detection for creating training datasets.

### Workflow

```
MP4 Video → Frame Extraction → YOLOv9t Detection → Filter/NMS → YOLO Labels + Snapshots
```

### Features

1. **Model Loading**
   - Load pretrained YOLOv9t (.pt file)
   - Support custom trained models
   - Configurable confidence threshold
   - Configurable NMS threshold

2. **Video Processing**
   - MP4 file input
   - Configurable FPS sampling
   - Frame skip / time range selection
   - Resolution scaling

3. **Detection Filtering**
   - Filter by class IDs
   - Filter by confidence score
   - Filter by bbox size (min/max area)
   - Filter by aspect ratio

4. **Output Generation**
   - Clean snapshot images (no annotations drawn)
   - YOLO format label files (.txt)
   - Optional debug images with boxes drawn
   - JSON manifest of all detections
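
The filtering rules in step 3 can be sketched as one standalone pass; the `Detection` fields and threshold names here are illustrative placeholders, not the final API:

```python
from dataclasses import dataclass

@dataclass
class Detection:
    """Minimal stand-in for the pipeline's detection record."""
    class_id: int
    confidence: float
    x1: float
    y1: float
    x2: float
    y2: float

    @property
    def area(self):
        return max(0.0, self.x2 - self.x1) * max(0.0, self.y2 - self.y1)

def filter_detections(dets, classes=None, min_conf=0.3,
                      min_area=100.0, max_area=None,
                      min_aspect=None, max_aspect=None):
    """Apply class / confidence / bbox-size / aspect-ratio rules in one pass."""
    kept = []
    for d in dets:
        if classes is not None and d.class_id not in classes:
            continue
        if d.confidence < min_conf:
            continue
        if d.area < min_area or (max_area is not None and d.area > max_area):
            continue
        w, h = d.x2 - d.x1, d.y2 - d.y1
        aspect = w / h if h > 0 else float("inf")
        if min_aspect is not None and aspect < min_aspect:
            continue
        if max_aspect is not None and aspect > max_aspect:
            continue
        kept.append(d)
    return kept
```

Each rule maps directly onto a key in the `detection:` section of `annotator.yaml` below.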
### Configuration (annotator.yaml)

```yaml
# YOLO-Assisted Annotator Configuration

model:
  path: "models/yolov9t.pt"        # Path to YOLO model
  device: "cuda"                   # cuda, cpu, or rknn
  conf_threshold: 0.25             # Confidence threshold
  iou_threshold: 0.45              # NMS IoU threshold

video:
  source: "input/video.mp4"        # Video file path
  sample_fps: 2                    # Frames per second to extract
  max_frames: null                 # Max frames (null = all)
  start_time: 0                    # Start time in seconds
  end_time: null                   # End time (null = end of video)
  resize: null                     # [width, height] or null

detection:
  classes: null                    # Class IDs to keep (null = all)
  min_confidence: 0.3              # Minimum confidence to save
  min_area: 100                    # Minimum bbox area in pixels
  max_area: null                   # Maximum bbox area (null = no limit)
  min_size: 0.01                   # Minimum bbox dimension (normalized)

output:
  directory: "output/annotations"  # Output directory
  save_snapshots: true             # Save clean images
  save_labels: true                # Save YOLO labels
  save_debug: true                 # Save debug visualizations
  save_manifest: true              # Save JSON manifest
  image_format: "jpg"              # jpg or png
  image_quality: 95                # JPEG quality (1-100)

classes:
  # Class name mapping (for display/filtering)
  0: "person"
  1: "bicycle"
  2: "car"
  # ... etc
```
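
The `sample_fps`, `start_time`, `end_time`, and `max_frames` options above reduce to picking which frame indices to decode. A minimal sketch (the function name is hypothetical):

```python
def sample_frame_indices(video_fps, total_frames, sample_fps,
                         start_time=0.0, end_time=None, max_frames=None):
    """Return the frame indices to decode for a given sampling rate.

    Decodes every Nth frame, where N = source fps / sampling fps,
    restricted to the [start_time, end_time] window.
    """
    step = max(1, round(video_fps / sample_fps))
    first = int(start_time * video_fps)
    last = total_frames if end_time is None else min(total_frames,
                                                     int(end_time * video_fps))
    indices = list(range(first, last, step))
    return indices if max_frames is None else indices[:max_frames]
```

For a 30 FPS video sampled at 2 FPS, this decodes every 15th frame.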
---

## Component 2: Frigate-Mini-RKNN

### Purpose

Minimal standalone Frigate-like system for RKNN inference on Rockchip devices, outputting annotation pairs.

### Workflow

```
MP4 Feed → Frame Decode → RKNN Inference → Object Tracking → Snapshot + Label Export
```

### Features

1. **Video Input**
   - MP4 file as "camera" source
   - Loop playback option
   - Configurable FPS limit
   - Support for multiple video sources

2. **RKNN Detector**
   - Load RKNN model (.rknn file)
   - NPU acceleration on Rockchip SoCs
   - Fallback to ONNX/CPU if RKNN is unavailable
   - Batch inference support

3. **Object Detection**
   - YOLOv9t architecture support
   - Configurable input resolution
   - Post-processing (NMS, filtering)
   - Class filtering

4. **Snapshot System**
   - Capture on detection trigger
   - Configurable cooldown period
   - Clean snapshots (no overlays)
   - Crop to detected object (optional)

5. **Annotation Export**
   - YOLO format labels
   - Synchronized snapshot-label pairs
   - Auto-naming with timestamps
   - Dataset structure output

6. **Debug Mode**
   - Real-time object list display
   - Bounding box visualization
   - FPS counter
   - Detection statistics
   - Save debug frames
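
The trigger-plus-cooldown behaviour of the snapshot system (feature 4) can be sketched as a small state holder; the class and method names are hypothetical, but the knobs mirror the `trigger` options (`min_score`, `cooldown`) in the config:

```python
class SnapshotTrigger:
    """Decide whether a detection should produce a snapshot,
    enforcing a per-class cooldown in seconds."""

    def __init__(self, trigger_classes, min_score=0.5, cooldown=2.0):
        self.trigger_classes = set(trigger_classes)
        self.min_score = min_score
        self.cooldown = cooldown
        self._last_shot = {}  # class name -> timestamp of last snapshot

    def should_capture(self, class_name, score, now):
        """True if this detection should trigger a snapshot at time `now`."""
        if class_name not in self.trigger_classes or score < self.min_score:
            return False
        last = self._last_shot.get(class_name)
        if last is not None and now - last < self.cooldown:
            return False  # still cooling down for this class
        self._last_shot[class_name] = now
        return True
```

Tracking the last capture per class (rather than globally) lets a `person` and a `car` each trigger their own snapshot within the same cooldown window.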
### Configuration (frigate_mini.yaml)

#### Option A: ONNX CPU-Only (Recommended for development/testing)

```yaml
# Frigate-Mini Configuration - ONNX CPU Mode
# Works on any system without special hardware

debug: true
log_level: "info"

detector:
  type: "onnx"                       # Use ONNX Runtime
  model_path: "models/yolov9t.onnx"  # ONNX model file
  input_size: [640, 640]             # Model input resolution
  conf_threshold: 0.25               # Detection confidence
  nms_threshold: 0.45                # NMS threshold

  # ONNX-specific settings
  onnx:
    device: "cpu"                    # cpu or cuda
    num_threads: 4                   # CPU threads (0 = auto)
    optimization_level: "all"        # none, basic, extended, all
```
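
All three backends share the same post-processing step; the NMS controlled by `nms_threshold` can be sketched in pure Python (a production detector would use a vectorised implementation):

```python
def iou(a, b):
    """Intersection-over-union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def nms(boxes, scores, iou_threshold=0.45):
    """Greedy NMS: keep the highest-scoring box, drop boxes overlapping it."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    while order:
        best = order.pop(0)
        keep.append(best)
        order = [i for i in order if iou(boxes[best], boxes[i]) < iou_threshold]
    return keep
```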

#### Option B: RKNN NPU (For Rockchip devices)

```yaml
# Frigate-Mini Configuration - RKNN NPU Mode
# For Rockchip SBCs (RK3588, RK3568, etc.)

debug: true
log_level: "info"

detector:
  type: "rknn"                       # Use RKNN Runtime
  model_path: "models/yolov9t.rknn"  # RKNN model file
  input_size: [640, 640]             # Model input resolution
  conf_threshold: 0.25               # Detection confidence
  nms_threshold: 0.45                # NMS threshold

  # RKNN-specific settings
  rknn:
    target_platform: "rk3588"        # rk3588, rk3568, rk3566, etc.
    core_mask: 7                     # NPU core mask (7 = all 3 cores on RK3588)

  # Fallback to ONNX if RKNN fails
  fallback:
    enabled: true
    type: "onnx"
    device: "cpu"
```
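
The `fallback` section above amounts to probing which runtime is importable and picking the first available backend. A sketch using only the standard library; the runtime module names (`rknnlite`, `onnxruntime`, `ultralytics`) are assumptions based on the install notes later in this plan:

```python
import importlib.util

def pick_backend(preferred="rknn", available=None):
    """Return the first detector backend whose runtime module is importable.

    `available` is a callable(module_name) -> bool, injectable for testing;
    by default it probes the import system without importing anything.
    """
    if available is None:
        available = lambda mod: importlib.util.find_spec(mod) is not None
    runtimes = {"rknn": "rknnlite", "onnx": "onnxruntime", "yolo": "ultralytics"}
    order = [preferred] + [b for b in runtimes if b != preferred]
    for backend in order:
        if available(runtimes[backend]):
            return backend
    return "onnx"  # last resort: ONNX on CPU, matching the fallback config
```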

#### Option C: Ultralytics YOLO (For CUDA systems)

```yaml
# Frigate-Mini Configuration - Ultralytics YOLO Mode
# For systems with NVIDIA GPU

debug: true
log_level: "info"

detector:
  type: "yolo"                     # Use Ultralytics
  model_path: "models/yolov9t.pt"  # PyTorch model file
  conf_threshold: 0.25
  nms_threshold: 0.45

  # YOLO-specific settings
  yolo:
    device: "cuda"                 # cpu, cuda, cuda:0, etc.
    half: true                     # FP16 inference (faster on GPU)
```

#### Full Configuration Example (with all options)

```yaml
# Video sources (cameras)
cameras:
  front_door:
    enabled: true
    source: "input/front_door.mp4"  # MP4 file path
    fps: 5                          # Processing FPS limit
    loop: true                      # Loop video playback

    # Detection zones (optional)
    detect:
      enabled: true
      width: 1280                   # Detection resolution
      height: 720

    # Object filtering
    objects:
      track:
        - person
        - car
        - dog
      filters:
        person:
          min_area: 1000            # Minimum area in pixels
          max_area: 500000
          min_score: 0.4

  backyard:
    enabled: true
    source: "input/backyard.mp4"
    fps: 5
    loop: true

# Snapshot settings
snapshots:
  enabled: true
  output_dir: "output/snapshots"

  # Trigger settings
  trigger:
    objects:                        # Objects that trigger snapshot
      - person
      - car
    min_score: 0.5                  # Minimum score to trigger
    cooldown: 2.0                   # Seconds between snapshots per object

  # Output settings
  format: "jpg"                     # jpg or png
  quality: 95                       # JPEG quality
  clean: true                       # No annotations on snapshot
  crop: false                       # Crop to object bbox
  retain_days: 7                    # Days to keep snapshots

# Annotation export
annotations:
  enabled: true
  output_dir: "output/labels"
  format: "yolo"                    # YOLO format

  # Pairing
  pair_with_snapshots: true         # Create snapshot-label pairs

  # Filtering
  min_score: 0.3
  classes: null                     # null = all classes

# Debug settings
debug_output:
  enabled: true
  output_dir: "output/debug"

  # Object list display
  object_list:
    enabled: true
    show_confidence: true
    show_class: true
    show_bbox: true

  # Visualization
  visualization:
    enabled: true
    draw_boxes: true
    draw_labels: true
    draw_confidence: true
    box_thickness: 2
    font_scale: 0.5

  # Statistics
  stats:
    show_fps: true
    show_detection_count: true
    log_interval: 100               # Log stats every N frames

# Class definitions
class_names:
  0: person
  1: bicycle
  2: car
  3: motorcycle
  4: airplane
  5: bus
  6: train
  7: truck
  8: boat
  9: traffic light
  10: fire hydrant
  # ... COCO classes continue
```
---

## Module Specifications

### 1. yolo_annotator/annotator.py

```python
class YOLOAnnotator:
    """YOLO-based automatic video annotator."""

    def __init__(self, config_path: str):
        """Load configuration and initialize model."""

    def load_model(self, model_path: str, device: str) -> None:
        """Load YOLOv9t model."""

    def process_video(self, video_path: str) -> AnnotationResult:
        """Process entire video and generate annotations."""

    def process_frame(self, frame: np.ndarray) -> List[Detection]:
        """Process single frame and return detections."""

    def filter_detections(self, detections: List[Detection]) -> List[Detection]:
        """Apply filtering rules to detections."""

    def export_annotations(self, output_dir: str) -> None:
        """Export all annotations to YOLO format."""
```

### 2. frigate_mini/detector/rknn_detector.py

```python
class RKNNDetector:
    """RKNN-based YOLO detector for Rockchip NPU."""

    def __init__(self, model_path: str, target_platform: str):
        """Initialize RKNN runtime."""

    def load_model(self) -> bool:
        """Load RKNN model to NPU."""

    def preprocess(self, frame: np.ndarray) -> np.ndarray:
        """Preprocess frame for inference."""

    def inference(self, input_data: np.ndarray) -> np.ndarray:
        """Run inference on NPU."""

    def postprocess(self, outputs: np.ndarray) -> List[Detection]:
        """Parse YOLO outputs and apply NMS."""

    def detect(self, frame: np.ndarray) -> List[Detection]:
        """Full detection pipeline."""

    def release(self) -> None:
        """Release RKNN resources."""
```
### 3. frigate_mini/output/annotation.py

```python
class AnnotationWriter:
    """Write YOLO format annotation files."""

    def __init__(self, output_dir: str, class_names: Dict[int, str]):
        """Initialize annotation writer."""

    def write_label(self,
                    image_name: str,
                    detections: List[Detection],
                    image_size: Tuple[int, int]) -> str:
        """Write YOLO label file for image."""

    def detection_to_yolo(self,
                          detection: Detection,
                          image_width: int,
                          image_height: int) -> str:
        """Convert detection to YOLO format string."""

    def create_dataset_structure(self) -> None:
        """Create YOLO dataset directory structure."""

    def write_data_yaml(self, train_path: str, val_path: str) -> str:
        """Generate data.yaml for training."""
```
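
Of these, `write_data_yaml` is pure string assembly and needs no YAML library. A sketch (the Ultralytics-style `train`/`val`/`nc`/`names` keys are the intended output; the standalone function signature is illustrative):

```python
def write_data_yaml(train_path, val_path, class_names):
    """Render an Ultralytics-style data.yaml as a string.

    class_names: dict mapping class id -> name.
    The caller writes the result to disk next to the exported dataset.
    """
    lines = [
        f"train: {train_path}",
        f"val: {val_path}",
        f"nc: {len(class_names)}",
        "names:",
    ]
    for class_id in sorted(class_names):
        lines.append(f"  {class_id}: {class_names[class_id]}")
    return "\n".join(lines) + "\n"
```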

### 4. frigate_mini/debug/object_list.py

```python
class ObjectListDisplay:
    """Display detected objects in debug mode."""

    def __init__(self, config: Dict):
        """Initialize display settings."""

    def update(self, detections: List[Detection]) -> None:
        """Update object list with new detections."""

    def format_detection(self, detection: Detection) -> str:
        """Format single detection for display."""

    def print_list(self) -> None:
        """Print current object list to console."""

    def save_snapshot_with_labels(self,
                                  frame: np.ndarray,
                                  detections: List[Detection],
                                  output_path: str) -> None:
        """Save debug image with annotations."""
```
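
`format_detection` reduces to string formatting driven by the `object_list` config flags; one possible console line (the exact layout is a design choice, not specified above):

```python
def format_detection(class_name, confidence, bbox,
                     show_confidence=True, show_bbox=True):
    """One console line per detection, honouring the object_list flags.

    bbox is (x1, y1, x2, y2) in pixels.
    """
    parts = [class_name]
    if show_confidence:
        parts.append(f"{confidence:.2f}")
    if show_bbox:
        x1, y1, x2, y2 = bbox
        parts.append(f"[{x1:.0f},{y1:.0f} {x2:.0f},{y2:.0f}]")
    return " ".join(parts)
```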

---

## Data Structures

### Detection

```python
@dataclass
class Detection:
    class_id: int            # Class index
    class_name: str          # Class name
    confidence: float        # Detection confidence (0-1)
    bbox: BBox               # Bounding box
    track_id: Optional[int]  # Tracking ID (if tracked)
    timestamp: float         # Frame timestamp
    frame_id: int            # Frame number


@dataclass
class BBox:
    x1: float                # Top-left x (pixels)
    y1: float                # Top-left y (pixels)
    x2: float                # Bottom-right x (pixels)
    y2: float                # Bottom-right y (pixels)

    def to_yolo(self, img_w: int, img_h: int) -> Tuple[float, float, float, float]:
        """Convert to YOLO format (x_center, y_center, width, height) normalized."""

    def area(self) -> float:
        """Calculate bbox area in pixels."""
```
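
The two `BBox` helpers can be filled in directly; YOLO format normalises the box centre and size by the image dimensions. A self-contained sketch:

```python
from dataclasses import dataclass

@dataclass
class BBox:
    x1: float  # Top-left x (pixels)
    y1: float  # Top-left y (pixels)
    x2: float  # Bottom-right x (pixels)
    y2: float  # Bottom-right y (pixels)

    def to_yolo(self, img_w, img_h):
        """Normalised (x_center, y_center, width, height)."""
        w, h = self.x2 - self.x1, self.y2 - self.y1
        return ((self.x1 + w / 2) / img_w,
                (self.y1 + h / 2) / img_h,
                w / img_w,
                h / img_h)

    def area(self):
        """Bbox area in pixels (zero for degenerate boxes)."""
        return max(0.0, self.x2 - self.x1) * max(0.0, self.y2 - self.y1)
```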

### AnnotationPair

```python
@dataclass
class AnnotationPair:
    image_path: str  # Path to snapshot image
    label_path: str  # Path to YOLO label file
    detections: List[Detection]
    timestamp: datetime
    camera_name: str
    frame_id: int
```

---

## Output Format

### Directory Structure

```
output/
├── snapshots/
│   ├── front_door/
│   │   ├── 20240115_143022_001.jpg
│   │   ├── 20240115_143025_002.jpg
│   │   └── ...
│   └── backyard/
│       └── ...
├── labels/
│   ├── front_door/
│   │   ├── 20240115_143022_001.txt
│   │   ├── 20240115_143025_002.txt
│   │   └── ...
│   └── backyard/
│       └── ...
├── debug/
│   ├── front_door/
│   │   ├── 20240115_143022_001_debug.jpg
│   │   └── ...
│   └── object_log.txt
└── manifest.json
```

### YOLO Label Format

```
# {class_id} {x_center} {y_center} {width} {height}
0 0.456789 0.321456 0.123456 0.234567
2 0.789012 0.654321 0.098765 0.176543
```

### Manifest JSON

```json
{
  "created": "2024-01-15T14:30:22",
  "model": "yolov9t.rknn",
  "total_frames": 1500,
  "total_detections": 3420,
  "pairs": [
    {
      "image": "snapshots/front_door/20240115_143022_001.jpg",
      "label": "labels/front_door/20240115_143022_001.txt",
      "camera": "front_door",
      "frame_id": 150,
      "timestamp": "2024-01-15T14:30:22.500",
      "detections": [
        {"class": "person", "confidence": 0.87},
        {"class": "car", "confidence": 0.92}
      ]
    }
  ]
}
```
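
The manifest is plain JSON, so it can be assembled with the standard library alone; a sketch whose field names follow the example above (`build_manifest` itself is a hypothetical helper):

```python
import json
from datetime import datetime

def build_manifest(model_name, pairs, total_frames):
    """Serialise the manifest for a run.

    pairs: list of dicts with image / label / camera / frame_id /
    timestamp / detections keys, as in the example layout.
    """
    manifest = {
        "created": datetime.now().isoformat(timespec="seconds"),
        "model": model_name,
        "total_frames": total_frames,
        # Derive the total rather than trusting the caller to keep it in sync.
        "total_detections": sum(len(p["detections"]) for p in pairs),
        "pairs": pairs,
    }
    return json.dumps(manifest, indent=2)
```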

---

## Implementation Phases

### Phase 1: Core YOLO Annotator (Week 1)

- [ ] Create `yolo_annotator/` module structure
- [ ] Implement `YOLOAnnotator` class with Ultralytics backend
- [ ] Implement video source handling
- [ ] Implement YOLO label export
- [ ] Create `annotator.yaml` config loader
- [ ] Add CLI script `scripts/annotate.py`
- [ ] Test with sample video

### Phase 2: Frigate-Mini Base (Week 2)

- [ ] Create `frigate_mini/` module structure
- [ ] Implement config schema and loader
- [ ] Implement base detector interface
- [ ] Implement ONNX detector (for testing)
- [ ] Implement MP4 video source
- [ ] Implement basic frame processing loop
- [ ] Test basic detection pipeline

### Phase 3: RKNN Integration (Week 3)

- [ ] Implement RKNN detector backend
- [ ] Create ONNX-to-RKNN conversion script
- [ ] Test on Rockchip hardware (RK3588/RK3568)
- [ ] Optimize for NPU performance
- [ ] Add fallback mechanism

### Phase 4: Snapshot & Annotation System (Week 4)

- [ ] Implement snapshot capture system
- [ ] Implement annotation writer
- [ ] Implement snapshot-label pairing
- [ ] Add trigger-based capture logic
- [ ] Create manifest generator

### Phase 5: Debug System (Week 5)

- [ ] Implement object list display
- [ ] Implement debug visualization
- [ ] Add statistics tracking
- [ ] Create debug frame saver
- [ ] Add console and file logging

### Phase 6: Integration & Testing (Week 6)

- [ ] Integration testing
- [ ] Performance optimization
- [ ] Documentation
- [ ] Example configs for common use cases
- [ ] Package for distribution

---

## Dependencies

### New Requirements

```
# requirements.txt additions

# YOLO
ultralytics>=8.0.0

# RKNN (install separately based on platform)
# rknn-toolkit2        # For conversion (x86)
# rknn-toolkit-lite2   # For inference (ARM)

# Video processing
opencv-python>=4.8.0
av>=10.0.0             # PyAV for efficient video decoding

# Configuration
pyyaml>=6.0
pydantic>=2.0          # Config validation

# Utilities
tqdm>=4.65.0
numpy>=1.24.0
```

### RKNN Installation Notes

```bash
# On an x86 host (for model conversion):
pip install rknn-toolkit2

# On the Rockchip device (for inference):
pip install rknn-toolkit-lite2

# Or install the wheels from Rockchip's GitHub releases
```
---

## Usage Examples

### 1. CPU-Only Workflow (ONNX) - Recommended for Development

```bash
# Step 1: Download pretrained YOLOv9t
wget https://github.com/ultralytics/assets/releases/download/v8.1.0/yolov9t.pt -O models/yolov9t.pt

# Step 2: Convert to ONNX
python scripts/convert_to_onnx.py --input models/yolov9t.pt --output models/yolov9t.onnx

# Step 3a: Auto-annotate video (CPU)
python scripts/annotate.py --config configs/annotator_cpu.yaml
# Or with CLI args:
python scripts/annotate.py \
    --model models/yolov9t.onnx \
    --video input/video.mp4 \
    --device cpu

# Step 3b: Run Frigate-Mini (CPU)
python scripts/frigate_mini.py --config configs/frigate_mini_cpu.yaml
# Or with CLI args:
python scripts/frigate_mini.py \
    --model models/yolov9t.onnx \
    --video input/video.mp4 \
    --output output/ \
    --debug
```
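
The CLI invocations above suggest an argparse front-end for `scripts/annotate.py`; a sketch where the flag names come from the examples and the defaults are hypothetical:

```python
import argparse

def build_parser():
    """CLI for scripts/annotate.py: either --config or explicit flags."""
    p = argparse.ArgumentParser(description="YOLO-assisted video annotator")
    p.add_argument("--config",
                   help="Path to annotator.yaml (overrides other flags)")
    p.add_argument("--model", default="models/yolov9t.pt",
                   help="Model weights (.pt or .onnx)")
    p.add_argument("--video", help="Input MP4 path")
    p.add_argument("--device", default="cpu",
                   choices=["cpu", "cuda", "rknn"],
                   help="Inference device")
    return p
```

The same skeleton, with `--output` and `--debug` added, would serve `scripts/frigate_mini.py`.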

### 2. RKNN Workflow (Rockchip NPU)

```bash
# Step 1: Convert ONNX to RKNN (on x86 host)
python scripts/convert_to_rknn.py \
    --input models/yolov9t.onnx \
    --output models/yolov9t.rknn \
    --platform rk3588

# Step 2: Copy to Rockchip device and run
python scripts/frigate_mini.py --config configs/frigate_mini.yaml
# Or:
python scripts/frigate_mini.py \
    --model models/yolov9t.rknn \
    --video input/video.mp4 \
    --platform rk3588
```

### 3. GPU Workflow (CUDA)

```bash
# Using Ultralytics directly with GPU
python scripts/annotate.py \
    --model models/yolov9t.pt \
    --video input/video.mp4 \
    --device cuda
```

### Quick Reference

| Task | CPU (ONNX) | RKNN (NPU) | GPU (CUDA) |
|------|------------|------------|------------|
| Model file | `.onnx` | `.rknn` | `.pt` |
| Config | `*_cpu.yaml` | `frigate_mini.yaml` | Use `--device cuda` |
| Speed | 5-15 FPS | 30+ FPS | 50+ FPS |
| Hardware | Any CPU | Rockchip SBC | NVIDIA GPU |

---

## Future Enhancements

1. **RTSP Support** - Add real camera stream input
2. **Object Tracking** - Add ByteTrack/BoT-SORT for consistent IDs
3. **Web UI** - Simple web interface for monitoring
4. **Multi-model** - Support different models per camera
5. **Event System** - Webhooks for detection events
6. **Auto-labeling Refinement** - Use SAM2 to refine YOLO boxes
7. **Active Learning** - Flag low-confidence detections for review

---

## References

- [Ultralytics YOLOv9](https://github.com/ultralytics/ultralytics)
- [RKNN-Toolkit2](https://github.com/rockchip-linux/rknn-toolkit2)
- [Frigate NVR](https://github.com/blakeblackshear/frigate)
- [YOLO Label Format](https://docs.ultralytics.com/datasets/detect/)