dataset-yolo-script/sam2-cpu/PLAN.md
2026-02-04 15:29:36 +07:00

# Feature Plan: YOLO-Assisted Auto-Annotation + Mini Frigate RKNN
## Overview
Create two integrated components:
1. **YOLO-Assisted Annotator** - Use pretrained YOLOv9t to auto-annotate video frames
2. **Frigate-Mini-RKNN** - Standalone mini fork of Frigate for RKNN inference with MP4 input
## Goals
- Auto-annotate videos using YOLOv9t pretrained model (replaces manual SAM2 prompts)
- Minimal Frigate fork with multiple detector backends:
  - **RKNN** - Rockchip NPU acceleration (RK3588, RK3568, etc.)
  - **ONNX** - CPU-only inference (cross-platform, no special hardware)
  - **YOLO** - Ultralytics backend (CPU/CUDA)
- MP4 file as camera feed source
- Output: Clean snapshot + YOLO format label pairs
- Simple text-based configuration
- Debug mode with object list visualization
## Detector Backends Comparison
| Backend | Hardware | Performance | Platform | Use Case |
|---------|----------|-------------|----------|----------|
| **RKNN** | Rockchip NPU | Fast (30+ FPS) | ARM (RK3588/3568) | Production on Rockchip SBC |
| **ONNX** | CPU | Medium (5-15 FPS) | Any (x86/ARM) | Development, testing, no GPU |
| **YOLO** | CPU/CUDA | Fast with GPU | Any | Development, CUDA systems |
### Recommended Workflow
1. **Development/Testing**: Use ONNX backend on any CPU
2. **Production on Rockchip**: Convert to RKNN, deploy on NPU
3. **Production on x86/CUDA**: Use YOLO backend with GPU
---
## Project Structure
```
sam2-yolo-pipeline/
├── notebooks/                 # Existing Kaggle notebooks
├── utils/                     # Existing utilities
├── yolo_annotator/            # NEW: YOLO-assisted annotation
│   ├── __init__.py
│   ├── annotator.py           # Core YOLOv9t annotator
│   ├── video_source.py        # MP4/RTSP video source handler
│   ├── export.py              # Snapshot + label export
│   └── visualizer.py          # Debug visualization
├── frigate_mini/              # NEW: Mini Frigate fork
│   ├── __init__.py
│   ├── app.py                 # Main application entry
│   ├── config/
│   │   ├── __init__.py
│   │   ├── schema.py          # Config validation
│   │   └── loader.py          # YAML config loader
│   ├── detector/
│   │   ├── __init__.py
│   │   ├── base.py            # Base detector interface
│   │   ├── rknn_detector.py   # RKNN backend
│   │   ├── onnx_detector.py   # ONNX fallback
│   │   └── yolo_detector.py   # Ultralytics YOLO fallback
│   ├── video/
│   │   ├── __init__.py
│   │   ├── mp4_source.py      # MP4 file source
│   │   └── frame_processor.py # Frame processing pipeline
│   ├── output/
│   │   ├── __init__.py
│   │   ├── snapshot.py        # Snapshot capture
│   │   └── annotation.py      # YOLO label writer
│   └── debug/
│       ├── __init__.py
│       ├── object_list.py     # Detected objects display
│       └── visualizer.py      # Bounding box overlay
├── configs/                   # NEW: Configuration files
│   ├── annotator.yaml         # Annotator settings
│   └── frigate_mini.yaml      # Frigate-mini settings
├── models/                    # NEW: Model weights storage
│   └── .gitkeep
├── output/                    # NEW: Default output directory
│   ├── snapshots/
│   ├── labels/
│   └── debug/
├── scripts/                   # NEW: CLI scripts
│   ├── annotate.py            # Run annotation pipeline
│   ├── frigate_mini.py        # Run mini frigate
│   └── convert_to_rknn.py     # Convert ONNX to RKNN
└── requirements.txt           # Updated dependencies
```
---
## Component 1: YOLO-Assisted Annotator
### Purpose
Replace SAM2 auto-annotation with faster YOLOv9t-based detection for creating training datasets.
### Workflow
```
MP4 Video → Frame Extraction → YOLOv9t Detection → Filter/NMS → YOLO Labels + Snapshots
```
### Features
1. **Model Loading**
   - Load pretrained YOLOv9t (.pt file)
   - Support custom trained models
   - Configurable confidence threshold
   - Configurable NMS threshold
2. **Video Processing**
   - MP4 file input
   - Configurable FPS sampling
   - Frame skip / time range selection
   - Resolution scaling
3. **Detection Filtering**
   - Filter by class IDs
   - Filter by confidence score
   - Filter by bbox size (min/max area)
   - Filter by aspect ratio
4. **Output Generation**
   - Clean snapshot images (no annotations drawn)
   - YOLO format label files (.txt)
   - Optional debug images with boxes drawn
   - JSON manifest of all detections
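The FPS sampling and time-range options above reduce to picking frame indices before decoding. A minimal sketch (the function name and signature are illustrative, not part of the planned API):

```python
from typing import List, Optional

def sample_frame_indices(video_fps: float, total_frames: int,
                         sample_fps: float, start_time: float = 0.0,
                         end_time: Optional[float] = None) -> List[int]:
    """Frame indices to extract for a target sampling rate and time range."""
    end_frame = total_frames if end_time is None else min(total_frames, int(end_time * video_fps))
    start_frame = int(start_time * video_fps)
    # Stride between extracted frames; at least 1 so we never stall.
    step = max(1, round(video_fps / sample_fps))
    return list(range(start_frame, end_frame, step))
```

For a 30 FPS, 300-frame clip sampled at 2 FPS this yields every 15th frame (20 frames total); `max_frames` would then just truncate the list.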
### Configuration (annotator.yaml)
```yaml
# YOLO-Assisted Annotator Configuration
model:
  path: "models/yolov9t.pt"        # Path to YOLO model
  device: "cuda"                   # cuda, cpu, or rknn
  conf_threshold: 0.25             # Confidence threshold
  iou_threshold: 0.45              # NMS IoU threshold

video:
  source: "input/video.mp4"        # Video file path
  sample_fps: 2                    # Frames per second to extract
  max_frames: null                 # Max frames (null = all)
  start_time: 0                    # Start time in seconds
  end_time: null                   # End time (null = end of video)
  resize: null                     # [width, height] or null

detection:
  classes: null                    # Class IDs to keep (null = all)
  min_confidence: 0.3              # Minimum confidence to save
  min_area: 100                    # Minimum bbox area in pixels
  max_area: null                   # Maximum bbox area (null = no limit)
  min_size: 0.01                   # Minimum bbox dimension (normalized)

output:
  directory: "output/annotations"  # Output directory
  save_snapshots: true             # Save clean images
  save_labels: true                # Save YOLO labels
  save_debug: true                 # Save debug visualizations
  save_manifest: true              # Save JSON manifest
  image_format: "jpg"              # jpg or png
  image_quality: 95                # JPEG quality (1-100)

classes:
  # Class name mapping (for display/filtering)
  0: "person"
  1: "bicycle"
  2: "car"
  # ... etc
```
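The `detection:` block above is a straightforward filter pass over raw detections. A sketch using a simplified flat record (`Det` here is an illustrative stand-in for the plan's `Detection` dataclass; `min_size` is omitted for brevity):

```python
from dataclasses import dataclass
from typing import List, Optional, Set

@dataclass
class Det:
    """Simplified detection: class id, score, and pixel-space corners."""
    class_id: int
    confidence: float
    x1: float
    y1: float
    x2: float
    y2: float

def filter_detections(dets: List[Det],
                      classes: Optional[Set[int]] = None,
                      min_confidence: float = 0.3,
                      min_area: float = 100,
                      max_area: Optional[float] = None) -> List[Det]:
    """Apply the class / confidence / area rules from the config."""
    kept = []
    for d in dets:
        area = (d.x2 - d.x1) * (d.y2 - d.y1)
        if classes is not None and d.class_id not in classes:
            continue
        if d.confidence < min_confidence:
            continue
        if area < min_area or (max_area is not None and area > max_area):
            continue
        kept.append(d)
    return kept
```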
---
## Component 2: Frigate-Mini-RKNN
### Purpose
Minimal standalone Frigate-like system for RKNN inference on Rockchip devices, outputting annotation pairs.
### Workflow
```
MP4 Feed → Frame Decode → RKNN Inference → Object Tracking → Snapshot + Label Export
```
### Features
1. **Video Input**
   - MP4 file as "camera" source
   - Loop playback option
   - Configurable FPS limit
   - Multiple video sources support
2. **RKNN Detector**
   - Load RKNN model (.rknn file)
   - NPU acceleration on Rockchip SoCs
   - Fallback to ONNX/CPU if RKNN unavailable
   - Batch inference support
3. **Object Detection**
   - YOLOv9t architecture support
   - Configurable input resolution
   - Post-processing (NMS, filtering)
   - Class filtering
4. **Snapshot System**
   - Capture on detection trigger
   - Configurable cooldown period
   - Clean snapshots (no overlays)
   - Crop to detected object (optional)
5. **Annotation Export**
   - YOLO format labels
   - Synchronized snapshot-label pairs
   - Auto-naming with timestamps
   - Dataset structure output
6. **Debug Mode**
   - Real-time object list display
   - Bounding box visualization
   - FPS counter
   - Detection statistics
   - Save debug frames
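The trigger-plus-cooldown behaviour described under "Snapshot System" can be sketched as a small stateful class (names are illustrative; the injectable clock is just to make the cooldown logic testable):

```python
import time
from typing import Dict, Iterable

class SnapshotTrigger:
    """Decide whether a detection should capture a snapshot,
    enforcing a per-class cooldown as in the snapshots config."""

    def __init__(self, objects: Iterable[str], min_score: float = 0.5,
                 cooldown: float = 2.0, clock=time.monotonic):
        self.objects = set(objects)      # classes that may trigger
        self.min_score = min_score
        self.cooldown = cooldown         # seconds between captures per class
        self.clock = clock
        self._last: Dict[str, float] = {}  # class -> last capture time

    def should_capture(self, class_name: str, score: float) -> bool:
        if class_name not in self.objects or score < self.min_score:
            return False
        now = self.clock()
        if now - self._last.get(class_name, float("-inf")) < self.cooldown:
            return False                 # still cooling down
        self._last[class_name] = now
        return True
```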
### Configuration (frigate_mini.yaml)
#### Option A: ONNX CPU-Only (Recommended for development/testing)
```yaml
# Frigate-Mini Configuration - ONNX CPU Mode
# Works on any system without special hardware
debug: true
log_level: "info"

detector:
  type: "onnx"                      # Use ONNX Runtime
  model_path: "models/yolov9t.onnx" # ONNX model file
  input_size: [640, 640]            # Model input resolution
  conf_threshold: 0.25              # Detection confidence
  nms_threshold: 0.45               # NMS threshold

  # ONNX specific settings
  onnx:
    device: "cpu"                   # cpu or cuda
    num_threads: 4                  # CPU threads (0 = auto)
    optimization_level: "all"       # none, basic, extended, all
```
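Whichever backend is used, frames must be letterboxed to the `input_size` above (aspect-preserving resize plus padding). A sketch of the geometry only, with illustrative names; the actual pixel resize would use OpenCV, and boxes map back to frame space via `(x - pad_x) / scale`:

```python
from typing import Tuple

def letterbox_params(src_w: int, src_h: int,
                     dst_w: int = 640, dst_h: int = 640) -> Tuple[float, int, int]:
    """Scale factor and padding that fit a frame into the model input
    while preserving aspect ratio (the usual YOLO letterbox)."""
    scale = min(dst_w / src_w, dst_h / src_h)
    new_w, new_h = round(src_w * scale), round(src_h * scale)
    pad_x = (dst_w - new_w) // 2   # left/right padding
    pad_y = (dst_h - new_h) // 2   # top/bottom padding
    return scale, pad_x, pad_y
```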
#### Option B: RKNN NPU (For Rockchip devices)
```yaml
# Frigate-Mini Configuration - RKNN NPU Mode
# For Rockchip SBCs (RK3588, RK3568, etc.)
debug: true
log_level: "info"

detector:
  type: "rknn"                      # Use RKNN Runtime
  model_path: "models/yolov9t.rknn" # RKNN model file
  input_size: [640, 640]            # Model input resolution
  conf_threshold: 0.25              # Detection confidence
  nms_threshold: 0.45               # NMS threshold

  # RKNN specific
  rknn:
    target_platform: "rk3588"       # rk3588, rk3568, rk3566, etc.
    core_mask: 7                    # NPU core mask (7 = all 3 cores on RK3588)

  # Fallback to ONNX if RKNN fails
  fallback:
    enabled: true
    type: "onnx"
    device: "cpu"
```
#### Option C: Ultralytics YOLO (For CUDA systems)
```yaml
# Frigate-Mini Configuration - Ultralytics YOLO Mode
# For systems with NVIDIA GPU
debug: true
log_level: "info"

detector:
  type: "yolo"                      # Use Ultralytics
  model_path: "models/yolov9t.pt"   # PyTorch model file
  conf_threshold: 0.25
  nms_threshold: 0.45

  # YOLO specific
  yolo:
    device: "cuda"                  # cpu, cuda, cuda:0, etc.
    half: true                      # FP16 inference (faster on GPU)
```
#### Full Configuration Example (with all options)
```yaml
# Video sources (cameras)
cameras:
  front_door:
    enabled: true
    source: "input/front_door.mp4"   # MP4 file path
    fps: 5                           # Processing FPS limit
    loop: true                       # Loop video playback

    # Detection zones (optional)
    detect:
      enabled: true
      width: 1280                    # Detection resolution
      height: 720

    # Object filtering
    objects:
      track:
        - person
        - car
        - dog
      filters:
        person:
          min_area: 1000             # Minimum area in pixels
          max_area: 500000
          min_score: 0.4

  backyard:
    enabled: true
    source: "input/backyard.mp4"
    fps: 5
    loop: true

# Snapshot settings
snapshots:
  enabled: true
  output_dir: "output/snapshots"

  # Trigger settings
  trigger:
    objects:                         # Objects that trigger snapshot
      - person
      - car
    min_score: 0.5                   # Minimum score to trigger
    cooldown: 2.0                    # Seconds between snapshots per object

  # Output settings
  format: "jpg"                      # jpg or png
  quality: 95                        # JPEG quality
  clean: true                        # No annotations on snapshot
  crop: false                        # Crop to object bbox
  retain_days: 7                     # Days to keep snapshots

# Annotation export
annotations:
  enabled: true
  output_dir: "output/labels"
  format: "yolo"                     # YOLO format

  # Pairing
  pair_with_snapshots: true          # Create snapshot-label pairs

  # Filtering
  min_score: 0.3
  classes: null                      # null = all classes

# Debug settings
debug_output:
  enabled: true
  output_dir: "output/debug"

  # Object list display
  object_list:
    enabled: true
    show_confidence: true
    show_class: true
    show_bbox: true

  # Visualization
  visualization:
    enabled: true
    draw_boxes: true
    draw_labels: true
    draw_confidence: true
    box_thickness: 2
    font_scale: 0.5

  # Statistics
  stats:
    show_fps: true
    show_detection_count: true
    log_interval: 100                # Log stats every N frames

# Class definitions
class_names:
  0: person
  1: bicycle
  2: car
  3: motorcycle
  4: airplane
  5: bus
  6: train
  7: truck
  8: boat
  9: traffic light
  10: fire hydrant
  # ... COCO classes continue
```
---
## Module Specifications
### 1. yolo_annotator/annotator.py
```python
class YOLOAnnotator:
    """YOLO-based automatic video annotator."""

    def __init__(self, config_path: str):
        """Load configuration and initialize model."""

    def load_model(self, model_path: str, device: str) -> None:
        """Load YOLOv9t model."""

    def process_video(self, video_path: str) -> AnnotationResult:
        """Process entire video and generate annotations."""

    def process_frame(self, frame: np.ndarray) -> List[Detection]:
        """Process single frame and return detections."""

    def filter_detections(self, detections: List[Detection]) -> List[Detection]:
        """Apply filtering rules to detections."""

    def export_annotations(self, output_dir: str) -> None:
        """Export all annotations to YOLO format."""
```
### 2. frigate_mini/detector/rknn_detector.py
```python
class RKNNDetector:
    """RKNN-based YOLO detector for Rockchip NPU."""

    def __init__(self, model_path: str, target_platform: str):
        """Initialize RKNN runtime."""

    def load_model(self) -> bool:
        """Load RKNN model to NPU."""

    def preprocess(self, frame: np.ndarray) -> np.ndarray:
        """Preprocess frame for inference."""

    def inference(self, input_data: np.ndarray) -> np.ndarray:
        """Run inference on NPU."""

    def postprocess(self, outputs: np.ndarray) -> List[Detection]:
        """Parse YOLO outputs and apply NMS."""

    def detect(self, frame: np.ndarray) -> List[Detection]:
        """Full detection pipeline."""

    def release(self) -> None:
        """Release RKNN resources."""
```
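The `postprocess` step applies NMS to the decoded YOLO outputs. A minimal pure-Python greedy NMS sketch, with boxes as `(x1, y1, x2, y2)` tuples (a real implementation would likely be vectorised with NumPy):

```python
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]

def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two boxes in (x1, y1, x2, y2) form."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    if inter == 0.0:
        return 0.0
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

def nms(boxes: Sequence[Box], scores: Sequence[float],
        iou_threshold: float = 0.45) -> List[int]:
    """Greedy NMS: keep the highest-scoring box, drop overlaps, repeat.
    Returns indices of kept boxes, highest score first."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    while order:
        i = order.pop(0)
        keep.append(i)
        order = [j for j in order if iou(boxes[i], boxes[j]) < iou_threshold]
    return keep
```

For class-aware NMS (as most YOLO post-processing does), run this per class id or offset boxes by class before suppression.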
### 3. frigate_mini/output/annotation.py
```python
class AnnotationWriter:
    """Write YOLO format annotation files."""

    def __init__(self, output_dir: str, class_names: Dict[int, str]):
        """Initialize annotation writer."""

    def write_label(self,
                    image_name: str,
                    detections: List[Detection],
                    image_size: Tuple[int, int]) -> str:
        """Write YOLO label file for image."""

    def detection_to_yolo(self,
                          detection: Detection,
                          image_width: int,
                          image_height: int) -> str:
        """Convert detection to YOLO format string."""

    def create_dataset_structure(self) -> None:
        """Create YOLO dataset directory structure."""

    def write_data_yaml(self, train_path: str, val_path: str) -> str:
        """Generate data.yaml for training."""
```
### 4. frigate_mini/debug/object_list.py
```python
class ObjectListDisplay:
    """Display detected objects in debug mode."""

    def __init__(self, config: Dict):
        """Initialize display settings."""

    def update(self, detections: List[Detection]) -> None:
        """Update object list with new detections."""

    def format_detection(self, detection: Detection) -> str:
        """Format single detection for display."""

    def print_list(self) -> None:
        """Print current object list to console."""

    def save_snapshot_with_labels(self,
                                  frame: np.ndarray,
                                  detections: List[Detection],
                                  output_path: str) -> None:
        """Save debug image with annotations."""
```
---
## Data Structures
### Detection
```python
from dataclasses import dataclass
from typing import Optional, Tuple

@dataclass
class BBox:
    x1: float  # Top-left x (pixels)
    y1: float  # Top-left y (pixels)
    x2: float  # Bottom-right x (pixels)
    y2: float  # Bottom-right y (pixels)

    def to_yolo(self, img_w: int, img_h: int) -> Tuple[float, float, float, float]:
        """Convert to YOLO format (x_center, y_center, width, height) normalized."""
        return ((self.x1 + self.x2) / 2 / img_w,
                (self.y1 + self.y2) / 2 / img_h,
                (self.x2 - self.x1) / img_w,
                (self.y2 - self.y1) / img_h)

    def area(self) -> float:
        """Calculate bbox area in pixels."""
        return (self.x2 - self.x1) * (self.y2 - self.y1)

@dataclass
class Detection:
    class_id: int            # Class index
    class_name: str          # Class name
    confidence: float        # Detection confidence (0-1)
    bbox: BBox               # Bounding box
    track_id: Optional[int]  # Tracking ID (if tracked)
    timestamp: float         # Frame timestamp
    frame_id: int            # Frame number
```
### AnnotationPair
```python
from dataclasses import dataclass
from datetime import datetime
from typing import List

@dataclass
class AnnotationPair:
    image_path: str          # Path to snapshot image
    label_path: str          # Path to YOLO label file
    detections: List[Detection]
    timestamp: datetime
    camera_name: str
    frame_id: int
```
---
## Output Format
### Directory Structure
```
output/
├── snapshots/
│   ├── front_door/
│   │   ├── 20240115_143022_001.jpg
│   │   ├── 20240115_143025_002.jpg
│   │   └── ...
│   └── backyard/
│       └── ...
├── labels/
│   ├── front_door/
│   │   ├── 20240115_143022_001.txt
│   │   ├── 20240115_143025_002.txt
│   │   └── ...
│   └── backyard/
│       └── ...
├── debug/
│   ├── front_door/
│   │   ├── 20240115_143022_001_debug.jpg
│   │   └── ...
│   └── object_log.txt
└── manifest.json
```
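Snapshot and label names share a `YYYYMMDD_HHMMSS_seq` stem so the pair stays synchronized. A small helper sketch (hypothetical name; mirrors the layout above):

```python
from datetime import datetime
from typing import Tuple

def pair_names(camera: str, ts: datetime, seq: int,
               image_format: str = "jpg") -> Tuple[str, str]:
    """Matching snapshot/label paths like 20240115_143022_001.jpg/.txt."""
    stem = f"{ts.strftime('%Y%m%d_%H%M%S')}_{seq:03d}"
    return (f"snapshots/{camera}/{stem}.{image_format}",
            f"labels/{camera}/{stem}.txt")
```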
### YOLO Label Format
```
# {class_id} {x_center} {y_center} {width} {height}
0 0.456789 0.321456 0.123456 0.234567
2 0.789012 0.654321 0.098765 0.176543
```
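Producing one such line from a pixel-space box is the same math as `BBox.to_yolo`; a sketch (helper name is illustrative):

```python
def to_yolo_line(class_id: int, x1: float, y1: float, x2: float, y2: float,
                 img_w: int, img_h: int) -> str:
    """One YOLO label line: class id, then normalized center x/y, width, height."""
    cx = (x1 + x2) / 2 / img_w   # box center x, normalized to [0, 1]
    cy = (y1 + y2) / 2 / img_h   # box center y, normalized
    w = (x2 - x1) / img_w        # box width, normalized
    h = (y2 - y1) / img_h        # box height, normalized
    return f"{class_id} {cx:.6f} {cy:.6f} {w:.6f} {h:.6f}"
```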
### Manifest JSON
```json
{
"created": "2024-01-15T14:30:22",
"model": "yolov9t.rknn",
"total_frames": 1500,
"total_detections": 3420,
"pairs": [
{
"image": "snapshots/front_door/20240115_143022_001.jpg",
"label": "labels/front_door/20240115_143022_001.txt",
"camera": "front_door",
"frame_id": 150,
"timestamp": "2024-01-15T14:30:22.500",
"detections": [
{"class": "person", "confidence": 0.87},
{"class": "car", "confidence": 0.92}
]
}
]
}
```
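Assembling the manifest is plain `json`; a sketch in which the field names follow the example above and `pairs` entries are assumed to be pre-serialised dicts:

```python
import json
from datetime import datetime
from typing import Dict, List

def build_manifest(model_name: str, pairs: List[Dict],
                   total_frames: int) -> Dict:
    """Assemble the manifest dict in the shape shown above."""
    return {
        "created": datetime.now().isoformat(timespec="seconds"),
        "model": model_name,
        "total_frames": total_frames,
        # Count detections across all pairs rather than trusting a counter.
        "total_detections": sum(len(p["detections"]) for p in pairs),
        "pairs": pairs,
    }

# Write with: json.dumps(build_manifest(...), indent=2) -> output/manifest.json
```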
---
## Implementation Phases
### Phase 1: Core YOLO Annotator (Week 1)
- [ ] Create `yolo_annotator/` module structure
- [ ] Implement `YOLOAnnotator` class with Ultralytics backend
- [ ] Implement video source handling
- [ ] Implement YOLO label export
- [ ] Create `annotator.yaml` config loader
- [ ] Add CLI script `scripts/annotate.py`
- [ ] Test with sample video
### Phase 2: Frigate-Mini Base (Week 2)
- [ ] Create `frigate_mini/` module structure
- [ ] Implement config schema and loader
- [ ] Implement base detector interface
- [ ] Implement ONNX detector (for testing)
- [ ] Implement MP4 video source
- [ ] Implement basic frame processing loop
- [ ] Test basic detection pipeline
### Phase 3: RKNN Integration (Week 3)
- [ ] Implement RKNN detector backend
- [ ] Create ONNX to RKNN conversion script
- [ ] Test on Rockchip hardware (RK3588/RK3568)
- [ ] Optimize for NPU performance
- [ ] Add fallback mechanism
### Phase 4: Snapshot & Annotation System (Week 4)
- [ ] Implement snapshot capture system
- [ ] Implement annotation writer
- [ ] Implement snapshot-label pairing
- [ ] Add trigger-based capture logic
- [ ] Create manifest generator
### Phase 5: Debug System (Week 5)
- [ ] Implement object list display
- [ ] Implement debug visualization
- [ ] Add statistics tracking
- [ ] Create debug frame saver
- [ ] Add console and file logging
### Phase 6: Integration & Testing (Week 6)
- [ ] Integration testing
- [ ] Performance optimization
- [ ] Documentation
- [ ] Example configs for common use cases
- [ ] Package for distribution
---
## Dependencies
### New Requirements
```
# requirements.txt additions
# YOLO
ultralytics>=8.0.0
# RKNN (install separately based on platform)
# rknn-toolkit2         # For conversion (x86 host)
# rknn-toolkit-lite2    # For on-device inference (ARM)
# Video processing
opencv-python>=4.8.0
av>=10.0.0 # PyAV for efficient video decoding
# Configuration
pyyaml>=6.0
pydantic>=2.0 # Config validation
# Utilities
tqdm>=4.65.0
numpy>=1.24.0
```
### RKNN Installation Notes
```bash
# On x86 host (for model conversion):
pip install rknn-toolkit2
# On Rockchip device (for inference):
pip install rknn-toolkit-lite2
# Or install from Rockchip GitHub releases
```
---
## Usage Examples
### 1. CPU-Only Workflow (ONNX) - Recommended for Development
```bash
# Step 1: Download pretrained YOLOv9t
wget https://github.com/ultralytics/assets/releases/download/v8.1.0/yolov9t.pt -O models/yolov9t.pt
# Step 2: Convert to ONNX
python scripts/convert_to_onnx.py --input models/yolov9t.pt --output models/yolov9t.onnx
# Step 3a: Auto-annotate video (CPU)
python scripts/annotate.py --config configs/annotator_cpu.yaml
# Or with CLI args:
python scripts/annotate.py \
--model models/yolov9t.onnx \
--video input/video.mp4 \
--device cpu
# Step 3b: Run Frigate-Mini (CPU)
python scripts/frigate_mini.py --config configs/frigate_mini_cpu.yaml
# Or with CLI args:
python scripts/frigate_mini.py \
--model models/yolov9t.onnx \
--video input/video.mp4 \
--output output/ \
--debug
```
### 2. RKNN Workflow (Rockchip NPU)
```bash
# Step 1: Convert ONNX to RKNN (on x86 host)
python scripts/convert_to_rknn.py \
--input models/yolov9t.onnx \
--output models/yolov9t.rknn \
--platform rk3588
# Step 2: Copy to Rockchip device and run
python scripts/frigate_mini.py --config configs/frigate_mini.yaml
# Or:
python scripts/frigate_mini.py \
--model models/yolov9t.rknn \
--video input/video.mp4 \
--platform rk3588
```
### 3. GPU Workflow (CUDA)
```bash
# Using Ultralytics directly with GPU
python scripts/annotate.py \
--model models/yolov9t.pt \
--video input/video.mp4 \
--device cuda
```
### Quick Reference
| Task | CPU (ONNX) | RKNN (NPU) | GPU (CUDA) |
|------|------------|------------|------------|
| Model file | `.onnx` | `.rknn` | `.pt` |
| Config | `*_cpu.yaml` | `frigate_mini.yaml` | Use `--device cuda` |
| Speed | 5-15 FPS | 30+ FPS | 50+ FPS |
| Hardware | Any CPU | Rockchip SBC | NVIDIA GPU |
---
## Future Enhancements
1. **RTSP Support** - Add real camera stream input
2. **Object Tracking** - Add ByteTrack/BoT-SORT for consistent IDs
3. **Web UI** - Simple web interface for monitoring
4. **Multi-model** - Support different models per camera
5. **Event System** - Webhooks for detection events
6. **Auto-labeling Refinement** - Use SAM2 to refine YOLO boxes
7. **Active Learning** - Flag low-confidence detections for review
---
## References
- [Ultralytics YOLOv9](https://github.com/ultralytics/ultralytics)
- [RKNN-Toolkit2](https://github.com/rockchip-linux/rknn-toolkit2)
- [Frigate NVR](https://github.com/blakeblackshear/frigate)
- [YOLO Label Format](https://docs.ultralytics.com/datasets/detect/)