dataset-yolo-script/sam2-cpu/PLAN.md
2026-02-04 15:29:36 +07:00

# Feature Plan: YOLO-Assisted Auto-Annotation + Mini Frigate RKNN
## Overview
Create two integrated components:
1. **YOLO-Assisted Annotator** - Use pretrained YOLOv9t to auto-annotate video frames
2. **Frigate-Mini-RKNN** - Standalone mini fork of Frigate for RKNN inference with MP4 input
## Goals
- Auto-annotate videos using YOLOv9t pretrained model (replaces manual SAM2 prompts)
- Minimal Frigate fork with multiple detector backends:
  - **RKNN** - Rockchip NPU acceleration (RK3588, RK3568, etc.)
  - **ONNX** - CPU-only inference (cross-platform, no special hardware)
  - **YOLO** - Ultralytics backend (CPU/CUDA)
- MP4 file as camera feed source
- Output: Clean snapshot + YOLO format label pairs
- Simple text-based configuration
- Debug mode with object list visualization
## Detector Backends Comparison
| Backend | Hardware | Performance | Platform | Use Case |
|---------|----------|-------------|----------|----------|
| **RKNN** | Rockchip NPU | Fast (30+ FPS) | ARM (RK3588/3568) | Production on Rockchip SBC |
| **ONNX** | CPU | Medium (5-15 FPS) | Any (x86/ARM) | Development, testing, no GPU |
| **YOLO** | CPU/CUDA | Fast with GPU | Any | Development, CUDA systems |
### Recommended Workflow
1. **Development/Testing**: Use ONNX backend on any CPU
2. **Production on Rockchip**: Convert to RKNN, deploy on NPU
3. **Production on x86/CUDA**: Use YOLO backend with GPU
---
## Project Structure
```
sam2-yolo-pipeline/
├── notebooks/                 # Existing Kaggle notebooks
├── utils/                     # Existing utilities
├── yolo_annotator/            # NEW: YOLO-assisted annotation
│   ├── __init__.py
│   ├── annotator.py           # Core YOLOv9t annotator
│   ├── video_source.py        # MP4/RTSP video source handler
│   ├── export.py              # Snapshot + label export
│   └── visualizer.py          # Debug visualization
├── frigate_mini/              # NEW: Mini Frigate fork
│   ├── __init__.py
│   ├── app.py                 # Main application entry
│   ├── config/
│   │   ├── __init__.py
│   │   ├── schema.py          # Config validation
│   │   └── loader.py          # YAML config loader
│   ├── detector/
│   │   ├── __init__.py
│   │   ├── base.py            # Base detector interface
│   │   ├── rknn_detector.py   # RKNN backend
│   │   ├── onnx_detector.py   # ONNX fallback
│   │   └── yolo_detector.py   # Ultralytics YOLO fallback
│   ├── video/
│   │   ├── __init__.py
│   │   ├── mp4_source.py      # MP4 file source
│   │   └── frame_processor.py # Frame processing pipeline
│   ├── output/
│   │   ├── __init__.py
│   │   ├── snapshot.py        # Snapshot capture
│   │   └── annotation.py      # YOLO label writer
│   └── debug/
│       ├── __init__.py
│       ├── object_list.py     # Detected objects display
│       └── visualizer.py      # Bounding box overlay
├── configs/                   # NEW: Configuration files
│   ├── annotator.yaml         # Annotator settings
│   └── frigate_mini.yaml      # Frigate-mini settings
├── models/                    # NEW: Model weights storage
│   └── .gitkeep
├── output/                    # NEW: Default output directory
│   ├── snapshots/
│   ├── labels/
│   └── debug/
├── scripts/                   # NEW: CLI scripts
│   ├── annotate.py            # Run annotation pipeline
│   ├── frigate_mini.py        # Run mini frigate
│   └── convert_to_rknn.py     # Convert ONNX to RKNN
└── requirements.txt           # Updated dependencies
```
---
## Component 1: YOLO-Assisted Annotator
### Purpose
Replace SAM2 auto-annotation with faster YOLOv9t-based detection for creating training datasets.
### Workflow
```
MP4 Video → Frame Extraction → YOLOv9t Detection → Filter/NMS → YOLO Labels + Snapshots
```
### Features
1. **Model Loading**
   - Load pretrained YOLOv9t (.pt file)
   - Support custom trained models
   - Configurable confidence threshold
   - Configurable NMS threshold
2. **Video Processing**
   - MP4 file input
   - Configurable FPS sampling
   - Frame skip / time range selection
   - Resolution scaling
3. **Detection Filtering**
   - Filter by class IDs
   - Filter by confidence score
   - Filter by bbox size (min/max area)
   - Filter by aspect ratio
4. **Output Generation**
   - Clean snapshot images (no annotations drawn)
   - YOLO format label files (.txt)
   - Optional debug images with boxes drawn
   - JSON manifest of all detections
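The FPS sampling and time-range options above reduce to picking frame indices before decoding. A minimal sketch (the function name and signature are illustrative, not part of the planned API):

```python
from typing import List, Optional

def sample_frame_indices(video_fps: float, total_frames: int,
                         sample_fps: float, start_time: float = 0.0,
                         end_time: Optional[float] = None) -> List[int]:
    """Frame indices to extract for a target sampling rate and time range."""
    end_frame = total_frames if end_time is None else min(total_frames, int(end_time * video_fps))
    start_frame = int(start_time * video_fps)
    # Stride between extracted frames; at least 1 so we never stall.
    step = max(1, round(video_fps / sample_fps))
    return list(range(start_frame, end_frame, step))
```

For a 30 FPS, 300-frame clip sampled at 2 FPS this yields every 15th frame (20 frames total); `max_frames` would then just truncate the list.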
### Configuration (annotator.yaml)
```yaml
# YOLO-Assisted Annotator Configuration
model:
  path: "models/yolov9t.pt"        # Path to YOLO model
  device: "cuda"                   # cuda, cpu, or rknn
  conf_threshold: 0.25             # Confidence threshold
  iou_threshold: 0.45              # NMS IoU threshold

video:
  source: "input/video.mp4"        # Video file path
  sample_fps: 2                    # Frames per second to extract
  max_frames: null                 # Max frames (null = all)
  start_time: 0                    # Start time in seconds
  end_time: null                   # End time (null = end of video)
  resize: null                     # [width, height] or null

detection:
  classes: null                    # Class IDs to keep (null = all)
  min_confidence: 0.3              # Minimum confidence to save
  min_area: 100                    # Minimum bbox area in pixels
  max_area: null                   # Maximum bbox area (null = no limit)
  min_size: 0.01                   # Minimum bbox dimension (normalized)

output:
  directory: "output/annotations"  # Output directory
  save_snapshots: true             # Save clean images
  save_labels: true                # Save YOLO labels
  save_debug: true                 # Save debug visualizations
  save_manifest: true              # Save JSON manifest
  image_format: "jpg"              # jpg or png
  image_quality: 95                # JPEG quality (1-100)

classes:
  # Class name mapping (for display/filtering)
  0: "person"
  1: "bicycle"
  2: "car"
  # ... etc
```
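The `detection:` block above is a straightforward filter pass over raw detections. A sketch using a simplified flat record (`Det` here is an illustrative stand-in for the plan's `Detection` dataclass; `min_size` is omitted for brevity):

```python
from dataclasses import dataclass
from typing import List, Optional, Set

@dataclass
class Det:
    """Simplified detection: class id, score, and pixel-space corners."""
    class_id: int
    confidence: float
    x1: float
    y1: float
    x2: float
    y2: float

def filter_detections(dets: List[Det],
                      classes: Optional[Set[int]] = None,
                      min_confidence: float = 0.3,
                      min_area: float = 100,
                      max_area: Optional[float] = None) -> List[Det]:
    """Apply the class / confidence / area rules from the config."""
    kept = []
    for d in dets:
        area = (d.x2 - d.x1) * (d.y2 - d.y1)
        if classes is not None and d.class_id not in classes:
            continue
        if d.confidence < min_confidence:
            continue
        if area < min_area or (max_area is not None and area > max_area):
            continue
        kept.append(d)
    return kept
```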
---
## Component 2: Frigate-Mini-RKNN
### Purpose
Minimal standalone Frigate-like system for RKNN inference on Rockchip devices, outputting annotation pairs.
### Workflow
```
MP4 Feed → Frame Decode → RKNN Inference → Object Tracking → Snapshot + Label Export
```
### Features
1. **Video Input**
   - MP4 file as "camera" source
   - Loop playback option
   - Configurable FPS limit
   - Multiple video sources support
2. **RKNN Detector**
   - Load RKNN model (.rknn file)
   - NPU acceleration on Rockchip SoCs
   - Fallback to ONNX/CPU if RKNN unavailable
   - Batch inference support
3. **Object Detection**
   - YOLOv9t architecture support
   - Configurable input resolution
   - Post-processing (NMS, filtering)
   - Class filtering
4. **Snapshot System**
   - Capture on detection trigger
   - Configurable cooldown period
   - Clean snapshots (no overlays)
   - Crop to detected object (optional)
5. **Annotation Export**
   - YOLO format labels
   - Synchronized snapshot-label pairs
   - Auto-naming with timestamps
   - Dataset structure output
6. **Debug Mode**
   - Real-time object list display
   - Bounding box visualization
   - FPS counter
   - Detection statistics
   - Save debug frames
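The trigger-plus-cooldown behaviour described under "Snapshot System" can be sketched as a small stateful class (names are illustrative; the injectable clock is just to make the cooldown logic testable):

```python
import time
from typing import Dict, Iterable

class SnapshotTrigger:
    """Decide whether a detection should capture a snapshot,
    enforcing a per-class cooldown as in the snapshots config."""

    def __init__(self, objects: Iterable[str], min_score: float = 0.5,
                 cooldown: float = 2.0, clock=time.monotonic):
        self.objects = set(objects)      # classes that may trigger
        self.min_score = min_score
        self.cooldown = cooldown         # seconds between captures per class
        self.clock = clock
        self._last: Dict[str, float] = {}  # class -> last capture time

    def should_capture(self, class_name: str, score: float) -> bool:
        if class_name not in self.objects or score < self.min_score:
            return False
        now = self.clock()
        if now - self._last.get(class_name, float("-inf")) < self.cooldown:
            return False                 # still cooling down
        self._last[class_name] = now
        return True
```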
### Configuration (frigate_mini.yaml)
#### Option A: ONNX CPU-Only (Recommended for development/testing)
```yaml
# Frigate-Mini Configuration - ONNX CPU Mode
# Works on any system without special hardware
debug: true
log_level: "info"

detector:
  type: "onnx"                      # Use ONNX Runtime
  model_path: "models/yolov9t.onnx" # ONNX model file
  input_size: [640, 640]            # Model input resolution
  conf_threshold: 0.25              # Detection confidence
  nms_threshold: 0.45               # NMS threshold

  # ONNX specific settings
  onnx:
    device: "cpu"                   # cpu or cuda
    num_threads: 4                  # CPU threads (0 = auto)
    optimization_level: "all"       # none, basic, extended, all
```
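Whichever backend is used, frames must be letterboxed to the `input_size` above (aspect-preserving resize plus padding). A sketch of the geometry only, with illustrative names; the actual pixel resize would use OpenCV, and boxes map back to frame space via `(x - pad_x) / scale`:

```python
from typing import Tuple

def letterbox_params(src_w: int, src_h: int,
                     dst_w: int = 640, dst_h: int = 640) -> Tuple[float, int, int]:
    """Scale factor and padding that fit a frame into the model input
    while preserving aspect ratio (the usual YOLO letterbox)."""
    scale = min(dst_w / src_w, dst_h / src_h)
    new_w, new_h = round(src_w * scale), round(src_h * scale)
    pad_x = (dst_w - new_w) // 2   # left/right padding
    pad_y = (dst_h - new_h) // 2   # top/bottom padding
    return scale, pad_x, pad_y
```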
#### Option B: RKNN NPU (For Rockchip devices)
```yaml
# Frigate-Mini Configuration - RKNN NPU Mode
# For Rockchip SBCs (RK3588, RK3568, etc.)
debug: true
log_level: "info"

detector:
  type: "rknn"                      # Use RKNN Runtime
  model_path: "models/yolov9t.rknn" # RKNN model file
  input_size: [640, 640]            # Model input resolution
  conf_threshold: 0.25              # Detection confidence
  nms_threshold: 0.45               # NMS threshold

  # RKNN specific
  rknn:
    target_platform: "rk3588"       # rk3588, rk3568, rk3566, etc.
    core_mask: 7                    # NPU core mask (7 = all 3 cores on RK3588)

  # Fallback to ONNX if RKNN fails
  fallback:
    enabled: true
    type: "onnx"
    device: "cpu"
```
#### Option C: Ultralytics YOLO (For CUDA systems)
```yaml
# Frigate-Mini Configuration - Ultralytics YOLO Mode
# For systems with NVIDIA GPU
debug: true
log_level: "info"

detector:
  type: "yolo"                      # Use Ultralytics
  model_path: "models/yolov9t.pt"   # PyTorch model file
  conf_threshold: 0.25
  nms_threshold: 0.45

  # YOLO specific
  yolo:
    device: "cuda"                  # cpu, cuda, cuda:0, etc.
    half: true                      # FP16 inference (faster on GPU)
```
#### Full Configuration Example (with all options)
```yaml
# Video sources (cameras)
cameras:
  front_door:
    enabled: true
    source: "input/front_door.mp4"   # MP4 file path
    fps: 5                           # Processing FPS limit
    loop: true                       # Loop video playback

    # Detection zones (optional)
    detect:
      enabled: true
      width: 1280                    # Detection resolution
      height: 720

    # Object filtering
    objects:
      track:
        - person
        - car
        - dog
      filters:
        person:
          min_area: 1000             # Minimum area in pixels
          max_area: 500000
          min_score: 0.4

  backyard:
    enabled: true
    source: "input/backyard.mp4"
    fps: 5
    loop: true

# Snapshot settings
snapshots:
  enabled: true
  output_dir: "output/snapshots"

  # Trigger settings
  trigger:
    objects:                         # Objects that trigger snapshot
      - person
      - car
    min_score: 0.5                   # Minimum score to trigger
    cooldown: 2.0                    # Seconds between snapshots per object

  # Output settings
  format: "jpg"                      # jpg or png
  quality: 95                        # JPEG quality
  clean: true                        # No annotations on snapshot
  crop: false                        # Crop to object bbox
  retain_days: 7                     # Days to keep snapshots

# Annotation export
annotations:
  enabled: true
  output_dir: "output/labels"
  format: "yolo"                     # YOLO format

  # Pairing
  pair_with_snapshots: true          # Create snapshot-label pairs

  # Filtering
  min_score: 0.3
  classes: null                      # null = all classes

# Debug settings
debug_output:
  enabled: true
  output_dir: "output/debug"

  # Object list display
  object_list:
    enabled: true
    show_confidence: true
    show_class: true
    show_bbox: true

  # Visualization
  visualization:
    enabled: true
    draw_boxes: true
    draw_labels: true
    draw_confidence: true
    box_thickness: 2
    font_scale: 0.5

  # Statistics
  stats:
    show_fps: true
    show_detection_count: true
    log_interval: 100                # Log stats every N frames

# Class definitions
class_names:
  0: person
  1: bicycle
  2: car
  3: motorcycle
  4: airplane
  5: bus
  6: train
  7: truck
  8: boat
  9: traffic light
  10: fire hydrant
  # ... COCO classes continue
```
---
## Module Specifications
### 1. yolo_annotator/annotator.py
```python
class YOLOAnnotator:
    """YOLO-based automatic video annotator."""

    def __init__(self, config_path: str):
        """Load configuration and initialize model."""

    def load_model(self, model_path: str, device: str) -> None:
        """Load YOLOv9t model."""

    def process_video(self, video_path: str) -> AnnotationResult:
        """Process entire video and generate annotations."""

    def process_frame(self, frame: np.ndarray) -> List[Detection]:
        """Process single frame and return detections."""

    def filter_detections(self, detections: List[Detection]) -> List[Detection]:
        """Apply filtering rules to detections."""

    def export_annotations(self, output_dir: str) -> None:
        """Export all annotations to YOLO format."""
```
### 2. frigate_mini/detector/rknn_detector.py
```python
class RKNNDetector:
    """RKNN-based YOLO detector for Rockchip NPU."""

    def __init__(self, model_path: str, target_platform: str):
        """Initialize RKNN runtime."""

    def load_model(self) -> bool:
        """Load RKNN model to NPU."""

    def preprocess(self, frame: np.ndarray) -> np.ndarray:
        """Preprocess frame for inference."""

    def inference(self, input_data: np.ndarray) -> np.ndarray:
        """Run inference on NPU."""

    def postprocess(self, outputs: np.ndarray) -> List[Detection]:
        """Parse YOLO outputs and apply NMS."""

    def detect(self, frame: np.ndarray) -> List[Detection]:
        """Full detection pipeline."""

    def release(self) -> None:
        """Release RKNN resources."""
```
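The `postprocess` step applies NMS to the decoded YOLO outputs. A minimal pure-Python greedy NMS sketch, with boxes as `(x1, y1, x2, y2)` tuples (a real implementation would likely be vectorised with NumPy):

```python
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]

def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two boxes in (x1, y1, x2, y2) form."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    if inter == 0.0:
        return 0.0
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

def nms(boxes: Sequence[Box], scores: Sequence[float],
        iou_threshold: float = 0.45) -> List[int]:
    """Greedy NMS: keep the highest-scoring box, drop overlaps, repeat.
    Returns indices of kept boxes, highest score first."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    while order:
        i = order.pop(0)
        keep.append(i)
        order = [j for j in order if iou(boxes[i], boxes[j]) < iou_threshold]
    return keep
```

For class-aware NMS (as most YOLO post-processing does), run this per class id or offset boxes by class before suppression.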
### 3. frigate_mini/output/annotation.py
```python
class AnnotationWriter:
    """Write YOLO format annotation files."""

    def __init__(self, output_dir: str, class_names: Dict[int, str]):
        """Initialize annotation writer."""

    def write_label(self,
                    image_name: str,
                    detections: List[Detection],
                    image_size: Tuple[int, int]) -> str:
        """Write YOLO label file for image."""

    def detection_to_yolo(self,
                          detection: Detection,
                          image_width: int,
                          image_height: int) -> str:
        """Convert detection to YOLO format string."""

    def create_dataset_structure(self) -> None:
        """Create YOLO dataset directory structure."""

    def write_data_yaml(self, train_path: str, val_path: str) -> str:
        """Generate data.yaml for training."""
```
### 4. frigate_mini/debug/object_list.py
```python
class ObjectListDisplay:
    """Display detected objects in debug mode."""

    def __init__(self, config: Dict):
        """Initialize display settings."""

    def update(self, detections: List[Detection]) -> None:
        """Update object list with new detections."""

    def format_detection(self, detection: Detection) -> str:
        """Format single detection for display."""

    def print_list(self) -> None:
        """Print current object list to console."""

    def save_snapshot_with_labels(self,
                                  frame: np.ndarray,
                                  detections: List[Detection],
                                  output_path: str) -> None:
        """Save debug image with annotations."""
```
---
## Data Structures
### Detection
```python
from dataclasses import dataclass
from typing import Optional, Tuple

@dataclass
class BBox:
    x1: float  # Top-left x (pixels)
    y1: float  # Top-left y (pixels)
    x2: float  # Bottom-right x (pixels)
    y2: float  # Bottom-right y (pixels)

    def to_yolo(self, img_w: int, img_h: int) -> Tuple[float, float, float, float]:
        """Convert to YOLO format (x_center, y_center, width, height) normalized."""
        return ((self.x1 + self.x2) / 2 / img_w,
                (self.y1 + self.y2) / 2 / img_h,
                (self.x2 - self.x1) / img_w,
                (self.y2 - self.y1) / img_h)

    def area(self) -> float:
        """Calculate bbox area in pixels."""
        return (self.x2 - self.x1) * (self.y2 - self.y1)

@dataclass
class Detection:
    class_id: int            # Class index
    class_name: str          # Class name
    confidence: float        # Detection confidence (0-1)
    bbox: BBox               # Bounding box
    track_id: Optional[int]  # Tracking ID (if tracked)
    timestamp: float         # Frame timestamp
    frame_id: int            # Frame number
```
### AnnotationPair
```python
from dataclasses import dataclass
from datetime import datetime
from typing import List

@dataclass
class AnnotationPair:
    image_path: str          # Path to snapshot image
    label_path: str          # Path to YOLO label file
    detections: List[Detection]
    timestamp: datetime
    camera_name: str
    frame_id: int
```
---
## Output Format
### Directory Structure
```
output/
├── snapshots/
│   ├── front_door/
│   │   ├── 20240115_143022_001.jpg
│   │   ├── 20240115_143025_002.jpg
│   │   └── ...
│   └── backyard/
│       └── ...
├── labels/
│   ├── front_door/
│   │   ├── 20240115_143022_001.txt
│   │   ├── 20240115_143025_002.txt
│   │   └── ...
│   └── backyard/
│       └── ...
├── debug/
│   ├── front_door/
│   │   ├── 20240115_143022_001_debug.jpg
│   │   └── ...
│   └── object_log.txt
└── manifest.json
```
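Snapshot and label names share a `YYYYMMDD_HHMMSS_seq` stem so the pair stays synchronized. A small helper sketch (hypothetical name; mirrors the layout above):

```python
from datetime import datetime
from typing import Tuple

def pair_names(camera: str, ts: datetime, seq: int,
               image_format: str = "jpg") -> Tuple[str, str]:
    """Matching snapshot/label paths like 20240115_143022_001.jpg/.txt."""
    stem = f"{ts.strftime('%Y%m%d_%H%M%S')}_{seq:03d}"
    return (f"snapshots/{camera}/{stem}.{image_format}",
            f"labels/{camera}/{stem}.txt")
```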
### YOLO Label Format
```
# {class_id} {x_center} {y_center} {width} {height}
0 0.456789 0.321456 0.123456 0.234567
2 0.789012 0.654321 0.098765 0.176543
```
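Producing one such line from a pixel-space box is the same math as `BBox.to_yolo`; a sketch (helper name is illustrative):

```python
def to_yolo_line(class_id: int, x1: float, y1: float, x2: float, y2: float,
                 img_w: int, img_h: int) -> str:
    """One YOLO label line: class id, then normalized center x/y, width, height."""
    cx = (x1 + x2) / 2 / img_w   # box center x, normalized to [0, 1]
    cy = (y1 + y2) / 2 / img_h   # box center y, normalized
    w = (x2 - x1) / img_w        # box width, normalized
    h = (y2 - y1) / img_h        # box height, normalized
    return f"{class_id} {cx:.6f} {cy:.6f} {w:.6f} {h:.6f}"
```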
### Manifest JSON
```json
{
"created": "2024-01-15T14:30:22",
"model": "yolov9t.rknn",
"total_frames": 1500,
"total_detections": 3420,
"pairs": [
{
"image": "snapshots/front_door/20240115_143022_001.jpg",
"label": "labels/front_door/20240115_143022_001.txt",
"camera": "front_door",
"frame_id": 150,
"timestamp": "2024-01-15T14:30:22.500",
"detections": [
{"class": "person", "confidence": 0.87},
{"class": "car", "confidence": 0.92}
]
}
]
}
```
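Assembling the manifest is plain `json`; a sketch in which the field names follow the example above and `pairs` entries are assumed to be pre-serialised dicts:

```python
import json
from datetime import datetime
from typing import Dict, List

def build_manifest(model_name: str, pairs: List[Dict],
                   total_frames: int) -> Dict:
    """Assemble the manifest dict in the shape shown above."""
    return {
        "created": datetime.now().isoformat(timespec="seconds"),
        "model": model_name,
        "total_frames": total_frames,
        # Count detections across all pairs rather than trusting a counter.
        "total_detections": sum(len(p["detections"]) for p in pairs),
        "pairs": pairs,
    }

# Write with: json.dumps(build_manifest(...), indent=2) -> output/manifest.json
```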
---
## Implementation Phases
### Phase 1: Core YOLO Annotator (Week 1)
- [ ] Create `yolo_annotator/` module structure
- [ ] Implement `YOLOAnnotator` class with Ultralytics backend
- [ ] Implement video source handling
- [ ] Implement YOLO label export
- [ ] Create `annotator.yaml` config loader
- [ ] Add CLI script `scripts/annotate.py`
- [ ] Test with sample video
### Phase 2: Frigate-Mini Base (Week 2)
- [ ] Create `frigate_mini/` module structure
- [ ] Implement config schema and loader
- [ ] Implement base detector interface
- [ ] Implement ONNX detector (for testing)
- [ ] Implement MP4 video source
- [ ] Implement basic frame processing loop
- [ ] Test basic detection pipeline
### Phase 3: RKNN Integration (Week 3)
- [ ] Implement RKNN detector backend
- [ ] Create ONNX to RKNN conversion script
- [ ] Test on Rockchip hardware (RK3588/RK3568)
- [ ] Optimize for NPU performance
- [ ] Add fallback mechanism
### Phase 4: Snapshot & Annotation System (Week 4)
- [ ] Implement snapshot capture system
- [ ] Implement annotation writer
- [ ] Implement snapshot-label pairing
- [ ] Add trigger-based capture logic
- [ ] Create manifest generator
### Phase 5: Debug System (Week 5)
- [ ] Implement object list display
- [ ] Implement debug visualization
- [ ] Add statistics tracking
- [ ] Create debug frame saver
- [ ] Add console and file logging
### Phase 6: Integration & Testing (Week 6)
- [ ] Integration testing
- [ ] Performance optimization
- [ ] Documentation
- [ ] Example configs for common use cases
- [ ] Package for distribution
---
## Dependencies
### New Requirements
```
# requirements.txt additions
# YOLO
ultralytics>=8.0.0
# RKNN (install separately based on platform)
# rknn-toolkit2         # For conversion (x86 host)
# rknn-toolkit-lite2    # For on-device inference (ARM)
# Video processing
opencv-python>=4.8.0
av>=10.0.0 # PyAV for efficient video decoding
# Configuration
pyyaml>=6.0
pydantic>=2.0 # Config validation
# Utilities
tqdm>=4.65.0
numpy>=1.24.0
```
### RKNN Installation Notes
```bash
# On x86 host (for model conversion):
pip install rknn-toolkit2
# On Rockchip device (for inference):
pip install rknn-toolkit-lite2
# Or install from Rockchip GitHub releases
```
---
## Usage Examples
### 1. CPU-Only Workflow (ONNX) - Recommended for Development
```bash
# Step 1: Download pretrained YOLOv9t
wget https://github.com/ultralytics/assets/releases/download/v8.1.0/yolov9t.pt -O models/yolov9t.pt
# Step 2: Convert to ONNX
python scripts/convert_to_onnx.py --input models/yolov9t.pt --output models/yolov9t.onnx
# Step 3a: Auto-annotate video (CPU)
python scripts/annotate.py --config configs/annotator_cpu.yaml
# Or with CLI args:
python scripts/annotate.py \
--model models/yolov9t.onnx \
--video input/video.mp4 \
--device cpu
# Step 3b: Run Frigate-Mini (CPU)
python scripts/frigate_mini.py --config configs/frigate_mini_cpu.yaml
# Or with CLI args:
python scripts/frigate_mini.py \
--model models/yolov9t.onnx \
--video input/video.mp4 \
--output output/ \
--debug
```
### 2. RKNN Workflow (Rockchip NPU)
```bash
# Step 1: Convert ONNX to RKNN (on x86 host)
python scripts/convert_to_rknn.py \
--input models/yolov9t.onnx \
--output models/yolov9t.rknn \
--platform rk3588
# Step 2: Copy to Rockchip device and run
python scripts/frigate_mini.py --config configs/frigate_mini.yaml
# Or:
python scripts/frigate_mini.py \
--model models/yolov9t.rknn \
--video input/video.mp4 \
--platform rk3588
```
### 3. GPU Workflow (CUDA)
```bash
# Using Ultralytics directly with GPU
python scripts/annotate.py \
--model models/yolov9t.pt \
--video input/video.mp4 \
--device cuda
```
### Quick Reference
| Task | CPU (ONNX) | RKNN (NPU) | GPU (CUDA) |
|------|------------|------------|------------|
| Model file | `.onnx` | `.rknn` | `.pt` |
| Config | `*_cpu.yaml` | `frigate_mini.yaml` | Use `--device cuda` |
| Speed | 5-15 FPS | 30+ FPS | 50+ FPS |
| Hardware | Any CPU | Rockchip SBC | NVIDIA GPU |
---
## Future Enhancements
1. **RTSP Support** - Add real camera stream input
2. **Object Tracking** - Add ByteTrack/BoT-SORT for consistent IDs
3. **Web UI** - Simple web interface for monitoring
4. **Multi-model** - Support different models per camera
5. **Event System** - Webhooks for detection events
6. **Auto-labeling Refinement** - Use SAM2 to refine YOLO boxes
7. **Active Learning** - Flag low-confidence detections for review
---
## References
- [Ultralytics YOLOv9](https://github.com/ultralytics/ultralytics)
- [RKNN-Toolkit2](https://github.com/rockchip-linux/rknn-toolkit2)
- [Frigate NVR](https://github.com/blakeblackshear/frigate)
- [YOLO Label Format](https://docs.ultralytics.com/datasets/detect/)