dataset-yolo-script/sam2-cpu/notebooks/03_train_yolov9t.ipynb

{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Train YOLOv9t on Custom Dataset\n",
    "\n",
    "Train YOLOv9t (tiny) model on YOLO format dataset created from SAM2 annotations.\n",
    "\n",
    "## Input\n",
    "- YOLO format dataset from `02_create_yolo_dataset.ipynb`\n",
    "\n",
    "## Output\n",
    "- Trained YOLOv9t model weights\n",
    "- Training metrics and visualizations\n",
    "\n",
    "**Platform:** Kaggle GPU (P100/T4) - Enable GPU in notebook settings!"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 1. Setup Environment"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Check GPU\n",
    "!nvidia-smi"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Install YOLOv9 (ultralytics fork with v9 support)\n",
    "!pip install -q ultralytics\n",
    "\n",
    "# Alternative: Install official YOLOv9 repo\n",
    "# !git clone https://github.com/WongKinYiu/yolov9.git\n",
    "# %cd yolov9\n",
    "# !pip install -q -r requirements.txt"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import os\n",
    "import sys\n",
    "import torch\n",
    "import yaml\n",
    "import shutil\n",
    "import numpy as np\n",
    "import matplotlib.pyplot as plt\n",
    "from pathlib import Path\n",
    "from datetime import datetime\n",
    "from IPython.display import Image, display\n",
    "\n",
    "print(f\"Python: {sys.version}\")\n",
    "print(f\"PyTorch: {torch.__version__}\")\n",
    "print(f\"CUDA available: {torch.cuda.is_available()}\")\n",
    "if torch.cuda.is_available():\n",
    "    print(f\"GPU: {torch.cuda.get_device_name(0)}\")\n",
    "    print(f\"CUDA version: {torch.version.cuda}\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from ultralytics import YOLO\n",
    "import ultralytics\n",
    "print(f\"Ultralytics version: {ultralytics.__version__}\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 2. Configuration"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Dataset configuration - UPDATE THIS PATH\n",
    "# Option 1: From previous notebook (local)\n",
    "DATASET_PATH = './yolo_dataset'\n",
    "\n",
    "# Option 2: From Kaggle dataset (uncomment and update)\n",
    "# DATASET_PATH = '/kaggle/input/your-dataset-name/yolo_dataset'\n",
    "\n",
    "# Training configuration\n",
    "CONFIG = {\n",
    "    # Model\n",
    "    'model': 'yolov9t.pt',       # Pretrained YOLOv9t (tiny)\n",
    "    \n",
    "    # Training parameters\n",
    "    'epochs': 100,               # Number of epochs\n",
    "    'batch': 16,                 # Batch size (adjust based on GPU memory)\n",
    "    'imgsz': 640,                # Image size\n",
    "    'patience': 20,              # Early stopping patience\n",
    "    \n",
    "    # Optimizer\n",
    "    'optimizer': 'AdamW',        # Optimizer: SGD, Adam, AdamW\n",
    "    'lr0': 0.001,                # Initial learning rate\n",
    "    'lrf': 0.01,                 # Final learning rate factor\n",
    "    'momentum': 0.937,           # SGD momentum\n",
    "    'weight_decay': 0.0005,      # Weight decay\n",
    "    \n",
    "    # Augmentation\n",
    "    'hsv_h': 0.015,              # HSV-Hue augmentation\n",
    "    'hsv_s': 0.7,                # HSV-Saturation\n",
    "    'hsv_v': 0.4,                # HSV-Value\n",
    "    'degrees': 0.0,              # Rotation\n",
    "    'translate': 0.1,            # Translation\n",
    "    'scale': 0.5,                # Scale\n",
    "    'shear': 0.0,                # Shear\n",
    "    'flipud': 0.0,               # Flip up-down\n",
    "    'fliplr': 0.5,               # Flip left-right\n",
    "    'mosaic': 1.0,               # Mosaic augmentation\n",
    "    'mixup': 0.0,                # Mixup augmentation\n",
    "    \n",
    "    # Other\n",
    "    'workers': 4,                # DataLoader workers\n",
    "    'device': 0,                 # GPU device (0 for first GPU)\n",
    "    'project': 'runs/train',     # Output directory\n",
    "    'name': 'yolov9t_custom',    # Experiment name\n",
    "    'exist_ok': True,            # Overwrite existing\n",
    "    'pretrained': True,          # Use pretrained weights\n",
    "    'verbose': True,             # Verbose output\n",
    "}\n",
    "\n",
    "print(\"Configuration loaded!\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 3. Prepare Dataset"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Check dataset exists\n",
    "dataset_path = Path(DATASET_PATH)\n",
    "\n",
    "if not dataset_path.exists():\n",
    "    print(f\"Dataset not found: {dataset_path}\")\n",
    "    print(\"Please update DATASET_PATH or upload your dataset.\")\n",
    "else:\n",
    "    print(f\"Dataset found: {dataset_path}\")\n",
    "    \n",
    "    # List contents\n",
    "    print(\"\\nContents:\")\n",
    "    for item in dataset_path.iterdir():\n",
    "        if item.is_dir():\n",
    "            count = len(list(item.rglob('*')))\n",
    "            print(f\"  {item.name}/ ({count} files)\")\n",
    "        else:\n",
    "            print(f\"  {item.name}\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Load and display data.yaml\n",
    "data_yaml = dataset_path / 'data.yaml'\n",
    "\n",
    "if data_yaml.exists():\n",
    "    with open(data_yaml) as f:\n",
    "        data_config = yaml.safe_load(f)\n",
    "    \n",
    "    print(\"data.yaml contents:\")\n",
    "    print(yaml.dump(data_config, default_flow_style=False))\n",
    "    \n",
    "    # Update path to absolute if needed\n",
    "    if not Path(data_config.get('path', '')).is_absolute():\n",
    "        data_config['path'] = str(dataset_path.absolute())\n",
    "        \n",
    "        # Save updated config\n",
    "        with open(data_yaml, 'w') as f:\n",
    "            yaml.dump(data_config, f, default_flow_style=False)\n",
    "        print(\"\\nUpdated path to absolute.\")\n",
    "else:\n",
    "    print(f\"data.yaml not found: {data_yaml}\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Count images and labels\n",
    "train_images = len(list((dataset_path / 'images' / 'train').glob('*')))\n",
    "val_images = len(list((dataset_path / 'images' / 'val').glob('*')))\n",
    "train_labels = len(list((dataset_path / 'labels' / 'train').glob('*.txt')))\n",
    "val_labels = len(list((dataset_path / 'labels' / 'val').glob('*.txt')))\n",
    "\n",
    "print(\"Dataset Statistics:\")\n",
    "print(f\"  Train images: {train_images}\")\n",
    "print(f\"  Train labels: {train_labels}\")\n",
    "print(f\"  Val images: {val_images}\")\n",
    "print(f\"  Val labels: {val_labels}\")\n",
    "print(f\"  Classes: {data_config.get('nc', 'unknown')}\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 4. Load YOLOv9t Model"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Load pretrained YOLOv9t model\n",
    "model = YOLO(CONFIG['model'])\n",
    "\n",
    "print(f\"Model: {CONFIG['model']}\")\n",
    "print(f\"Task: {model.task}\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Model information\n",
    "model.info()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 5. Train Model"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Start training\n",
    "print(\"Starting training...\")\n",
    "print(f\"  Dataset: {data_yaml}\")\n",
    "print(f\"  Epochs: {CONFIG['epochs']}\")\n",
    "print(f\"  Batch size: {CONFIG['batch']}\")\n",
    "print(f\"  Image size: {CONFIG['imgsz']}\")\n",
    "print()\n",
    "\n",
    "results = model.train(\n",
    "    data=str(data_yaml),\n",
    "    epochs=CONFIG['epochs'],\n",
    "    batch=CONFIG['batch'],\n",
    "    imgsz=CONFIG['imgsz'],\n",
    "    patience=CONFIG['patience'],\n",
    "    optimizer=CONFIG['optimizer'],\n",
    "    lr0=CONFIG['lr0'],\n",
    "    lrf=CONFIG['lrf'],\n",
    "    momentum=CONFIG['momentum'],\n",
    "    weight_decay=CONFIG['weight_decay'],\n",
    "    hsv_h=CONFIG['hsv_h'],\n",
    "    hsv_s=CONFIG['hsv_s'],\n",
    "    hsv_v=CONFIG['hsv_v'],\n",
    "    degrees=CONFIG['degrees'],\n",
    "    translate=CONFIG['translate'],\n",
    "    scale=CONFIG['scale'],\n",
    "    shear=CONFIG['shear'],\n",
    "    flipud=CONFIG['flipud'],\n",
    "    fliplr=CONFIG['fliplr'],\n",
    "    mosaic=CONFIG['mosaic'],\n",
    "    mixup=CONFIG['mixup'],\n",
    "    workers=CONFIG['workers'],\n",
    "    device=CONFIG['device'],\n",
    "    project=CONFIG['project'],\n",
    "    name=CONFIG['name'],\n",
    "    exist_ok=CONFIG['exist_ok'],\n",
    "    pretrained=CONFIG['pretrained'],\n",
    "    verbose=CONFIG['verbose'],\n",
    ")\n",
    "\n",
    "print(\"\\nTraining complete!\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 6. Training Results"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Find training output directory\n",
    "train_dir = Path(CONFIG['project']) / CONFIG['name']\n",
    "\n",
    "print(f\"Training output: {train_dir}\")\n",
    "print(\"\\nContents:\")\n",
    "for item in sorted(train_dir.iterdir()):\n",
    "    print(f\"  {item.name}\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Display training curves\n",
    "results_png = train_dir / 'results.png'\n",
    "if results_png.exists():\n",
    "    display(Image(filename=str(results_png), width=1000))\n",
    "else:\n",
    "    print(\"results.png not found\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Display confusion matrix\n",
    "confusion_matrix = train_dir / 'confusion_matrix.png'\n",
    "if confusion_matrix.exists():\n",
    "    display(Image(filename=str(confusion_matrix), width=600))\n",
    "else:\n",
    "    print(\"confusion_matrix.png not found\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Display F1 curve\n",
    "f1_curve = train_dir / 'F1_curve.png'\n",
    "if f1_curve.exists():\n",
    "    display(Image(filename=str(f1_curve), width=600))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Display PR curve\n",
    "pr_curve = train_dir / 'PR_curve.png'\n",
    "if pr_curve.exists():\n",
    "    display(Image(filename=str(pr_curve), width=600))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Display sample predictions\n",
    "val_batch = train_dir / 'val_batch0_pred.jpg'\n",
    "if val_batch.exists():\n",
    "    print(\"Validation batch predictions:\")\n",
    "    display(Image(filename=str(val_batch), width=1000))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 7. Evaluate Model"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Load best weights\n",
    "best_weights = train_dir / 'weights' / 'best.pt'\n",
    "last_weights = train_dir / 'weights' / 'last.pt'\n",
    "\n",
    "print(f\"Best weights: {best_weights}\")\n",
    "print(f\"  Size: {best_weights.stat().st_size / 1024 / 1024:.1f} MB\")\n",
    "\n",
    "# Load best model\n",
    "best_model = YOLO(str(best_weights))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Evaluate on validation set\n",
    "print(\"Evaluating on validation set...\")\n",
    "metrics = best_model.val(data=str(data_yaml))\n",
    "\n",
    "print(\"\\nValidation Metrics:\")\n",
    "print(f\"  mAP50: {metrics.box.map50:.4f}\")\n",
    "print(f\"  mAP50-95: {metrics.box.map:.4f}\")\n",
    "print(f\"  Precision: {metrics.box.mp:.4f}\")\n",
    "print(f\"  Recall: {metrics.box.mr:.4f}\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 8. Test Inference"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Test on validation images\n",
    "val_images_dir = dataset_path / 'images' / 'val'\n",
    "test_images = list(val_images_dir.glob('*.jpg'))[:4]\n",
    "\n",
    "if test_images:\n",
    "    print(f\"Testing on {len(test_images)} images...\")\n",
    "    \n",
    "    results = best_model.predict(\n",
    "        source=test_images,\n",
    "        conf=0.25,\n",
    "        save=True,\n",
    "        project='runs/predict',\n",
    "        name='test_inference'\n",
    "    )\n",
    "    \n",
    "    # Display results\n",
    "    predict_dir = Path('runs/predict/test_inference')\n",
    "    for img_path in sorted(predict_dir.glob('*.jpg'))[:4]:\n",
    "        print(f\"\\n{img_path.name}\")\n",
    "        display(Image(filename=str(img_path), width=600))\n",
    "else:\n",
    "    print(\"No validation images found\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Inference speed test\n",
    "import time\n",
    "\n",
    "if test_images:\n",
    "    test_img = str(test_images[0])\n",
    "    \n",
    "    # Warmup\n",
    "    for _ in range(3):\n",
    "        _ = best_model.predict(test_img, verbose=False)\n",
    "    \n",
    "    # Benchmark\n",
    "    times = []\n",
    "    for _ in range(10):\n",
    "        start = time.time()\n",
    "        _ = best_model.predict(test_img, verbose=False)\n",
    "        times.append(time.time() - start)\n",
    "    \n",
    "    avg_time = np.mean(times) * 1000\n",
    "    fps = 1000 / avg_time\n",
    "    \n",
    "    print(f\"Inference speed:\")\n",
    "    print(f\"  Average: {avg_time:.1f} ms\")\n",
    "    print(f\"  FPS: {fps:.1f}\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 9. Export Model"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Export to ONNX\n",
    "print(\"Exporting to ONNX...\")\n",
    "onnx_path = best_model.export(format='onnx', simplify=True)\n",
    "print(f\"ONNX exported: {onnx_path}\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Export to TorchScript\n",
    "print(\"Exporting to TorchScript...\")\n",
    "torchscript_path = best_model.export(format='torchscript')\n",
    "print(f\"TorchScript exported: {torchscript_path}\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Optional: Export to other formats\n",
    "# TensorRT (requires TensorRT installation)\n",
    "# engine_path = best_model.export(format='engine')\n",
    "\n",
    "# OpenVINO\n",
    "# openvino_path = best_model.export(format='openvino')\n",
    "\n",
    "# CoreML (macOS)\n",
    "# coreml_path = best_model.export(format='coreml')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 10. Save and Download"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Create output archive\n",
    "import zipfile\n",
    "\n",
    "OUTPUT_ZIP = 'yolov9t_trained.zip'\n",
    "\n",
    "print(f\"Creating {OUTPUT_ZIP}...\")\n",
    "\n",
    "with zipfile.ZipFile(OUTPUT_ZIP, 'w', zipfile.ZIP_DEFLATED) as zipf:\n",
    "    # Add weights\n",
    "    zipf.write(best_weights, 'weights/best.pt')\n",
    "    zipf.write(last_weights, 'weights/last.pt')\n",
    "    \n",
    "    # Add ONNX if exists\n",
    "    onnx_file = best_weights.with_suffix('.onnx')\n",
    "    if onnx_file.exists():\n",
    "        zipf.write(onnx_file, 'weights/best.onnx')\n",
    "    \n",
    "    # Add results\n",
    "    for result_file in train_dir.glob('*.png'):\n",
    "        zipf.write(result_file, f'results/{result_file.name}')\n",
    "    \n",
    "    for result_file in train_dir.glob('*.csv'):\n",
    "        zipf.write(result_file, f'results/{result_file.name}')\n",
    "    \n",
    "    # Add args\n",
    "    args_file = train_dir / 'args.yaml'\n",
    "    if args_file.exists():\n",
    "        zipf.write(args_file, 'args.yaml')\n",
    "\n",
    "zip_size = os.path.getsize(OUTPUT_ZIP) / 1024 / 1024\n",
    "print(f\"\\nExport complete!\")\n",
    "print(f\"  File: {OUTPUT_ZIP}\")\n",
    "print(f\"  Size: {zip_size:.1f} MB\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# List all output files\n",
    "print(\"\\nAll output files:\")\n",
    "print(f\"\\nTraining directory: {train_dir}\")\n",
    "!ls -la {train_dir}/weights/"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Summary"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Final summary\n",
    "print(\"=\" * 60)\n",
    "print(\"TRAINING SUMMARY\")\n",
    "print(\"=\" * 60)\n",
    "print(f\"\\nModel: YOLOv9t\")\n",
    "print(f\"Dataset: {dataset_path}\")\n",
    "print(f\"  Train images: {train_images}\")\n",
    "print(f\"  Val images: {val_images}\")\n",
    "print(f\"  Classes: {data_config.get('nc', 'unknown')}\")\n",
    "\n",
    "print(f\"\\nTraining:\")\n",
    "print(f\"  Epochs: {CONFIG['epochs']}\")\n",
    "print(f\"  Batch size: {CONFIG['batch']}\")\n",
    "print(f\"  Image size: {CONFIG['imgsz']}\")\n",
    "\n",
    "print(f\"\\nResults:\")\n",
    "print(f\"  mAP50: {metrics.box.map50:.4f}\")\n",
    "print(f\"  mAP50-95: {metrics.box.map:.4f}\")\n",
    "print(f\"  Precision: {metrics.box.mp:.4f}\")\n",
    "print(f\"  Recall: {metrics.box.mr:.4f}\")\n",
    "\n",
    "print(f\"\\nOutput files:\")\n",
    "print(f\"  Best weights: {best_weights}\")\n",
    "print(f\"  Export archive: {OUTPUT_ZIP}\")\n",
    "\n",
    "print(\"\\n\" + \"=\" * 60)\n",
    "print(\"Training complete! Download weights for deployment.\")\n",
    "print(\"=\" * 60)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "---\n",
    "\n",
    "## Usage Example\n",
    "\n",
    "After training, use the model for inference:\n",
    "\n",
    "```python\n",
    "from ultralytics import YOLO\n",
    "\n",
    "# Load trained model\n",
    "model = YOLO('best.pt')\n",
    "\n",
    "# Inference on image\n",
    "results = model.predict('image.jpg', conf=0.25)\n",
    "\n",
    "# Inference on video\n",
    "results = model.predict('video.mp4', conf=0.25, save=True)\n",
    "\n",
    "# Access detections\n",
    "for result in results:\n",
    "    boxes = result.boxes\n",
    "    for box in boxes:\n",
    "        x1, y1, x2, y2 = box.xyxy[0]\n",
    "        confidence = box.conf[0]\n",
    "        class_id = box.cls[0]\n",
    "```\n",
    "\n",
    "## Tips\n",
    "\n",
    "- **Low mAP?** Try:\n",
    "  - More training epochs\n",
    "  - Data augmentation adjustments\n",
    "  - Lower learning rate\n",
    "  - More training data\n",
    "\n",
    "- **Overfitting?** Try:\n",
    "  - More augmentation\n",
    "  - Dropout/regularization\n",
    "  - Early stopping (patience)\n",
    "\n",
    "- **Slow training?** Try:\n",
    "  - Larger batch size (if GPU memory allows)\n",
    "  - Mixed precision (amp=True)\n",
    "  - Smaller image size"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "name": "python",
   "version": "3.10.0"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}