fair_drones

KABR Behavior Telemetry Example

This example demonstrates how to create a FAIR² Drones-compliant dataset from drone wildlife monitoring data. It showcases the complete workflow for transforming raw drone footage, GPS telemetry, and behavior annotations into a structured, machine-readable dataset that follows FAIR principles and biodiversity data standards.

What This Example Contains

This is a reference implementation showing how the KABR Behavior Telemetry dataset was created from drone monitoring of wildlife in Kenya.

Directory Structure - KABR Example

examples/kabr/
├── README.md                          # This file
├── dataset_card.md                    # FAIR² Drones-compliant dataset documentation
├── metadata/
│   ├── DATA_DICTIONARY.md            # Field-level documentation
│   └── event_session_fields.csv      # Darwin Core Event mappings
└── scripts/
    ├── add_gps_data.py               # GPS telemetry integration
    ├── add_event_times.py            # Timestamp processing
    ├── merge_behavior_telemetry.py   # Main data pipeline script
    └── update_video_events.py        # Annotation validation

1. FAIR² Drones-Compliant Dataset Card (dataset_card.md)

A complete dataset card documenting the KABR Behavior Telemetry dataset. It demonstrates the full FAIR² Drones metadata structure and can be used as a checklist when documenting your own data.

2. Data Processing Scripts (scripts/)

The following Python scripts demonstrate the complete data preparation pipeline, showing how to transform raw drone data into FAIR² Drones-compliant datasets. Script requirements are provided in Prerequisites, below.

merge_behavior_telemetry.py - Main Pipeline Script

What it does: Merges per-frame GPS telemetry and camera metadata (parsed from the drone’s SRT files) with detection tracks and behavior labels (XML annotations) into frame-level occurrence records.

Input files: video files (*.MP4), GPS telemetry and camera metadata (*.SRT), and detection/behavior annotations (*.xml) for each recording session.

Output: frame-level occurrence records, written as CSV files to --output_dir.

Example usage:

python scripts/merge_behavior_telemetry.py \
  --session_data /path/to/raw/drone/data \
  --annotations /path/to/behavior/annotations \
  --output_dir ./data/occurrences
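The core join this script performs can be sketched with pandas. The column names below (frame, latitude, species, behavior) are illustrative stand-ins, not the script’s actual schema:

```python
import pandas as pd

# Hypothetical per-frame telemetry parsed from an SRT file.
telemetry = pd.DataFrame({
    "frame": [0, 1, 2],
    "latitude": [0.3250, 0.3251, 0.3252],
    "longitude": [36.9000, 36.9001, 36.9002],
})

# Hypothetical detections and behavior labels keyed by the same frame numbers.
annotations = pd.DataFrame({
    "frame": [0, 2],
    "species": ["zebra", "zebra"],
    "behavior": ["grazing", "walking"],
})

# Left-join so every telemetry frame survives; frames without a detection
# simply carry empty annotation fields in the occurrence records.
occurrences = telemetry.merge(annotations, on="frame", how="left")
```

A real pipeline would repeat this per video and write each result to a CSV under --output_dir.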

add_gps_data.py - Event-Level GPS Enrichment

What it does: Aggregates frame-level GPS coordinates from the occurrence files into one summary per video event and writes them into the video events table.

Why this matters: Event-level GPS summaries enable spatial queries and geographic filtering without loading frame-level data.

Example usage:

python scripts/add_gps_data.py \
  --video_events ./data/video_events.csv \
  --occurrences ./data/occurrences \
  --output ./data/video_events_with_gps.csv
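The enrichment step amounts to a group-by over the occurrence records. The eventID key and coordinate column names here are assumed for illustration:

```python
import pandas as pd

# Hypothetical frame-level occurrences spanning two video events.
occ = pd.DataFrame({
    "eventID": ["e1", "e1", "e2"],
    "latitude": [0.325, 0.327, 0.410],
    "longitude": [36.90, 36.92, 36.95],
    "altitude_m": [61.0, 63.0, 80.0],
})

# One summary row per event: centroid plus altitude range, so spatial
# queries never need to touch the frame-level files.
gps_summary = occ.groupby("eventID").agg(
    lat_mean=("latitude", "mean"),
    lon_mean=("longitude", "mean"),
    alt_min=("altitude_m", "min"),
    alt_max=("altitude_m", "max"),
).reset_index()
```

The summary columns would then be merged into video_events.csv on eventID.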

add_event_times.py - Temporal Metadata Extraction

What it does: Extracts each video event’s start and end times from the frame-level occurrence records and adds them to the video events table.

Why this matters: Enables temporal filtering and analysis of daily activity patterns, time-of-day behaviors, etc.

Example usage:

python scripts/add_event_times.py \
  --video_events ./data/video_events.csv \
  --occurrences ./data/occurrences
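The time-window extraction can be sketched the same way; eventID and timestamp are assumed column names:

```python
import pandas as pd

# Hypothetical frame-level timestamps for two events.
occ = pd.DataFrame({
    "eventID": ["e1", "e1", "e2"],
    "timestamp": pd.to_datetime([
        "2023-01-20 08:15:00",
        "2023-01-20 08:19:30",
        "2023-01-21 16:02:10",
    ]),
})

# First/last frame time per event defines the event's time window, so
# users can filter by date or time of day without loading frame data.
windows = occ.groupby("eventID")["timestamp"].agg(["min", "max"]).rename(
    columns={"min": "event_start", "max": "event_end"}
).reset_index()
windows["duration_s"] = (
    windows["event_end"] - windows["event_start"]
).dt.total_seconds()
```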

update_video_events.py - Source File Linkage

What it does: Links each video event record back to the original source files found under --data_path.

Why this matters: Maintains data provenance and enables users to trace processed data back to original source files.

Example usage:

python scripts/update_video_events.py \
  --video_events ./data/video_events.csv \
  --data_path /path/to/raw/data
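A minimal sketch of the linkage, assuming event IDs match source-file stems (the matching rule in the real script may differ, and the file paths are hypothetical):

```python
from pathlib import PurePosixPath

# Hypothetical listing of raw source files (in practice, from Path.rglob).
raw_files = [
    "/data/session_01/DJI_0001.MP4",
    "/data/session_01/DJI_0001.SRT",
    "/data/session_02/DJI_0007.MP4",
]

def link_sources(event_id: str, files: list[str]) -> list[str]:
    """Return every raw file whose stem matches the event's video ID."""
    return [f for f in files if PurePosixPath(f).stem == event_id]

# Each event row records the exact files it was derived from.
provenance = link_sources("DJI_0001", raw_files)
```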

3. Metadata Documentation (metadata/)

DATA_DICTIONARY.md provides field-level documentation for every column in the dataset, and event_session_fields.csv records the Darwin Core Event mappings for event- and session-level fields.

Data Pipeline Overview

This example demonstrates how raw drone data flows through a processing pipeline to create AI-ready datasets:

Raw Drone Data                    Processing Scripts              Output Dataset
───────────────                    ──────────────────              ──────────────

📹 Video Files (*.MP4)
📍 GPS Telemetry (*.SRT)      ──┐
📷 Camera Metadata (*.SRT)      ├──► merge_behavior_         ──► 📊 Frame-level
🎯 Detection Tracks (*.xml)   ──┤    telemetry.py                  Occurrences
🐾 Behavior Labels (*.xml)    ──┘                                  (CSV files)
                                                                         │
                                                                         │
                               ┌─────────────────────────────────────────┘
                               │
                               ├──► add_gps_data.py         ──► 🗺️  Event GPS
                               │                                  Summaries
                               │
                               ├──► add_event_times.py      ──► ⏰ Event Time
                               │                                  Windows
                               │
                               └──► update_video_events.py  ──► 🔗 Source File
                                                                  Provenance

Final Output: FAIR² Drones-compliant dataset ready for Hugging Face

Prerequisites

For using the dataset: nothing beyond the Hugging Face datasets library:

pip install datasets

For running the processing scripts:

pip install pandas numpy pysrt tqdm

Your raw drone data should include: video files (*.MP4), GPS and camera telemetry (*.SRT), and detection/behavior annotations (*.xml).

Getting Started

For Dataset Users

If you just want to use the dataset for machine learning or analysis:

from datasets import load_dataset

# Load the complete dataset
dataset = load_dataset("imageomics/kabr-behavior-telemetry")

# Access frame-level occurrence data
occurrences = dataset['train']  # Contains all frame-level records

# Each record contains:
# - GPS coordinates (latitude, longitude, altitude) for each frame
# - Camera settings (ISO, shutter, focal length, etc.)
# - Animal detections (bounding boxes, species)
# - Behavior annotations (grazing, walking, running, etc.)
# - Temporal information (timestamps, video frame numbers)

For Dataset Creators

If you want to create your own FAIR² Drones-compliant drone dataset:

  1. Study the dataset card (dataset_card.md) to understand the metadata structure
  2. Examine the data dictionary (metadata/DATA_DICTIONARY.md) to see field definitions
  3. Review the processing scripts (scripts/) to understand the data pipeline
  4. Adapt the scripts for your own drone data sources
  5. Document your dataset: download and fill out the FAIR² Drones Dataset Card Template for your own data

Typical workflow:

# Step 1: Merge all data sources into frame-level occurrences
python scripts/merge_behavior_telemetry.py --session_data ./raw_data --output_dir ./occurrences

# Step 2: Add GPS summaries to video events
python scripts/add_gps_data.py --video_events ./video_events.csv --occurrences ./occurrences

# Step 3: Add temporal metadata
python scripts/add_event_times.py --video_events ./video_events.csv --occurrences ./occurrences

# Step 4: Link to source files
python scripts/update_video_events.py --video_events ./video_events.csv --data_path ./raw_data

What You’ll Learn

This example illustrates how to merge heterogeneous drone data sources into frame-level occurrence records, how to enrich video events with GPS and temporal summaries, how to preserve provenance links back to raw source files, and how to document the result in a FAIR² Drones-compliant dataset card.

Common Questions

Q: Do I need the raw video files to use the dataset?

A: No. The dataset contains frame-level occurrence records with all extracted metadata. Videos are not included due to size constraints, but GPS coordinates and timestamps allow you to recreate spatial-temporal context.

Q: Can I use these scripts with non-DJI drones?

A: Yes, but you’ll need to modify the telemetry parsing. The merge_behavior_telemetry.py script reads DJI’s SRT format. For other drones, adapt the pandify_srt_data() function to parse your drone’s telemetry format.
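For instance, if your drone logged one line of bracketed key–value telemetry per frame, a drop-in replacement could look like this (the format and field names are hypothetical):

```python
import re
import pandas as pd

# Hypothetical per-frame telemetry lines from a non-DJI drone.
SAMPLE_LINES = [
    "[latitude: 0.3250] [longitude: 36.9000] [rel_alt: 61.0]",
    "[latitude: 0.3260] [longitude: 36.9100] [rel_alt: 62.5]",
]

PATTERN = re.compile(
    r"\[latitude: (?P<latitude>[-\d.]+)\] "
    r"\[longitude: (?P<longitude>[-\d.]+)\] "
    r"\[rel_alt: (?P<rel_alt>[-\d.]+)\]"
)

def parse_telemetry(lines: list[str]) -> pd.DataFrame:
    """One row per frame, with a numeric column per telemetry field."""
    rows = [PATTERN.match(line).groupdict() for line in lines]
    return pd.DataFrame(rows).astype(float)

frames = parse_telemetry(SAMPLE_LINES)
```

Any function returning a frame-indexed DataFrame with the same columns could then replace pandify_srt_data() in the pipeline.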

Q: What if I only have object detections but no behavior annotations?

A: You can still use the pipeline! The scripts will create occurrence records with detection data only. Behavior fields will be empty but the spatial-temporal framework remains valid.

Q: How do I know if my dataset is FAIR² Drones compliant?

A: Use the dataset_card.md as a checklist: if you can complete every section of the FAIR² Drones Dataset Card Template for your data, your dataset meets the key requirements.

Q: Can I contribute improvements to these scripts?

A: Yes! This is a reference implementation. Contributions that improve generalizability, add support for other drone platforms, or enhance Darwin Core compliance are welcome.

Troubleshooting

“Could not parse eventID” errors:

“No occurrence file found” warnings: Confirm that merge_behavior_telemetry.py (Step 1) has been run and that the --occurrences argument points at its output directory.

Empty GPS or timestamp fields:

Script fails with “No module named ‘pysrt’”: Install the script dependencies listed under Prerequisites (pip install pandas numpy pysrt tqdm).

Key Takeaway

This example shows that creating FAIR, AI-ready wildlife datasets requires more than just organizing files: it requires thoughtful integration of heterogeneous data sources, adherence to community standards, and comprehensive documentation that serves both human researchers and machine learning systems.

Learn More