This example demonstrates how to create a FAIR² Drones-compliant dataset from drone wildlife monitoring data. It showcases the complete workflow for transforming raw drone footage, GPS telemetry, and behavior annotations into a structured, machine-readable dataset that follows FAIR principles and biodiversity data standards.
This is a reference implementation showing how the KABR Behavior Telemetry dataset was created from drone monitoring of wildlife in Kenya.
```
examples/kabr/
├── README.md                        # This file
├── dataset_card.md                  # FAIR² Drones-compliant dataset documentation
├── metadata/
│   ├── DATA_DICTIONARY.md           # Field-level documentation
│   └── event_session_fields.csv     # Darwin Core Event mappings
└── scripts/
    ├── add_gps_data.py              # GPS telemetry integration
    ├── add_event_times.py           # Timestamp processing
    ├── merge_behavior_telemetry.py  # Main data pipeline script
    └── update_video_events.py       # Annotation validation
```
A complete dataset card (`dataset_card.md`) demonstrates how to document a FAIR² Drones-compliant dataset.
The following Python scripts demonstrate the complete data preparation pipeline, showing how to transform raw drone data into FAIR² Drones-compliant datasets. Script requirements are provided in Prerequisites, below.
What merge_behavior_telemetry.py does:
Input files:
- `*.SRT` - Drone telemetry files (GPS + camera settings per frame)
- `*_tracks.xml` - Object detection/tracking annotations
- `actions/*.xml` - Behavior annotations for tracked animals

Output:

- `data/occurrences/{date}-{video_id}.csv` - Frame-level occurrence records

Example usage:
```shell
python scripts/merge_behavior_telemetry.py \
    --session_data /path/to/raw/drone/data \
    --annotations /path/to/behavior/annotations \
    --output_dir ./data/occurrences
```
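The core of this step is turning per-frame SRT telemetry into tabular rows. Below is a minimal sketch of that parsing, assuming a DJI-style SRT with bracketed `key: value` pairs; the function name and regex are illustrative, not the script's actual `pandify_srt_data()` implementation, and DJI firmware versions vary in their exact field layout.

```python
import re

def parse_srt_telemetry(srt_text):
    """Extract per-frame numeric fields from DJI-style SRT telemetry.

    Assumes bracketed pairs such as [latitude: 0.361] [longitude: 36.899],
    one common DJI layout; adjust the regex for your drone's firmware.
    """
    frames = []
    for block in srt_text.strip().split("\n\n"):  # one SRT block per frame
        fields = dict(re.findall(r"\[(\w+)\s*:\s*([-\d.]+)", block))
        if fields:
            frames.append({k: float(v) for k, v in fields.items()})
    return frames
```

Each returned dict corresponds to one video frame, ready to be joined with detection and behavior annotations on frame index.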
What add_gps_data.py does:
Updates `video_events.csv` with event-level GPS fields:

- `decimalLatitude` / `decimalLongitude` (launch point coordinates)
- `minimumElevationInMeters` / `maximumElevationInMeters` (altitude range)
- `footprintWKT` (bounding box in Well-Known Text format for GIS compatibility)

Why this matters: Event-level GPS summaries enable spatial queries and geographic filtering without loading frame-level data.
Example usage:
```shell
python scripts/add_gps_data.py \
    --video_events ./data/video_events.csv \
    --occurrences ./data/occurrences \
    --output ./data/video_events_with_gps.csv
```
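The event-level summary amounts to a small aggregation over the frame records. A sketch of the idea, assuming latitude/longitude/altitude keys in the frame-level data (the field names and the use of the first frame as the launch point are assumptions, not the script's exact logic):

```python
def summarize_event_gps(frames):
    """Aggregate frame-level GPS records into Darwin Core event fields."""
    lats = [f["latitude"] for f in frames]
    lons = [f["longitude"] for f in frames]
    alts = [f["altitude"] for f in frames]
    min_lon, max_lon = min(lons), max(lons)
    min_lat, max_lat = min(lats), max(lats)
    return {
        "decimalLatitude": lats[0],   # assume launch point = first frame
        "decimalLongitude": lons[0],
        "minimumElevationInMeters": min(alts),
        "maximumElevationInMeters": max(alts),
        # Bounding box as a closed WKT polygon ("lon lat" order per the spec)
        "footprintWKT": (
            f"POLYGON(({min_lon} {min_lat}, {max_lon} {min_lat}, "
            f"{max_lon} {max_lat}, {min_lon} {max_lat}, {min_lon} {min_lat}))"
        ),
    }
```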
What add_event_times.py does:
Updates `video_events.csv` with Darwin Core temporal fields:

- `eventTime` (start time of video in HH:MM:SS format)
- `endTime` (end time of video)

Why this matters: Enables temporal filtering and analysis of daily activity patterns, time-of-day behaviors, etc.
Example usage:
```shell
python scripts/add_event_times.py \
    --video_events ./data/video_events.csv \
    --occurrences ./data/occurrences
```
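Deriving the time window reduces to taking the earliest and latest frame timestamps for a video. A sketch assuming ISO-8601 timestamp strings in the occurrence records (adapt the parsing to whatever format your CSVs actually contain):

```python
from datetime import datetime

def event_time_window(timestamps):
    """Derive Darwin Core eventTime/endTime from frame timestamps."""
    parsed = sorted(datetime.fromisoformat(t) for t in timestamps)
    return {
        "eventTime": parsed[0].strftime("%H:%M:%S"),   # start of video
        "endTime": parsed[-1].strftime("%H:%M:%S"),    # end of video
    }
```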
What update_video_events.py does:
Updates the `associatedMedia` field in `video_events.csv` with JSON that links each event to its original source files.

Why this matters: Maintains data provenance and enables users to trace processed data back to the original source files.
Example usage:
```shell
python scripts/update_video_events.py \
    --video_events ./data/video_events.csv \
    --data_path /path/to/raw/data
```
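A hedged sketch of how such a provenance record might be assembled; the JSON key names and the one-file-per-extension layout are assumptions based on this example's conventions, not the script's exact output:

```python
import json

def build_associated_media(video_id, data_path, extensions=(".MP4", ".SRT")):
    """Serialize source-file provenance as a JSON string for associatedMedia.

    Assumes source files are named after the video ID, one per extension.
    """
    media = [
        {"identifier": f"{data_path}/{video_id}{ext}", "type": ext.lstrip(".")}
        for ext in extensions
    ]
    return json.dumps(media)
```

Storing the provenance as a JSON string keeps `video_events.csv` a flat table while still allowing a variable number of source files per event.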
This example demonstrates how raw drone data flows through a processing pipeline to create AI-ready datasets:
```
Raw Drone Data                Processing Scripts            Output Dataset
───────────────               ──────────────────            ──────────────
📹 Video Files (*.MP4)
📍 GPS Telemetry (*.SRT)   ──┐
📷 Camera Metadata (*.SRT) ──├──► merge_behavior_      ──► 📊 Frame-level
🎯 Detection Tracks (*.xml)──┤    telemetry.py              Occurrences
🐾 Behavior Labels (*.xml) ──┘                              (CSV files)
                                                                 │
                              ┌──────────────────────────────────┘
                              │
                              ├──► add_gps_data.py         ──► 🗺️ Event GPS
                              │                                 Summaries
                              ├──► add_event_times.py      ──► ⏰ Event Time
                              │                                 Windows
                              └──► update_video_events.py  ──► 🔗 Source File
                                                                Provenance
```
Final Output: FAIR² Drones-compliant dataset ready for Hugging Face
For using the dataset: No prerequisites - just install the Hugging Face datasets library:
```shell
pip install datasets
```
For running the processing scripts:
```shell
pip install pandas numpy pysrt tqdm
```
Your raw drone data should include video files (`*.MP4`), drone telemetry (`*.SRT`), detection tracks (`*_tracks.xml`), and behavior annotations (`actions/*.xml`).
If you just want to use the dataset for machine learning or analysis:
```python
from datasets import load_dataset

# Load the complete dataset
dataset = load_dataset("imageomics/kabr-behavior-telemetry")

# Access frame-level occurrence data
occurrences = dataset['train']  # Contains all frame-level records

# Each record contains:
# - GPS coordinates (latitude, longitude, altitude) for each frame
# - Camera settings (ISO, shutter, focal length, etc.)
# - Animal detections (bounding boxes, species)
# - Behavior annotations (grazing, walking, running, etc.)
# - Temporal information (timestamps, video frame numbers)
```
If you want to create your own FAIR² Drones-compliant drone dataset:
Typical workflow:
```shell
# Step 1: Merge all data sources into frame-level occurrences
python scripts/merge_behavior_telemetry.py --session_data ./raw_data --output_dir ./occurrences

# Step 2: Add GPS summaries to video events
python scripts/add_gps_data.py --video_events ./video_events.csv --occurrences ./occurrences

# Step 3: Add temporal metadata
python scripts/add_event_times.py --video_events ./video_events.csv --occurrences ./occurrences

# Step 4: Link to source files
python scripts/update_video_events.py --video_events ./video_events.csv --data_path ./raw_data
```
This example illustrates the end-to-end transformation of raw drone data into a FAIR² Drones-compliant, AI-ready dataset.
Q: Do I need the raw video files to use the dataset?
A: No. The dataset contains frame-level occurrence records with all extracted metadata. Videos are not included due to size constraints, but GPS coordinates and timestamps allow you to recreate spatial-temporal context.
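For instance, a simple bounding-box filter over the frame-level records is enough to recover spatial context without the videos; the latitude/longitude field names here are illustrative and should be mapped to the dataset's actual column names:

```python
def within_bbox(records, min_lat, max_lat, min_lon, max_lon):
    """Filter frame-level occurrence records to a geographic bounding box."""
    return [
        r for r in records
        if min_lat <= r["latitude"] <= max_lat
        and min_lon <= r["longitude"] <= max_lon
    ]
```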
Q: Can I use these scripts with non-DJI drones?
A: Yes, but you’ll need to modify the telemetry parsing. The merge_behavior_telemetry.py script reads DJI’s SRT format. For other drones, adapt the pandify_srt_data() function to parse your drone’s telemetry format.
Q: What if I only have object detections but no behavior annotations?
A: You can still use the pipeline! The scripts will create occurrence records with detection data only. Behavior fields will be empty but the spatial-temporal framework remains valid.
Q: How do I know if my dataset is FAIR² Drones compliant?
A: Use the `dataset_card.md` as a checklist of the key requirements.
Q: Can I contribute improvements to these scripts?
A: Yes! This is a reference implementation. Contributions that improve generalizability, add support for other drone platforms, or enhance Darwin Core compliance are welcome.
“Could not parse eventID” errors:
- Ensure `video_events.csv` uses the format `KABR-2023:DATE_SESSION:VIDEO_ID`, e.g. `KABR-2023:11_01_23_session_1:DJI_0977`

“No occurrence file found” warnings:
- Occurrence files must be named `{date}-{video_id}.csv`, e.g. `11_01_23-DJI_0977.csv`

Empty GPS or timestamp fields:
- Check that each video has a matching SRT file (e.g. `DJI_0977.SRT` for `DJI_0977.MP4`)

Script fails with “No module named ‘pysrt’”:
- Install the processing dependencies: `pip install pysrt pandas numpy tqdm`

This example shows that creating FAIR, AI-ready wildlife datasets requires more than organizing files: it requires thoughtful integration of heterogeneous data sources, adherence to community standards, and comprehensive documentation that serves both human researchers and machine learning systems.