
Dataset Card for KABR Behavior Telemetry

Synchronized frame-level telemetry, detections, and behavior annotations from drone wildlife monitoring in Kenya, enabling research on animal behavior analysis and optimal drone survey protocols.

Dataset Details

Dataset Description

This dataset provides frame-level integration of drone telemetry (GPS position, altitude, camera settings), animal detection bounding boxes, and expert-annotated behaviors from aerial wildlife monitoring in Kenyan savannas. Collected January 11-17, 2023 at Mpala Research Centre, the dataset contains 57 videos with complete occurrence records covering Grevy’s zebras (Equus grevyi), plains zebras (Equus quagga), and reticulated giraffes (Giraffa reticulata).

The dataset was developed to analyze optimal drone flight parameters for wildlife behavior research—correlating altitude, speed, and camera settings with data quality and animal disturbance levels. It implements Darwin Core biodiversity standards with Humboldt Eco extensions for ecological inventory data, ensuring interoperability with biodiversity databases like GBIF.

Key features:

Supported Tasks and Applications

This dataset supports computer vision, ecological analysis, and autonomous systems research:

🤖 Computer Vision Tasks:

🌿 Ecological Applications:

🚁 Drone Systems Research:

Dataset Structure

Directory Organization

kabr-behavior-telemetry/
├── data/
│   ├── occurrences/           # Frame-level occurrence records (57 videos)
│   │   ├── 11_01_23-DJI_0977.csv
│   │   ├── 11_01_23-DJI_0978.csv
│   │   └── ...
│   ├── video_events.csv       # Darwin Core Event records (68 videos)
│   └── session_events.csv     # Darwin Core Event records (18 sessions)
├── scripts/
│   ├── merge_behavior_telemetry.py    # Generate occurrence files
│   ├── update_video_events.py         # Add annotation file paths
│   ├── add_event_times.py             # Extract temporal bounds
│   └── add_gps_data.py                # Extract GPS statistics
├── metadata/
│   ├── DATA_DICTIONARY.md             # Complete field descriptions
│   └── event_session_fields.csv       # Field metadata
└── README.md

Data Instances

Occurrence Files (data/occurrences/*.csv):

Each CSV contains frame-by-frame records for one video. Example from 11_01_23-DJI_0977.csv:

| Field | Example Value | Description |
| --- | --- | --- |
| `date` | 11_01_23 | Recording date |
| `video_id` | DJI_0977 | Video identifier |
| `frame` | 0 | Frame number |
| `date_time` | 2023-01-11 16:04:03,114,286 | Timestamp with μs precision |
| `latitude` | 0.399770 | GPS latitude (WGS84) |
| `longitude` | 36.891217 | GPS longitude (WGS84) |
| `altitude` | 20.2 | Flight altitude (m above takeoff point) |
| `iso` | 100 | Camera ISO |
| `xtl`, `ytl`, `xbr`, `ybr` | 1245, 678, 1389, 842 | Bounding box corners (top-left and bottom-right, pixels) |
| `id` | 12 | Mini-scene/track ID |
| `behaviour` | walking | Behavior class |
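The corner coordinates can be converted to other common box formats; a minimal sketch (the helper name is mine, and the pixel values are the illustrative ones from the table, not real annotations):

```python
def corners_to_xywh(xtl, ytl, xbr, ybr):
    """Convert top-left/bottom-right corners to COCO-style [x, y, width, height]."""
    return [xtl, ytl, xbr - xtl, ybr - ytl]

# Example values from the table above
box = corners_to_xywh(1245, 678, 1389, 842)
print(box)  # [1245, 678, 144, 164]
```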

Naming Convention:

{date}-{video_id}.csv
Example: 11_01_23-DJI_0977.csv
         └─date─┘ └video_id┘
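The convention can be unpacked programmatically; a minimal sketch (the helper name is mine, not part of the dataset's scripts, and it assumes the single `-` separator shown above):

```python
from pathlib import Path

def parse_occurrence_name(path):
    """Split a '{date}-{video_id}.csv' filename into its two components."""
    stem = Path(path).stem            # e.g. '11_01_23-DJI_0977'
    date, video_id = stem.split('-', 1)
    return date, video_id

print(parse_occurrence_name('data/occurrences/11_01_23-DJI_0977.csv'))
# ('11_01_23', 'DJI_0977')
```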

Temporal Information:

Data Fields

See metadata/DATA_DICTIONARY.md for complete field descriptions.

Key field groups:

🌿 Darwin Core Event Fields (video_events.csv, session_events.csv):

📍 Geolocation (occurrence files):

📷 Camera Metadata (occurrence files):

🦓 Detection Annotations (occurrence files):

🏃 Behavior Labels (occurrence files):

Data Splits

This dataset does not include pre-defined train/val/test splits. Recommended splitting strategies:

Temporal Split:

Spatial Split:

Species-Stratified:

Mission-Based:
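Whichever strategy is chosen, assigning whole videos (rather than individual frames) to partitions avoids near-duplicate consecutive frames leaking across splits. A minimal sketch of a random video-level split (fractions, seed, and function name are illustrative, not part of the dataset's tooling):

```python
import random
from pathlib import Path

def video_level_split(occurrence_dir, val_frac=0.15, test_frac=0.15, seed=0):
    """Assign whole occurrence files (one per video) to train/val/test partitions."""
    files = sorted(Path(occurrence_dir).glob('*.csv'))
    random.Random(seed).shuffle(files)
    n_test = int(len(files) * test_frac)
    n_val = int(len(files) * val_frac)
    return {
        'test': files[:n_test],
        'val': files[n_test:n_test + n_val],
        'train': files[n_test + n_val:],
    }
```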

Platform and Mission Specifications

🚁 Platform Details

Type: UAV (Unmanned Aerial Vehicle)

Hardware:

Autonomy:

📷 Sensor Specifications

Primary Sensor: DJI Integrated Camera

Telemetry Included:

🗺️ Mission Parameters

Flight Specifications:

Environmental Conditions:

🔍 Sampling Protocol

Survey Design:

Flight Operations:

Data Collection:

Quality Control:

Dataset Creation

Curation Rationale

This dataset was created to address two key research questions:

  1. What drone flight parameters optimize behavioral data quality? By correlating altitude, speed, distance, and camera settings with annotation completeness and animal visibility, researchers can develop evidence-based protocols for wildlife monitoring.

  2. Can we quantify animal disturbance from drone presence? Frame-level behavior annotations allow detection of alert, fleeing, or disrupted behaviors that indicate drone impact.

The dataset fills a critical gap: while many drone wildlife datasets provide detection boxes, few include detailed behavior labels synchronized with flight telemetry. This enables research on the trade-offs between data quality and animal welfare.
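Question 1 can be probed directly from the occurrence files. A sketch of one possible analysis, binning frames by altitude and measuring the fraction with a detection (column names follow the occurrence schema; treating "frames with a detection" as a completeness proxy is my illustrative choice, not the paper's method):

```python
import pandas as pd

def detection_rate_by_altitude(df, bin_width=5):
    """Fraction of frames with at least one detection, per altitude bin."""
    df = df.copy()
    df['has_detection'] = df['xtl'].notna()           # missing box => no detection
    df['alt_bin'] = (df['altitude'] // bin_width) * bin_width
    return df.groupby('alt_bin')['has_detection'].mean()
```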

Source Data

Data Collection and Processing

Field Collection:

  1. Planning:
    • Sites selected based on known zebra and giraffe populations at Mpala Research Centre
    • Flights conducted during peak activity hours (morning/afternoon)
    • Safety briefings and airspace clearance for each flight
  2. Collection:
    • Operators located focal groups via binoculars or vehicle sighting
    • Drones launched 50-100m from animals
    • Continuous video recording while following group movements
    • Flight logs automatically recorded in SRT files
    • Field notes on weather, behavior, and technical issues
  3. Post-Processing:
    • Videos transferred from SD cards with immediate backup
    • SRT files extracted for telemetry data
    • Frame extraction at 1 fps in CVAT annotation platform
    • Detection bounding boxes drawn for all visible animals
    • Mini-scenes identified (continuous behavioral sequences)
    • Behavior labels applied by trained ecologists
    • Quality review of all annotations
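The SRT extraction mentioned in step 3 can be sketched as follows. DJI-style SRT subtitle blocks embed telemetry as bracketed key/value pairs; the exact tag names vary by drone model and firmware, so treat the pattern and example block below as assumptions:

```python
import re

# Matches pairs like '[latitude: 0.399770]'; tag names are firmware-dependent.
PAIR = re.compile(r'\[(\w+)\s*:\s*([-\d./]+)')

def parse_srt_block(block):
    """Extract plain numeric telemetry fields from one SRT subtitle block."""
    return {key: float(value) for key, value in PAIR.findall(block)
            if re.fullmatch(r'-?\d+(\.\d+)?', value)}

block = '[iso : 100] [latitude: 0.399770] [longitude: 36.891217]'
print(parse_srt_block(block))
# {'iso': 100.0, 'latitude': 0.39977, 'longitude': 36.891217}
```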

Software and Tools Used:

Annotations

Annotation Process

🤖 Annotation Method:

Tools Used:

Annotation Guidelines:

Quality Control:

Annotation Coverage:

Who are the annotators?

Annotator Team:

Subject Matter Experts:

Personal and Sensitive Information

⚠️ Privacy and Security Considerations:

Human Subjects:

Endangered Species:

Cultural Sensitivity:

Security:

Considerations for Using the Data

Dataset Statistics

Species Distribution:

| Species (Scientific Name) | Common Name | Videos | Sessions | Individuals (range) |
| --- | --- | --- | --- | --- |
| *Equus grevyi* | Grevy’s zebra | 5 | 3 | 3-7 |
| *Equus quagga* | Plains zebra | 30 | 11 | 2-12 |
| *Giraffa reticulata* | Reticulated giraffe | 6 | 2 | 4-8 |
| Mixed | Multiple species | 6 | 1 | 2-4 |

Class Balance:

Video Characteristics:

Behavior Distribution:

Bias, Risks, and Limitations

⚠️ Known Biases:

  1. Geographic Bias:
    • Data from single site (Mpala Research Centre, Laikipia)
    • May not generalize to other savanna ecosystems
    • Represents dry season only, captured during drought conditions
  2. Temporal Bias:
    • Morning and afternoon flights only (battery/weather constraints)
    • Nocturnal or dawn/dusk behavior not captured
    • Single month snapshot (seasonal variation not represented)
  3. Species Bias:
    • Plains zebra over-represented (most abundant species)
    • Grevy’s zebra limited by population size
    • No data on smaller species (<50 cm body size)
  4. Environmental Bias:
    • Dry season habitat conditions
    • Drought-affected vegetation
    • Clear to partly cloudy weather only
    • No wet season or dense vegetation scenarios
  5. Detection Bias:
    • Animals in open areas more likely to be followed
    • Dense vegetation reduces detection probability
    • Cryptic species under-represented

Technical Limitations:

Ethical Limitations:

Recommendations

Best Practices for Using This Dataset:

  1. For Detection/Tracking Models:
    • Account for altitude-dependent scale variation (20-50m range)
    • Consider species-specific detection difficulty (giraffes easier than zebras)
    • Test generalization to new sites (single-location training data)
  2. For Behavior Recognition:
    • Class imbalance exists; consider weighted loss or resampling
    • Behavior labels are coarse; fine-grained states may be ambiguous
    • Temporal context improves accuracy (behaviors occur in sequences)
  3. For Ecological Analysis:
    • Do not extrapolate to wet season without additional data
    • Account for detection probability varying by habitat/altitude
    • Animal counts are minimum estimates (some individuals may be hidden)
  4. For Drone Protocol Development:
    • Correlate altitude/speed with detection rate and annotation completeness
    • Monitor for behavioral responses in data (alert, flee behaviors)
    • Consider trade-offs between data quality and disturbance risk
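For the class imbalance noted in point 2, inverse-frequency ("balanced") class weights are one common remedy; a sketch (the label set shown is illustrative, not the dataset's full behavior vocabulary):

```python
import pandas as pd

def inverse_frequency_weights(labels):
    """sklearn-style 'balanced' weighting: n_samples / (n_classes * class_count)."""
    counts = pd.Series(labels).value_counts()
    weights = counts.sum() / (len(counts) * counts)
    return weights.to_dict()

labels = ['walking'] * 8 + ['grazing'] * 8 + ['alert'] * 4
weights = inverse_frequency_weights(labels)   # rarer classes get larger weights
```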

Ethical Use:

What This Dataset Should NOT Be Used For:

Licensing Information

Dataset License: CC0 1.0 Universal (CC0 1.0) Public Domain Dedication

Citation Requirement: While CC0 does not legally require citation, we strongly request that you cite the dataset and associated paper if you use this data (see Citation section).

Code License: MIT License for scripts in this repository

Citation

If you use this dataset, please cite:

Dataset:

@misc{kline2024kabr_behavior_telemetry,
  author = {Jenna Kline and Maksim Kholiavchenko and Michelle Ramirez and Sam Stevens and Alec Sheets and Reshma Ramesh Babu and Namrata Banerji and Elizabeth Campolongo and Matthew Thompson and Nina Van Tiel and Jackson Miliko and Isla Duporge and Neil Rosser and Eduardo Bessa and Charles Stewart and Tanya Berger-Wolf and Daniel Rubenstein},
  title = {KABR Behavior Telemetry: Frame-Level Drone Wildlife Monitoring Dataset},
  year = {2024},
  publisher = {Hugging Face},
  url = {https://huggingface.co/datasets/imageomics/kabr-behavior-telemetry}
}

Associated Paper:

@article{kline2024integrating,
  title = {Integrating Biological Data into Autonomous Remote Sensing Systems for
           In Situ Imageomics: A Case Study for Kenyan Animal Behavior Sensing with
           Unmanned Aerial Vehicles (UAVs)},
  author = {Kline, Jenna M. and Campolongo, Elizabeth and Thompson, Matt and others},
  journal = {arXiv preprint arXiv:2407.16864},
  year = {2024},
  url = {https://arxiv.org/abs/2407.16864}
}

FAIR² Drone Data Standard:

@article{kline2025fair2,
  title = {Toward a FAIR² Standard for Drone-Based Wildlife Monitoring Datasets},
  author = {Kline, Jenna and others},
  year = {2025},
  note = {In preparation}
}

Acknowledgements

This work was supported by the Imageomics Institute, which is funded by the US National Science Foundation’s Harnessing the Data Revolution (HDR) program under Award #2118240 (Imageomics: A New Frontier of Biological Information Powered by Knowledge-Guided Machine Learning). Any opinions, findings and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation.

We thank:

Conservation Partners:

Data Collection Permits: The data was gathered at the Mpala Research Centre in Kenya, in accordance with Research License No. NACOSTI/P/22/18214. The data collection protocol adhered strictly to the guidelines set forth by the Institutional Animal Care and Use Committee under permission No. IACUC 1835F.

Validation and Quality Metrics

🤖 AI-Readiness Validation:

🌿 Darwin Core Validation:

⚠️ FAIR² Compliance Checklist:

Code and Tools

Data Loading (Python):

import pandas as pd

# Load session-level events
sessions = pd.read_csv('data/session_events.csv')

# Load video-level events
videos = pd.read_csv('data/video_events.csv')

# Load occurrence records for a specific video
occurrences = pd.read_csv('data/occurrences/11_01_23-DJI_0977.csv')

# Filter to frames with detections
detections = occurrences.dropna(subset=['xtl', 'ytl', 'xbr', 'ybr'])

# Group by behavior
behavior_counts = detections.groupby('behaviour').size()
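
For cross-video analysis, the per-video occurrence files can be concatenated into a single frame; a sketch assuming the directory layout shown earlier (the helper name is mine):

```python
import pandas as pd
from pathlib import Path

def load_all_occurrences(occurrence_dir='data/occurrences'):
    """Concatenate every per-video occurrence file, tagging each row with its source file."""
    frames = [pd.read_csv(f).assign(source_file=f.name)
              for f in sorted(Path(occurrence_dir).glob('*.csv'))]
    return pd.concat(frames, ignore_index=True)
```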

Processing Scripts:

See scripts/ directory for:

Glossary

Dataset Card Authors

Jenna M. Kline

Dataset Card Contact

For questions about this dataset:


Version History:


This dataset card follows the FAIR² Drone Data Standard and extends the Imageomics dataset card template.