fair_drones

| Image alt-text |
|:--|
| Figure 1. Representative image from the dataset showing [describe what’s shown]. |

Dataset Card for [Dataset pretty_name]

Dataset Details

Dataset Description

Add 2-3 paragraphs describing:

Supported Tasks and Applications

This dataset supports the following computer vision and ecological analysis tasks:

List those that apply; non-exhaustive suggestions are grouped by topic below.

🤖 Computer Vision Tasks:

🌿 Ecological Applications:

🤖 Robotics Applications:

Benchmark Results:

[If available, provide baseline performance metrics]

| Method | Detection mAP@50 | Tracking MOTA | Behavior Acc | Reference |
|---|---|---|---|---|
| YOLOv8 | 0.XX | - | - | [link] |

Dataset Structure

Directory Organization

The suggested dataset structure for full drone data is provided below; modify as needed based on the tasks or applications supported by your data.

dataset/
├── images/
│   ├── train/
│   │   ├── rgb/
│   │   │   └── {mission_id}_{frame_id}.jpg
│   │   └── thermal/  # if applicable
│   ├── val/
│   └── test/
├── annotations/
│   ├── train/
│   │   ├── coco_format.json
│   │   ├── yolo_format/
│   │   └── tracking/  # if applicable
│   ├── val/
│   └── test/
├── telemetry/  # if available
│   └── {mission_id}_telemetry.csv
├── metadata/
│   ├── darwin_core_events.csv  # 🌿 Darwin Core Event records
│   ├── darwin_core_occurrences.csv  # 🌿 Darwin Core Occurrence records
│   ├── missions.csv  # Mission-level metadata
│   ├── sensors.json  # Sensor specifications
│   └── species_info.json  # Taxonomic information
└── README.md

Data Instances

Images:

Naming Convention:

{mission_id}_{frame_number}[_{sensor_type}].{extension}
Example: AWS2024-045_003821.jpg
         └─mission─┘ └frame─┘

Mission ID: Unique identifier for each flight (format: [prefix]-[number])
Frame Number: Sequential frame number within mission (zero-padded to 6 digits)
Sensor Type: Optional suffix for multi-sensor data (_rgb, _thermal, etc.)
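
As a rough illustration of the naming convention above, the following sketch parses a file name into its parts. The regular expression assumes mission IDs follow the `[prefix]-[number]` form with no underscores and that sensor suffixes are lowercase letters; the names `FILENAME_RE`, `FrameName`, and `parse_frame_name` are hypothetical and should be adapted to your own convention.

```python
import re
from typing import NamedTuple, Optional

# Pattern for {mission_id}_{frame_number}[_{sensor_type}].{extension}.
# Assumption: mission IDs look like PREFIX-NUMBER and contain no underscores.
FILENAME_RE = re.compile(
    r"^(?P<mission>[A-Za-z0-9]+-\d+)"  # mission ID, e.g. AWS2024-045
    r"_(?P<frame>\d{6})"               # zero-padded 6-digit frame number
    r"(?:_(?P<sensor>[a-z]+))?"        # optional sensor suffix, e.g. rgb
    r"\.(?P<ext>[A-Za-z0-9]+)$"        # file extension
)


class FrameName(NamedTuple):
    mission_id: str
    frame_number: int
    sensor_type: Optional[str]
    extension: str


def parse_frame_name(file_name: str) -> FrameName:
    """Split an image file name into its naming-convention parts."""
    match = FILENAME_RE.match(file_name)
    if match is None:
        raise ValueError(f"File name does not follow the convention: {file_name!r}")
    return FrameName(
        mission_id=match["mission"],
        frame_number=int(match["frame"]),
        sensor_type=match["sensor"],  # None when no suffix is present
        extension=match["ext"],
    )
```

Calling `parse_frame_name("AWS2024-045_003821.jpg")` yields the mission ID `AWS2024-045` and frame number `3821`, with no sensor type.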

Temporal Information:

Data Fields

Provide more details about the included metadata.

🌿 Darwin Core Event Metadata (metadata/darwin_core_events.csv)

⚠️ Required fields for minimal Darwin Core compliance are provided below; add any other available information you can:

| Field | Type | Description | Example |
|---|---|---|---|
| eventID | string | Unique identifier for sampling event | “AWS2024-045” |
| eventDate | date | Date of survey (ISO 8601) | “2024-03-15” |
| eventTime | time | Time of survey start | “06:30:00+03:00” |
| decimalLatitude | float | Latitude in decimal degrees (WGS84) | -2.3456 |
| decimalLongitude | float | Longitude in decimal degrees (WGS84) | 34.8123 |
| coordinateUncertaintyInMeters | integer | GPS precision | 5 |
| geodeticDatum | string | Coordinate system | “WGS84” |
| locality | string | Named location | “Serengeti National Park, Sector A3” |
| habitat | string | Habitat type | “Open savanna with scattered Acacia” |
| samplingProtocol | string | Survey method | “UAV transect at 60m AGL, 5m/s” |
| sampleSizeValue | float | Area/duration surveyed | 250 |
| sampleSizeUnit | string | Unit for sample size | “hectares” |
| samplingEffort | string | Effort description | “45 min flight, 70% overlap” |
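
A minimal sketch of how the required Event fields listed above might be checked before release. It only tests for presence, an ISO 8601 `eventDate`, and coordinate ranges; it is not a full Darwin Core validator, and the function names `check_event_row` and `check_event_file` are hypothetical.

```python
import csv
from datetime import date

# Required Event fields, taken from the table above.
REQUIRED_EVENT_FIELDS = [
    "eventID", "eventDate", "eventTime", "decimalLatitude", "decimalLongitude",
    "coordinateUncertaintyInMeters", "geodeticDatum", "locality", "habitat",
    "samplingProtocol", "sampleSizeValue", "sampleSizeUnit", "samplingEffort",
]


def check_event_row(row):
    """Return a list of human-readable problems found in one Event record."""
    problems = [f"missing {name}" for name in REQUIRED_EVENT_FIELDS
                if not (row.get(name) or "").strip()]
    if problems:
        return problems  # skip value checks when fields are absent
    try:
        date.fromisoformat(row["eventDate"])
    except ValueError:
        problems.append("eventDate is not an ISO 8601 date")
    try:
        lat = float(row["decimalLatitude"])
        lon = float(row["decimalLongitude"])
    except ValueError:
        problems.append("coordinates are not numeric")
    else:
        if not -90 <= lat <= 90:
            problems.append("decimalLatitude out of range")
        if not -180 <= lon <= 180:
            problems.append("decimalLongitude out of range")
    return problems


def check_event_file(lines):
    """Map each problematic eventID (or row number) to its list of problems.

    `lines` is any iterable of CSV lines, such as an open file handle.
    """
    report = {}
    for index, row in enumerate(csv.DictReader(lines), start=1):
        problems = check_event_row(row)
        if problems:
            report[row.get("eventID") or f"row {index}"] = problems
    return report
```

For example, `check_event_file(open("metadata/darwin_core_events.csv", newline=""))` returns an empty dictionary when every row passes these basic checks.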

Optional but recommended:

🌿 Darwin Core Occurrence Metadata (metadata/darwin_core_occurrences.csv)

⚠️ Required fields for minimal Darwin Core compliance are provided below; add any other available information you can:

| Field | Type | Description | Example |
|---|---|---|---|
| occurrenceID | string | Unique observation identifier | “AWS2024-045_003821_001” |
| eventID | string | Links to Event record | “AWS2024-045” |
| scientificName | string | Full scientific name | “Loxodonta africana (Blumenbach, 1797)” |
| kingdom | string | Taxonomic kingdom | “Animalia” |
| phylum | string | Taxonomic phylum | “Chordata” |
| class | string | Taxonomic class | “Mammalia” |
| order | string | Taxonomic order | “Proboscidea” |
| family | string | Taxonomic family | “Elephantidae” |
| genus | string | Taxonomic genus | “Loxodonta” |
| species | string | Specific epithet | “africana” |
| taxonRank | string | Rank of identification | “species” |
| individualCount | integer | Number of individuals | 12 |
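
Because each Occurrence record points to an Event record through `eventID`, it can be useful to confirm that the link resolves. The sketch below assumes both tables have already been read into lists of dictionaries (for instance with `csv.DictReader`); the helper names are hypothetical.

```python
from collections import defaultdict


def find_orphan_occurrences(events, occurrences):
    """Return occurrenceIDs whose eventID has no matching Event record."""
    known_events = {event["eventID"] for event in events}
    return [occ["occurrenceID"] for occ in occurrences
            if occ["eventID"] not in known_events]


def individuals_per_event(occurrences):
    """Sum individualCount for each (eventID, scientificName) pair."""
    totals = defaultdict(int)
    for occ in occurrences:
        key = (occ["eventID"], occ["scientificName"])
        totals[key] += int(occ["individualCount"])  # CSV values arrive as strings
    return dict(totals)
```

An empty result from `find_orphan_occurrences` indicates that every occurrence is tied to a documented sampling event.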

Optional but recommended:

🤖 Computer Vision Annotations

Modify the example CV annotation format provided below to match the details of your data.

COCO Format (annotations/train/coco_format.json):

{
  "info": {
    "description": "[Dataset name and description]",
    "version": "1.0",
    "year": 2024,
    "date_created": "2024-01-15"
  },
  "licenses": [...],
  "images": [
    {
      "id": 1,
      "file_name": "AWS2024-045_003821.jpg",
      "width": 5280,
      "height": 2970,
      "date_captured": "2024-03-15T06:35:42+03:00",
      "mission_id": "AWS2024-045",
      "altitude_m": 60,
      "gsd_cm_per_px": 1.5
    }
  ],
  "annotations": [
    {
      "id": 1,
      "image_id": 1,
      "category_id": 1,
      "bbox": [x, y, width, height],
      "area": 12543.5,
      "iscrowd": 0,
      "attributes": {
        "occlusion": "none",
        "truncation": false,
        "life_stage": "adult",
        "behavior": "foraging",
        "group_size": 12,
        "confidence": 0.95
      },
      "occurrence_id": "AWS2024-045_003821_001"
    }
  ],
  "categories": [
    {
      "id": 1,
      "name": "african_elephant",
      "supercategory": "mammal",
      "scientific_name": "Loxodonta africana"
    }
  ]
}

The occurrence_id field links each annotation to its Darwin Core Occurrence record.
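
Since the suggested directory layout includes both COCO and YOLO annotations, a conversion sketch may help keep the two in sync. COCO stores boxes as `[x, y, width, height]` in pixels with the origin at the top-left corner, while YOLO label files use a class index followed by the normalized box center and size. The functions below are illustrative and assume the COCO dictionary has already been loaded with `json.load`.

```python
def coco_bbox_to_yolo(bbox, image_width, image_height):
    """Convert a COCO [x, y, w, h] pixel box to normalized (cx, cy, w, h)."""
    x, y, w, h = bbox
    return (
        (x + w / 2) / image_width,
        (y + h / 2) / image_height,
        w / image_width,
        h / image_height,
    )


def coco_to_yolo_lines(coco):
    """Return {file_name: [YOLO label lines]} for a loaded COCO dictionary."""
    images = {img["id"]: img for img in coco["images"]}
    # YOLO class indices are zero-based and contiguous; COCO category ids need not be.
    ordered = sorted(coco["categories"], key=lambda cat: cat["id"])
    class_index = {cat["id"]: index for index, cat in enumerate(ordered)}
    labels = {}
    for ann in coco["annotations"]:
        img = images[ann["image_id"]]
        cx, cy, w, h = coco_bbox_to_yolo(ann["bbox"], img["width"], img["height"])
        line = f"{class_index[ann['category_id']]} {cx:.6f} {cy:.6f} {w:.6f} {h:.6f}"
        labels.setdefault(img["file_name"], []).append(line)
    return labels
```

Custom fields such as `attributes` and `occurrence_id` have no place in a YOLO label file, so the COCO file remains the more complete record.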

Tracking Format (if applicable):

MOT Challenge format with species labels:

{frame_number},{object_id},{bbox_left},{bbox_top},{bbox_width},{bbox_height},{confidence},{species_id},{life_stage},{behavior}
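
The ten-column layout above can be read with the standard `csv` module. This is a sketch only: it assumes one record per line, no header row, and integer `species_id` values; `TrackRecord`, `parse_tracking_lines`, and `group_by_track` are hypothetical names.

```python
import csv
from dataclasses import dataclass


@dataclass
class TrackRecord:
    frame_number: int
    object_id: int
    bbox_left: float
    bbox_top: float
    bbox_width: float
    bbox_height: float
    confidence: float
    species_id: int
    life_stage: str
    behavior: str


def parse_tracking_lines(lines):
    """Parse an iterable of tracking lines into TrackRecord objects."""
    records = []
    for row in csv.reader(lines):
        if not row:
            continue  # skip blank lines
        if len(row) != 10:
            raise ValueError(f"expected 10 columns, got {len(row)}: {row}")
        frame, obj, left, top, width, height, conf, species, stage, behavior = row
        records.append(TrackRecord(
            int(frame), int(obj), float(left), float(top), float(width),
            float(height), float(conf), int(species), stage, behavior,
        ))
    return records


def group_by_track(records):
    """Group records by object_id, each track sorted by frame number."""
    tracks = {}
    for record in records:
        tracks.setdefault(record.object_id, []).append(record)
    for track in tracks.values():
        track.sort(key=lambda record: record.frame_number)
    return tracks
```

Grouping by `object_id` gives one time-ordered trajectory per individual, which is the usual starting point for behavior analysis.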

Data Splits

| Split | Images | Annotations | Species Coverage | Temporal Coverage |
|---|---|---|---|---|
| Train | X,XXX | XX,XXX | All species | All seasons |
| Validation | X,XXX | XX,XXX | All species | Stratified |
| Test | X,XXX | XX,XXX | All species | Held-out missions |
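
One possible way to keep whole missions out of the training data, as the "Held-out missions" entry suggests, is to assign splits per mission rather than per image. The sketch below hashes the mission ID so the assignment is deterministic. It is an assumption-laden illustration, not a prescribed method: it does not stratify by species or season, and the fractions and function names are hypothetical.

```python
import hashlib


def assign_split(mission_id, val_fraction=0.1, test_fraction=0.1):
    """Deterministically map a mission ID to 'train', 'val', or 'test'."""
    digest = hashlib.sha256(mission_id.encode("utf-8")).digest()
    # Map the first 8 bytes of the hash to a number in [0, 1).
    bucket = int.from_bytes(digest[:8], "big") / 2**64
    if bucket < test_fraction:
        return "test"
    if bucket < test_fraction + val_fraction:
        return "val"
    return "train"


def split_files(file_names):
    """Group image file names by split, keeping each mission in one split."""
    splits = {"train": [], "val": [], "test": []}
    for name in file_names:
        # Assumes the {mission_id}_{frame_number} naming convention.
        mission_id = name.split("_", 1)[0]
        splits[assign_split(mission_id)].append(name)
    return splits
```

Splitting at the mission level avoids near-duplicate consecutive frames appearing in both training and evaluation sets.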

Split Methodology:

Describe how splits were created, e.g.:

Platform and Mission Specifications

🚁 Platform Details

Type: [UAV / AUV / ROV / USV / UGV]

Hardware:

Autonomy:

Payload:

📷 Sensor Specifications

Primary Sensor: [Name]

Spectral Bands (if applicable):

| Band | Wavelength (nm) | Purpose |
|------|-----------------|---------|
| Red | 590-700 | Vegetation health |
| … | … | … |

Calibration:

Synchronization (for multi-sensor):

🗺️ Mission Parameters

Flight Specifications:

Telemetry Data:

Environmental Conditions:

🔍 Sampling Protocol

⚠️ Full sampling protocol description:

[Fill in this detailed description following Barnas et al. (2020) reporting standards:]

  1. Survey Design:
    • [Systematic grid / transects / random sampling / adaptive]
    • [Spacing, coverage strategy]
  2. Flight Operations:
    • [Pre-flight checks and procedures]
    • [Operator training and certification]
    • [Safety protocols]
  3. Data Collection:
    • [Trigger method: time-based, distance-based, manual]
    • [Collection parameters]
  4. Quality Control:
    • [In-field QC procedures]
    • [Data backup and verification]

📋 Permits and Compliance

Permits Obtained:

Regulations Followed:

Ethics Approval:

Animal Welfare Protocol:

Dataset Creation

Curation Rationale

Describe the motivation for creating this dataset:

Source Data

Data Collection and Processing

Field Collection (describe what YOU did):

  1. Planning:
    • [Site selection criteria]
    • [Temporal sampling design]
  2. Collection:
    • [Flight execution]
    • [Data storage and backup in field]
  3. Post-Processing:
    • [Image processing pipeline]
    • [Quality filtering criteria]
    • [Georeferencing method]
    • [Radiometric correction (if applicable)]

Software and Tools Used:

Who are the source data producers?

Field Team:

Local Collaboration:

Annotations

Annotation Process

🤖 Annotation Method:

Tools Used:

Annotation Guidelines:

Quality Control:

Annotation Coverage:

Who are the annotators?

Annotator Team:

Subject Matter Experts:

Personal and Sensitive Information

⚠️ Privacy and Security Considerations:

Human Subjects:

Endangered Species:

Cultural Sensitivity:

Security:

Considerations for Using the Data

Dataset Statistics

Species Distribution:

| Species (Scientific Name) | Common Name | Train | Val | Test | Total |
|---|---|---|---|---|---|
| Species name | Common name | XXX | XX | XX | XXX |

Class Balance:

Image Characteristics:

Detection Difficulty Metadata:

| Difficulty Factor | Easy (%) | Medium (%) | Hard (%) |
|---|---|---|---|
| Occlusion | XX | XX | XX |
| Crowd density | XX | XX | XX |
| Scale (small objects) | XX | XX | XX |
| Weather/lighting | XX | XX | XX |

Bias, Risks, and Limitations

⚠️ Known Biases:

  1. Geographic Bias:
    • [e.g., “Data collected only from protected areas, may not represent human-modified landscapes”]
  2. Temporal Bias:
    • [e.g., “Morning flights only, nocturnal behavior not captured”]
  3. Species Bias:
    • [e.g., “Large-bodied species over-represented due to detection ease”]
  4. Environmental Bias:
    • [e.g., “Dry season only, no data on wet season habitat use”]
  5. Detection Bias:
    • [e.g., “Detection probability varies by habitat cover”]
    • [e.g., “Small animals (<50cm) under-detected at this altitude”]

Technical Limitations:

Ethical Limitations:

Recommendations

Best Practices for Using This Dataset:

  1. For Detection/Tracking Models:
    • [e.g., “Account for altitude-dependent scale variation”]
    • [e.g., “Consider species-specific detection thresholds”]
  2. For Ecological Analysis:
    • [e.g., “Apply detection probability corrections for sparse vegetation”]
    • [e.g., “Do not generalize beyond dry season without additional data”]
  3. For Transfer Learning:
    • [e.g., “Model trained at 60m may require fine-tuning for other altitudes”]
    • [e.g., “RGB-thermal fusion recommended for dense vegetation”]
  4. Ethical Use:
    • [e.g., “Do not share precise coordinates publicly”]
    • [e.g., “Consider conservation implications of automated detection tools”]
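
For the recommendation about not sharing precise coordinates, one common option is to round coordinates before publication and record the resulting uncertainty. The sketch below is a simplified illustration: the 111,320 m-per-degree figure is an approximation, the uncertainty is a worst-case estimate for the rounding cell, and the function name is hypothetical. `coordinateUncertaintyInMeters` and `dataGeneralizations` are Darwin Core terms.

```python
import math

METERS_PER_DEGREE = 111_320  # approximate length of one degree of latitude


def generalize_coordinates(latitude, longitude, decimals=1, original_uncertainty_m=0):
    """Round coordinates and report the added positional uncertainty."""
    rounded_lat = round(latitude, decimals)
    rounded_lon = round(longitude, decimals)
    # Rounding moves a point by at most half a grid cell along each axis.
    half_cell_deg = 0.5 * 10 ** (-decimals)
    north_south_m = half_cell_deg * METERS_PER_DEGREE
    east_west_m = north_south_m * math.cos(math.radians(rounded_lat))
    uncertainty_m = math.ceil(
        math.hypot(north_south_m, east_west_m) + original_uncertainty_m
    )
    return {
        "decimalLatitude": rounded_lat,
        "decimalLongitude": rounded_lon,
        "coordinateUncertaintyInMeters": uncertainty_m,
        "dataGeneralizations": f"Coordinates rounded to {decimals} decimal place(s)",
    }
```

The appropriate level of generalization depends on the species and local threat context, so the `decimals` value should be agreed with conservation partners.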

What This Dataset Should NOT Be Used For:

Multimodal Linkages

Associated Datasets (if applicable):

Synchronization:

Licensing Information

Dataset License: [Full license name]

Citation Requirement: If you use this dataset, you MUST cite both the dataset and associated paper (see Citation section).

Image Licensing: [If different from compilation]

Code License: [If releasing code alongside data]

Citation

If you use this dataset, please cite:

Dataset:

@misc{yourdataset2024,
  author = {Last, First and Last, First},
  title = {Dataset Title},
  year = {2024},
  publisher = {Hugging Face},
  url = {https://huggingface.co/datasets/your-org/your-dataset},
  doi = {10.XXXX/XXXXX}
}

Paper:

@article{yourpaper2024,
  title = {Paper Title},
  author = {Last, First and Last, First},
  journal = {Journal Name},
  year = {2024},
  volume = {X},
  pages = {XX-XX},
  doi = {10.XXXX/XXXXX}
}

FAIR² Drone Data Standard:

@article{kline2025wildfair,
  title = {FAIR² Drones: An AI-Ready Standard for Cross-Domain Wildlife Drone Datasets},
  author = {Kline, Jenna and others},
  year = {2025},
  doi = {10.XXXX/XXXXX}
}

Acknowledgements

This work was supported by [funding source].

We thank:

This work was supported by the Imageomics Institute, which is funded by the US National Science Foundation’s Harnessing the Data Revolution (HDR) program under Award #2118240 (Imageomics: A New Frontier of Biological Information Powered by Knowledge-Guided Machine Learning). Any opinions, findings and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation.

Conservation Partners:

Data Collection Permits: [List major permits with gratitude to issuing authorities]

Validation and Quality Metrics

🤖 AI-Readiness Validation:

🌿 Darwin Core Validation:

⚠️ FAIR² Compliance Checklist:

Code and Tools

Data Loading:

# Example: Load dataset using Hugging Face datasets library
from datasets import load_dataset

dataset = load_dataset("your-org/your-dataset")

# Access images and annotations
train_data = dataset['train']
for sample in train_data:
    image = sample['image']
    annotations = sample['annotations']
    # Your code here

Visualization Tools:

Evaluation Scripts:

Glossary

AGL: Above Ground Level - altitude measured from terrain surface below
Darwin Core: Biodiversity data standard maintained by TDWG
FAIR²: Extension of FAIR principles for AI-ready datasets
GSD: Ground Sampling Distance - real-world size represented by one pixel
UAV: Unmanned Aerial Vehicle (drone)
TDWG: Biodiversity Information Standards organization
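
To make the GSD entry concrete, the sketch below applies the usual pinhole-camera relation for a nadir-pointing camera over flat terrain: GSD = (sensor width × altitude AGL) / (focal length × image width). The sensor width and focal length in the usage note are hypothetical values chosen for illustration, not a recommendation for any particular camera.

```python
def ground_sampling_distance_cm(sensor_width_mm, focal_length_mm,
                                altitude_agl_m, image_width_px):
    """Return the ground sampling distance in centimeters per pixel.

    Assumes a nadir-pointing camera over flat terrain.
    """
    ground_width_m = sensor_width_mm / focal_length_mm * altitude_agl_m
    return ground_width_m * 100 / image_width_px


def ground_footprint_width_m(gsd_cm_per_px, image_width_px):
    """Return the width of ground covered by one image, in meters."""
    return gsd_cm_per_px * image_width_px / 100
```

With a hypothetical 13.2 mm wide sensor, an 8.8 mm lens, and a 5280-pixel-wide image at 60 m AGL, this gives roughly 1.7 cm per pixel and a ground footprint about 90 m wide.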

Dataset Card Authors

[List names of card authors]

Dataset Card Contact

For questions about this dataset:


Version History:


This dataset card follows the FAIR² Drone Data Standard (Kline et al., 2025) and extends the Imageomics dataset card template.