Bridging computer vision and biology, the TreeOfLife datasets provide a rich resource, grounded in biological knowledge, for training and evaluating models.
I need the largest available TreeOfLife dataset with images paired to their taxonomic labels.
Go to TreeOfLife-200M →
I want to evaluate my model's performance on the IDLE-OO Camera Trap Benchmark presented in the 2025 NeurIPS paper.
Go to IDLE-OO Camera Trap Benchmark →
I want to train a model with morphological image captions for each image.
Go to TreeOfLife-10M Captions →
I need the original TreeOfLife-10M dataset described in the 2024 CVPR paper.
Go to TreeOfLife-10M →
I want to evaluate my model's performance on the Rare Species Benchmark presented in the 2024 CVPR paper.
Go to Rare Species Benchmark →
I want to explore or download datasets or benchmarks used to train and evaluate the BioCLIP models.
Go to HF Collection →
The TreeOfLife datasets are curated collections of images representing a wide range of biological taxa, paired with their corresponding taxonomic labels. These datasets are designed to facilitate the training and evaluation of vision-based knowledge-guided biological foundation models.
With nearly 214 million images representing more than 950 thousand taxa across the tree of life, TreeOfLife-200M is the largest and most diverse public ML-ready dataset for computer vision models in biology at release. This dataset combines images and metadata from four core biodiversity data providers: Global Biodiversity Information Facility (GBIF), Encyclopedia of Life (EOL), BIOSCAN-5M, and FathomNet. Together they more than double the number of unique taxa covered by TreeOfLife-10M. Compared with BioTrove, TreeOfLife-200M has 50 million more images and nearly triple the unique taxa.
TreeOfLife-200M also increases image context diversity, with museum specimen, camera trap, and citizen science images all well represented. Our rigorous curation process ensures each image has the most specific taxonomic label possible and that the overall dataset provides a well-rounded foundation for training BioCLIP 2 and future biology foundation models.
TreeOfLife-10M Captions is a dataset of 10 million generated captions, Wikipedia-derived descriptions, and format examples for TreeOfLife-10M. The captions were generated using InternVL3-38B, guided by biological context that helps the model produce more accurate captions. The dataset was used to train BioCAP, a CLIP-based model, and is designed to enhance the training of vision models by providing rich, contextual information about each image included in the original TreeOfLife-10M dataset.
TreeOfLife-10M is the original TreeOfLife dataset, presented in "BioCLIP: A Vision Foundation Model for the Tree of Life". With over 10 million images covering 454 thousand taxa in the tree of life, TreeOfLife-10M was, at its release, the largest ML-ready dataset of images of biological organisms paired with their associated taxonomic labels.
It expanded on the foundation established by existing high-quality datasets, such as iNat21 and BIOSCAN-1M, by further incorporating newly curated images from the Encyclopedia of Life (eol.org), which supplies most of TreeOfLife-10M’s data diversity. Every image in TreeOfLife-10M is labeled to the most specific taxonomic level possible, as well as higher taxonomic ranks in the tree of life. TreeOfLife-10M was generated for the purpose of training BioCLIP and future biology foundation models.
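As an illustration of labeling an image "to the most specific taxonomic level possible, as well as higher taxonomic ranks", the following is a minimal sketch. The rank names, the record layout, and the helper `most_specific_label` are assumptions made for this example and are not taken from the released metadata schema.

```python
from typing import Optional, Tuple

# Linnaean ranks from broadest to most specific. This column order is an
# assumption for illustration, not the datasets' documented schema.
RANKS = ["kingdom", "phylum", "class", "order", "family", "genus", "species"]


def most_specific_label(record: dict) -> Tuple[Optional[str], str]:
    """Return (deepest filled rank, space-joined taxon path down to that rank)."""
    path = []
    deepest = None
    for rank in RANKS:
        value = record.get(rank)
        if not value:
            break  # stop at the first missing rank
        path.append(value)
        deepest = rank
    return deepest, " ".join(path)


# Hypothetical record that is only identified down to genus.
record = {
    "kingdom": "Animalia",
    "phylum": "Chordata",
    "class": "Aves",
    "order": "Passeriformes",
    "family": "Corvidae",
    "genus": "Corvus",
    "species": None,
}
print(most_specific_label(record))
# ('genus', 'Animalia Chordata Aves Passeriformes Corvidae Corvus')
```

The sketch stops at the first empty rank, so a record with a gap in the middle of the hierarchy is labeled only down to the rank above the gap.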
The TreeOfLife benchmark datasets are curated collections of images representing a wide range of biological taxa, paired with their corresponding taxonomic labels. These benchmarks are designed to provide biologically and ecologically relevant evaluation tasks.
IDLE-OO Camera Traps is a five-dataset benchmark of camera trap images from the Labeled Information Library of Alexandria: Biology and Conservation (LILA BC) with a total of 2,586 images for species classification. Each of the five benchmarks is balanced to have the same number of images for each species within it (between 310 and 1,120 images), representing between 16 and 39 species.
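One way to build a benchmark with the same number of images for each species is to sample a fixed count per species. The source does not describe how the IDLE-OO benchmarks were balanced, so the function `balance_by_species`, its parameters, and the choice to drop species with too few images are all assumptions in this sketch.

```python
import random
from collections import defaultdict


def balance_by_species(samples, per_species, seed=0):
    """Keep exactly `per_species` images for every species that has enough.

    `samples` is an iterable of (image_id, species) pairs. Species with fewer
    than `per_species` images are dropped (an assumed policy).
    """
    by_species = defaultdict(list)
    for image_id, species in samples:
        by_species[species].append(image_id)

    rng = random.Random(seed)  # fixed seed so the selection is reproducible
    balanced = []
    for species in sorted(by_species):
        image_ids = by_species[species]
        if len(image_ids) < per_species:
            continue  # too few images to balance; skip this species
        for image_id in rng.sample(image_ids, per_species):
            balanced.append((image_id, species))
    return balanced


# Hypothetical toy input: 5 fox images, 3 deer images, 1 lynx image.
samples = (
    [(f"fox_{i}", "fox") for i in range(5)]
    + [(f"deer_{i}", "deer") for i in range(3)]
    + [("lynx_0", "lynx")]
)
balanced = balance_by_species(samples, per_species=3)
```

With this toy input, the result holds three fox and three deer images, and the lynx is dropped because it has only one image.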
The Rare Species Benchmark was generated alongside TreeOfLife-10M as a benchmark for BioCLIP; data (images and text) were pulled from Encyclopedia of Life (EOL) to generate a dataset consisting of rare species for zero-shot classification and more refined image classification tasks. Here, we use "rare species" to mean species listed on the International Union for Conservation of Nature (IUCN) Red List as Near Threatened, Vulnerable, Endangered, Critically Endangered, or Extinct in the Wild.
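The "rare species" definition above can be expressed as a simple membership check on the IUCN Red List category. The category names come from the text; the helper `is_rare` and the case-insensitive matching are assumptions made for this sketch, not part of the benchmark's tooling.

```python
# IUCN Red List categories that count as "rare" under the definition above.
RARE_CATEGORIES = frozenset(
    name.casefold()
    for name in (
        "Near Threatened",
        "Vulnerable",
        "Endangered",
        "Critically Endangered",
        "Extinct in the Wild",
    )
)


def is_rare(iucn_category: str) -> bool:
    """Return True if the category is one of the five listed above."""
    return iucn_category.strip().casefold() in RARE_CATEGORIES


print(is_rare("Vulnerable"))     # True
print(is_rare("Least Concern"))  # False
```

Categories outside the list, such as Least Concern, Data Deficient, or Extinct, are not treated as rare under this definition.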
The HF Collection is the central warehouse for all BioCLIP assets. This collection aggregates all versions of the models, the training datasets, benchmarks, and interactive demos.
Use this if you need direct access to the TreeOfLife datasets or evaluation benchmarks.