Python Tutorial

Before beginning this tutorial, install pybioclip and download the two example images: Ursus-arctos.jpeg and Felis-catus.jpeg.

Predict species classification

from bioclip import TreeOfLifeClassifier, Rank

classifier = TreeOfLifeClassifier()
predictions = classifier.predict("Ursus-arctos.jpeg", Rank.SPECIES)

for prediction in predictions:
    print(prediction["species"], "-", prediction["score"])

Output:

Ursus arctos - 0.9356034994125366
Ursus arctos syriacus - 0.05616999790072441
Ursus arctos bruinosus - 0.004126196261495352
Ursus arctus - 0.0024959812872111797
Ursus americanus - 0.0005009894957765937

Output from the predict() method showing the dictionary structure:

[{
    'kingdom': 'Animalia',
    'phylum': 'Chordata',
    'class': 'Mammalia',
    'order': 'Carnivora',
    'family': 'Ursidae',
    'genus': 'Ursus',
    'species_epithet': 'arctos',
    'species': 'Ursus arctos',
    'common_name': 'Kodiak bear',
    'score': 0.9356034994125366
}]
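Because predict() returns a plain list of dictionaries, ordinary Python is enough to pick out the best match or drop low-scoring rows. The following is a minimal sketch that reuses the species names and scores printed above, reduced to two keys for brevity. The helper names top_prediction and confident, and the 0.01 threshold, are illustrative and not part of pybioclip.

```python
# Rows copied from the output above, keeping only the two keys used here.
predictions = [
    {"species": "Ursus arctos", "score": 0.9356034994125366},
    {"species": "Ursus arctos syriacus", "score": 0.05616999790072441},
    {"species": "Ursus arctos bruinosus", "score": 0.004126196261495352},
    {"species": "Ursus arctus", "score": 0.0024959812872111797},
    {"species": "Ursus americanus", "score": 0.0005009894957765937},
]


def top_prediction(rows):
    """Return the row with the highest score (hypothetical helper)."""
    return max(rows, key=lambda row: row["score"])


def confident(rows, threshold):
    """Keep only rows whose score is at least `threshold` (hypothetical helper)."""
    return [row for row in rows if row["score"] >= threshold]


best = top_prediction(predictions)
shortlist = confident(predictions, threshold=0.01)

print(best["species"], "-", best["score"])
print([row["species"] for row in shortlist])
```

The same pattern should work with any key in the dictionary, for example 'genus' or 'common_name'.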

The output from the predict function can be converted into a pandas DataFrame like so:

import pandas as pd
from bioclip import TreeOfLifeClassifier, Rank

classifier = TreeOfLifeClassifier()
predictions = classifier.predict("Ursus-arctos.jpeg", Rank.SPECIES)
df = pd.DataFrame(predictions)

The first argument of the predict() method accepts either a single path or a list of paths.
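When several paths are passed, the results for all images come back in one list, so it can help to group them per image. The sketch below assumes each prediction dictionary carries a file_name key (the key mentioned in the PIL Images section). The rows are hypothetical placeholders shaped like predict() output, not real model scores, and group_by_file and best_per_file are illustrative helpers.

```python
from collections import defaultdict

# Hypothetical rows shaped like predict() output for two images.
# All scores here are placeholders, not real model output.
predictions = [
    {"file_name": "Ursus-arctos.jpeg", "species": "Ursus arctos", "score": 0.94},
    {"file_name": "Ursus-arctos.jpeg", "species": "Ursus americanus", "score": 0.01},
    {"file_name": "Felis-catus.jpeg", "species": "Felis catus", "score": 0.80},
    {"file_name": "Felis-catus.jpeg", "species": "Felis silvestris", "score": 0.15},
]


def group_by_file(rows):
    """Collect prediction rows under the image they belong to."""
    grouped = defaultdict(list)
    for row in rows:
        grouped[row["file_name"]].append(row)
    return dict(grouped)


def best_per_file(rows):
    """Map each image to the species of its highest-scoring row."""
    return {
        name: max(group, key=lambda row: row["score"])["species"]
        for name, group in group_by_file(rows).items()
    }


for name, species in best_per_file(predictions).items():
    print(name, "->", species)
```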

Documentation

The TreeOfLifeClassifier docs contain details about the arguments supported by the constructor and the predict() method.

Predict from a list of classes

from bioclip import CustomLabelsClassifier

classifier = CustomLabelsClassifier(["duck", "fish", "bear"])
predictions = classifier.predict("Ursus-arctos.jpeg")

for prediction in predictions:
    print(prediction["classification"], prediction["score"])

Output:

duck 1.0306726583309e-09
fish 2.932403668845507e-12
bear 1.0

Documentation

The CustomLabelsClassifier docs contain details about the arguments supported by the constructor and the predict() method.
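The three scores in the output above add up to roughly 1. That is consistent with a softmax over per-label similarity values, the usual approach in CLIP-style zero-shot classification. This is an assumption about the internals rather than something stated in this tutorial. The sketch below shows the effect with hypothetical logits in plain Python.

```python
import math


def softmax(logits):
    """Turn raw similarity values into scores that sum to 1."""
    # Subtract the maximum before exponentiating for numerical stability.
    peak = max(logits)
    exps = [math.exp(value - peak) for value in logits]
    total = sum(exps)
    return [value / total for value in exps]


labels = ["duck", "fish", "bear"]
# Hypothetical similarity logits, chosen only to illustrate the effect.
logits = [4.0, -2.0, 25.0]

scores = dict(zip(labels, softmax(logits)))
for label, score in scores.items():
    print(label, score)
```

One consequence of this normalization is that scores are relative to the labels supplied: an image that matches none of them would still see its scores sum to 1.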

Predict using a Custom Model

To predict with a custom model, specify both the model_str and pretrained_str arguments. This example uses the CLIP-ViT-B-16-laion2B-s34B-b88K model.

from bioclip import CustomLabelsClassifier

classifier = CustomLabelsClassifier(
    cls_ary=["duck", "fish", "bear"],
    model_str="ViT-B-16",
    pretrained_str="laion2b_s34b_b88k",
)

print(classifier.predict("Ursus-arctos.jpeg"))

See this tutorial for instructions on listing the available pretrained models.

Predict from a list of classes with binning

from bioclip import CustomLabelsBinningClassifier

classifier = CustomLabelsBinningClassifier(cls_to_bin={
  'dog': 'small',
  'fish': 'small',
  'bear': 'big',
})
predictions = classifier.predict("Ursus-arctos.jpeg")

for prediction in predictions:
    print(prediction["classification"], prediction["score"])

Output:

big 0.99992835521698
small 7.165559509303421e-05

Documentation

The CustomLabelsBinningClassifier documentation describes all arguments supported by the constructor. The docs for the base class, CustomLabelsClassifier, describe the arguments for the predict() method.
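One way to picture binning is as a sum of per-label scores within each bin. The sketch below assumes that is how the bin scores are formed, which this tutorial does not state explicitly. The per-label scores are hypothetical placeholders, and bin_scores is an illustrative helper, not a pybioclip function.

```python
from collections import defaultdict

# Same label-to-bin mapping as in the example above.
cls_to_bin = {"dog": "small", "fish": "small", "bear": "big"}

# Hypothetical per-label scores; placeholders, not real model output.
label_scores = {"dog": 0.02, "fish": 0.01, "bear": 0.97}


def bin_scores(label_scores, cls_to_bin):
    """Sum label scores per bin and return (bin, score) pairs, best first."""
    totals = defaultdict(float)
    for label, score in label_scores.items():
        totals[cls_to_bin[label]] += score
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)


for bin_name, score in bin_scores(label_scores, cls_to_bin):
    print(bin_name, round(score, 4))
```

Under this assumption, adding more labels to a bin can only raise that bin's share, which is worth keeping in mind when bins contain very different numbers of labels.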

Example Notebooks

Predict species for images

PredictImages.ipynb downloads some images and predicts species.

Predict species for iNaturalist images

iNaturalistPredict.ipynb downloads images from inaturalist.org and predicts species.

Predict using a subset of the TreeOfLife

TOL-Subsetting.ipynb filters the TreeOfLife embeddings.

Documentation

To subset the TreeOfLifeClassifier, see get_label_data(), create_taxa_filter(), and apply_filter().

Experiment with grad-cam

GradCamExperiment.ipynb applies GradCAM AI explainability to BioCLIP.

Fine-tune with SVM

FineTuneSVM.ipynb fine-tunes BioCLIP by combining an SVM with BioCLIP image embeddings. Comparing the confusion matrices shows that training an SVM classifier on BioCLIP image embeddings may yield better results than using BioCLIP in "zero-shot mode", i.e., predicting from a list of custom labels.

This work is based on code from biobench.

PIL Images

The predict() methods used in all the examples above accept either a list of paths or a list of PIL Images. When a list of PIL Images is passed, file_name is filled in with the index of the image, because a PIL Image may not have an associated file name.
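If the images did come from known sources, the index in file_name can be mapped back to a readable name afterwards. The sketch below assumes the index is the zero-based position of the image in the list passed to predict(), stored either as an integer or as a numeric string. The rows and names are hypothetical, and relabel is an illustrative helper, not part of pybioclip.

```python
def relabel(predictions, names):
    """Return copies of the rows with file_name replaced by names[index]."""
    relabeled = []
    for row in predictions:
        # int() accepts both 0 and "0", covering either representation.
        index = int(row["file_name"])
        relabeled.append({**row, "file_name": names[index]})
    return relabeled


# Hypothetical rows shaped like predict() output for two PIL Images.
rows = [
    {"file_name": 0, "classification": "bear", "score": 0.99},
    {"file_name": 1, "classification": "duck", "score": 0.75},
]
# Hypothetical names, in the same order the images were passed to predict().
names = ["bear-photo", "pond-photo"]

for row in relabel(rows, names):
    print(row["file_name"], row["classification"], row["score"])
```

The original rows are left unchanged, so the index remains available if it is needed later.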