Python Tutorial
Before beginning this tutorial you need to install pybioclip and download two example images: Ursus-arctos.jpeg and Felis-catus.jpeg.
Predict species classification
from bioclip import TreeOfLifeClassifier, Rank
classifier = TreeOfLifeClassifier()
predictions = classifier.predict("Ursus-arctos.jpeg", Rank.SPECIES)
for prediction in predictions:
    print(prediction["species"], "-", prediction["score"])
Output:
Ursus arctos - 0.9356034994125366
Ursus arctos syriacus - 0.05616999790072441
Ursus arctos bruinosus - 0.004126196261495352
Ursus arctus - 0.0024959812872111797
Ursus americanus - 0.0005009894957765937
Output from the predict() method showing the dictionary structure:
[{
    'file_name': 'Ursus-arctos.jpeg',
    'kingdom': 'Animalia',
    'phylum': 'Chordata',
    'class': 'Mammalia',
    'order': 'Carnivora',
    'family': 'Ursidae',
    'genus': 'Ursus',
    'species_epithet': 'arctos',
    'species': 'Ursus arctos',
    'common_name': 'Kodiak bear',
    'score': 0.9356034994125366
}]
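Because each prediction is a plain Python dictionary, ordinary Python is enough to post-process the results. The sketch below picks the highest-scoring entry and joins the taxonomic ranks into one lineage string. `top_prediction`, `format_lineage` and `RANKS` are hypothetical helpers, and the sample data is hand-copied from the output above (trimmed to the rank and score keys), not produced by running the classifier.

```python
# Taxonomic rank keys, from broadest to most specific, as they appear in
# the prediction dictionaries shown above.
RANKS = ["kingdom", "phylum", "class", "order", "family", "genus", "species"]


def top_prediction(predictions):
    """Return the prediction dictionary with the highest score."""
    return max(predictions, key=lambda p: p["score"])


def format_lineage(prediction):
    """Join the taxonomic ranks of one prediction into a single string."""
    return " > ".join(prediction[rank] for rank in RANKS)


# Sample data trimmed from the output above (hypothetical usage, not live output).
shared = {"kingdom": "Animalia", "phylum": "Chordata", "class": "Mammalia",
          "order": "Carnivora", "family": "Ursidae", "genus": "Ursus"}
predictions = [
    {**shared, "species": "Ursus arctos", "score": 0.9356034994125366},
    {**shared, "species": "Ursus americanus", "score": 0.0005009894957765937},
]

best = top_prediction(predictions)
print(format_lineage(best), "-", best["score"])
```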
The output from the predict() method can be converted into a pandas DataFrame like so:
import pandas as pd
from bioclip import TreeOfLifeClassifier, Rank
classifier = TreeOfLifeClassifier()
predictions = classifier.predict("Ursus-arctos.jpeg", Rank.SPECIES)
df = pd.DataFrame(predictions)
The first argument of the predict() method accepts either a single path or a list of paths.
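When a list of paths is passed, the results for all images come back in one list. Assuming each result carries a "file_name" key naming its source image (the key mentioned in the PIL Images section), a small helper can regroup the results per image. `group_by_file` is a hypothetical helper, and the scores in `sample` are made-up placeholder values, not classifier output.

```python
from collections import defaultdict


def group_by_file(predictions):
    """Group a flat list of prediction dicts by their "file_name" key.

    Each image's predictions are sorted from highest to lowest score.
    """
    grouped = defaultdict(list)
    for prediction in predictions:
        grouped[prediction["file_name"]].append(prediction)
    return {
        name: sorted(preds, key=lambda p: p["score"], reverse=True)
        for name, preds in grouped.items()
    }


# Placeholder results for two images (scores are invented for illustration).
sample = [
    {"file_name": "Ursus-arctos.jpeg", "species": "Ursus americanus", "score": 0.1},
    {"file_name": "Felis-catus.jpeg", "species": "Felis catus", "score": 0.8},
    {"file_name": "Ursus-arctos.jpeg", "species": "Ursus arctos", "score": 0.9},
]

for file_name, preds in group_by_file(sample).items():
    print(file_name, "->", preds[0]["species"])
```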
Documentation
The TreeOfLifeClassifier docs contain details about the arguments supported by the constructor and the predict() method.
Predict from a list of classes
from bioclip import CustomLabelsClassifier
classifier = CustomLabelsClassifier(["duck","fish","bear"])
predictions = classifier.predict("Ursus-arctos.jpeg")
for prediction in predictions:
    print(prediction["classification"], prediction["score"])
Output:
duck 1.0306726583309e-09
fish 2.932403668845507e-12
bear 1.0
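The three scores above sum to approximately 1.0, which fits the usual CLIP zero-shot recipe: compare the image embedding with a text embedding for each label using cosine similarity, scale the similarities, and apply a softmax. The sketch below illustrates that recipe with toy 3-dimensional vectors in place of real BioCLIP embeddings. The vectors, the `zero_shot_scores` helper and the scale factor of 100 are assumptions, not pybioclip's actual code.

```python
import math


def normalize(v):
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def zero_shot_scores(image_emb, label_embs, scale=100.0):
    """Softmax over scaled cosine similarities between an image and each label."""
    img = normalize(image_emb)
    logits = {}
    for label, emb in label_embs.items():
        txt = normalize(emb)
        logits[label] = scale * sum(a * b for a, b in zip(img, txt))
    # Subtract the largest logit before exponentiating for numerical stability.
    top = max(logits.values())
    exps = {label: math.exp(logit - top) for label, logit in logits.items()}
    total = sum(exps.values())
    return {label: e / total for label, e in exps.items()}


# Toy embeddings (invented): the image vector points mostly toward "bear".
image_emb = [0.9, 0.1, 0.0]
label_embs = {"duck": [0.0, 1.0, 0.0], "fish": [0.0, 0.0, 1.0], "bear": [1.0, 0.0, 0.0]}

for label, score in zero_shot_scores(image_emb, label_embs).items():
    print(label, score)
```

The large scale factor is what pushes the winning label close to 1.0 even when the cosine similarities differ only moderately.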
Documentation
The CustomLabelsClassifier docs contain details about the arguments supported by the constructor and the predict() method.
Predict using a Custom Model
To predict with a custom model, the model_str and pretrained_str arguments must be specified. In this example the CLIP-ViT-B-16-laion2B-s34B-b88K model is used.
from bioclip import CustomLabelsClassifier
classifier = CustomLabelsClassifier(
    cls_ary=["duck", "fish", "bear"],
    model_str="ViT-B-16",
    pretrained_str="laion2b_s34b_b88k",
)
print(classifier.predict("Ursus-arctos.jpeg"))
See this tutorial for instructions for listing available pretrained models.
Predict from a list of classes with binning
from bioclip import CustomLabelsBinningClassifier
classifier = CustomLabelsBinningClassifier(cls_to_bin={
    'dog': 'small',
    'fish': 'small',
    'bear': 'big',
})
predictions = classifier.predict("Ursus-arctos.jpeg")
for prediction in predictions:
    print(prediction["classification"], prediction["score"])
Output:
big 0.99992835521698
small 7.165559509303421e-05
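One natural way to produce bin scores, assumed here rather than taken from the pybioclip source, is to compute per-label probabilities as in the unbinned classifier and then add up the probabilities of all labels that share a bin. This keeps the bin scores summing to 1, which is consistent with the output above. `bin_scores` is a hypothetical helper and the per-label probabilities are invented placeholders.

```python
from collections import defaultdict


def bin_scores(label_scores, cls_to_bin):
    """Sum per-label probabilities into per-bin probabilities.

    label_scores: {label: probability}, e.g. from a softmax over labels.
    cls_to_bin:   {label: bin name}, like the cls_to_bin argument above.
    Returns {bin name: summed probability}, highest first.
    """
    totals = defaultdict(float)
    for label, score in label_scores.items():
        totals[cls_to_bin[label]] += score
    return dict(sorted(totals.items(), key=lambda item: item[1], reverse=True))


cls_to_bin = {"dog": "small", "fish": "small", "bear": "big"}
# Placeholder per-label probabilities (invented for illustration).
label_scores = {"dog": 0.00005, "fish": 0.00002, "bear": 0.99993}

for bin_name, score in bin_scores(label_scores, cls_to_bin).items():
    print(bin_name, score)
```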
Documentation
The CustomLabelsBinningClassifier documentation describes all arguments supported by the constructor. The documentation for the base class CustomLabelsClassifier describes the arguments of the predict() method.
Example Notebooks
Predict species for images
PredictImages.ipynb downloads some images and predicts species.
Predict species for iNaturalist images
iNaturalistPredict.ipynb downloads images from inaturalist.org and predicts species.
Predict using a subset of the TreeOfLife
TOL-Subsetting.ipynb filters the TreeOfLife embeddings.
Documentation
To subset the TreeOfLifeClassifier, see get_label_data(), create_taxa_filter() and apply_filter().
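Conceptually, subsetting narrows the set of candidate taxa before scoring, so the probabilities are spread only over the taxa you keep. The sketch below shows the build-a-filter-then-apply-it pattern on plain Python lists. `make_rank_mask`, `apply_mask`, the label records and the toy embedding rows are hypothetical and do not reproduce the signatures or data formats of get_label_data(), create_taxa_filter() or apply_filter().

```python
def make_rank_mask(labels, rank, allowed):
    """Return one boolean per label: True if label[rank] is in `allowed`."""
    return [label[rank] in allowed for label in labels]


def apply_mask(rows, mask):
    """Keep only the rows (e.g. per-taxon embedding vectors) whose mask is True."""
    if len(rows) != len(mask):
        raise ValueError("rows and mask must have the same length")
    return [row for row, keep in zip(rows, mask) if keep]


# Hypothetical label records, aligned row-for-row with toy embeddings.
labels = [
    {"family": "Ursidae", "species": "Ursus arctos"},
    {"family": "Felidae", "species": "Felis catus"},
    {"family": "Ursidae", "species": "Ursus americanus"},
]
embeddings = [[0.9, 0.1], [0.2, 0.8], [0.7, 0.3]]

mask = make_rank_mask(labels, "family", {"Ursidae"})
kept_labels = apply_mask(labels, mask)
kept_embeddings = apply_mask(embeddings, mask)
print([label["species"] for label in kept_labels])
```

Applying the same mask to both lists keeps labels and embeddings aligned, which is the property any real filtering step has to preserve.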
Experiment with Grad-CAM
GradCamExperiment.ipynb applies Grad-CAM, an AI explainability technique, to BioCLIP.
Fine-tune with SVM
FineTuneSVM.ipynb fine-tunes BioCLIP by combining an SVM with BioCLIP image embeddings. As a comparison of the confusion matrices shows, fine-tuning an SVM classifier on BioCLIP image embeddings may yield better results than using BioCLIP in "zero-shot mode", i.e., predicting on a list of custom labels.
This work is based on code from biobench.
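The notebook's code is not reproduced here. As a dependency-free illustration of the underlying idea, the sketch below trains a binary linear SVM with the Pegasos stochastic sub-gradient method on toy 2-dimensional points standing in for BioCLIP image embeddings. Real embeddings are much higher-dimensional, and a multi-class task would typically train one such classifier per class (one-vs-rest). The data, helper names and hyperparameters are assumptions; in practice a library implementation such as scikit-learn's LinearSVC or SVC is the usual choice.

```python
import random


def train_linear_svm(X, y, lam=0.01, epochs=200, seed=0):
    """Train a linear SVM with Pegasos (stochastic sub-gradient descent).

    X: list of feature vectors (e.g. image embeddings); y: labels in {-1, +1}.
    lam: L2 regularization strength. A constant 1.0 feature is appended so the
    last weight acts as a bias (this also regularizes the bias, a common
    simplification).
    """
    rng = random.Random(seed)  # seeded so runs are reproducible
    data = [(list(x) + [1.0], label) for x, label in zip(X, y)]
    w = [0.0] * len(data[0][0])
    t = 0
    for _ in range(epochs):
        for _ in range(len(data)):
            t += 1
            x, label = rng.choice(data)
            eta = 1.0 / (lam * t)  # Pegasos step size
            margin = label * sum(wi * xi for wi, xi in zip(w, x))
            # Gradient step on the L2 regularizer shrinks w ...
            w = [(1.0 - eta * lam) * wi for wi in w]
            # ... and when the hinge loss is active, w moves toward label * x.
            if margin < 1.0:
                w = [wi + eta * label * xi for wi, xi in zip(w, x)]
    return w


def svm_predict(w, x):
    """Return +1 or -1 depending on which side of the hyperplane x falls."""
    score = sum(wi * xi for wi, xi in zip(w, list(x) + [1.0]))
    return 1 if score >= 0 else -1


# Toy "embeddings" for two classes (invented for illustration).
X = [(2.0, 1.5), (1.5, 2.5), (2.5, 2.0), (-2.0, -1.5), (-1.5, -2.5), (-2.5, -1.0)]
y = [1, 1, 1, -1, -1, -1]

w = train_linear_svm(X, y)
print([svm_predict(w, x) for x in X])
```

Unlike zero-shot prediction, this approach needs labeled example images, but it can learn decision boundaries tailored to the specific classes in a dataset.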
PIL Images
The predict() methods used in all the examples above accept a list of paths or a list of PIL Images. When a list of PIL Images is passed, the index of each image is filled in for file_name, because PIL Images may not have an associated file name.
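To report results under your own names for in-memory images, you can map that index back to a name. The sketch assumes file_name holds the image's position in the list passed to predict() (converted with int() in case it is stored as a string). `attach_names`, `names` and the sample predictions are hypothetical, and the scores are invented placeholders.

```python
def attach_names(predictions, names):
    """Return copies of the predictions with file_name replaced by a caller-side name.

    Assumes each prediction's "file_name" is the index of its image in the list
    that was passed to predict().
    """
    return [{**p, "file_name": names[int(p["file_name"])]} for p in predictions]


# Hypothetical names for two in-memory PIL images, in the order they were passed.
names = ["camera-trap-001", "upload-042"]
# Placeholder predictions; scores are invented for illustration.
predictions = [
    {"file_name": 0, "species": "Ursus arctos", "score": 0.9},
    {"file_name": 1, "species": "Felis catus", "score": 0.8},
]

for p in attach_names(predictions, names):
    print(p["file_name"], p["species"])
```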