Python API
bioclip.TreeOfLifeClassifier(**kwargs)
Bases: BaseClassifier
A classifier for predicting taxonomic ranks for images.
See BaseClassifier for details on `**kwargs`.
Source code in src/bioclip/predict.py
predict(images, rank, min_prob=1e-09, k=5, batch_size=10)
Predicts probabilities at the supplied taxonomic rank for the given images, using the Tree of Life embeddings.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
images | `List[str] \| str \| List[Image]` | A list of image file paths, a single image file path, or a list of PIL Image objects. | required |
rank | `Rank` | The rank at which to make predictions (e.g., species, genus). | required |
min_prob | `float` | The minimum probability threshold for predictions. | `1e-09` |
k | `int` | The number of top predictions to return. | `5` |
batch_size | `int` | The number of images to process in a batch. | `10` |
Returns:
Type | Description |
---|---|
`dict[str, dict[str, float]]` | A list of dicts with keys "file_name", the taxon ranks, "common_name", and "score". The docstring gives the return type as `List[dict]`. |
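As a rough usage sketch (not taken from the project's own examples), the helper below wires the documented parameters together. The name `top_species` is hypothetical, and the import is deferred into the function so that nothing is loaded until the helper is actually called.

```python
def top_species(image_path, k=3, min_prob=1e-9):
    """Return up to k species-level predictions for one image.

    `top_species` is a hypothetical helper, not part of bioclip.
    """
    # Deferred import: the classifier is only constructed when this runs.
    from bioclip import Rank, TreeOfLifeClassifier

    classifier = TreeOfLifeClassifier()
    # predict() accepts a single path, a list of paths, or PIL Image objects.
    return classifier.predict(image_path, Rank.SPECIES, min_prob=min_prob, k=k)
```

Each returned dict carries the keys listed under Returns, so `prediction["score"]` should give the probability for that row.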
get_label_data()
Retrieves label data for the tree of life embeddings as a pandas DataFrame.
Returns:
Type | Description |
---|---|
`DataFrame` | A pandas DataFrame containing label data for the Tree of Life (TOL) embeddings. |
create_taxa_filter(rank, user_values)
Creates a filter for taxa based on the specified rank and user-provided values.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
rank | `Rank` | The taxonomic rank to filter by. | required |
user_values | `List[str]` | A list of user-provided values to filter the taxa. | required |
Returns:
Type | Description |
---|---|
`List[bool]` | A list of boolean values indicating whether each entry in the label data matches any of the user-provided values. |
Raises:
Type | Description |
---|---|
`ValueError` | If any of the user-provided values are not found in the label data for the specified taxonomic rank. |
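The behaviour described above can be pictured with a small stand-in written in plain Python. This is only a sketch of the documented contract (one boolean per label row, `ValueError` for unknown values); `make_taxa_filter` and the sample data are hypothetical and do not reflect the library's internals.

```python
def make_taxa_filter(label_values, user_values):
    """Build a boolean mask over `label_values` (hypothetical stand-in).

    label_values: the value of one taxonomic rank for every label row.
    user_values: the values the caller wants to keep.
    """
    known = set(label_values)
    missing = [value for value in user_values if value not in known]
    if missing:
        # Mirrors the documented contract: unknown values are an error.
        raise ValueError(f"Unknown taxa: {missing}")
    wanted = set(user_values)
    return [value in wanted for value in label_values]


orders = ["Carnivora", "Primates", "Carnivora", "Rodentia"]
mask = make_taxa_filter(orders, ["Carnivora"])
print(mask)  # [True, False, True, False]
```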
apply_filter(keep_labels_ary)
Filters the TOL embeddings based on the provided boolean array. See create_taxa_filter() for an easy way to create this parameter.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
keep_labels_ary | `List[bool]` | A list of boolean values indicating which TOL embeddings to keep. | required |
Raises:
Type | Description |
---|---|
`ValueError` | If the length of keep_labels_ary does not match the expected length. |
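A minimal sketch of how the two filter methods might be combined before predicting, using only the signatures documented on this page. The helper name `species_within_orders` is hypothetical, and the import is deferred so the classifier is only built when the helper is called.

```python
def species_within_orders(image_path, orders, k=5):
    """Predict species, considering only labels in the given orders.

    `species_within_orders` is a hypothetical helper, not part of bioclip.
    """
    # Deferred import: the classifier is only constructed when this runs.
    from bioclip import Rank, TreeOfLifeClassifier

    classifier = TreeOfLifeClassifier()
    # Raises ValueError if any entry of `orders` is absent from the label data.
    keep = classifier.create_taxa_filter(Rank.ORDER, orders)
    # Keeps only the TOL embeddings whose entry in `keep` is True.
    classifier.apply_filter(keep)
    return classifier.predict(image_path, Rank.SPECIES, k=k)
```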
bioclip.Rank
Rank for the Tree of Life classification.
KINGDOM
PHYLUM
CLASS
ORDER
FAMILY
GENUS
SPECIES
bioclip.CustomLabelsClassifier(cls_ary, **kwargs)
Bases: BaseClassifier
A classifier that predicts from a list of custom labels for images.
Initializes the classifier with the given class array and additional keyword arguments.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
cls_ary | `List[str]` | A list of class names as strings. | required |
predict(images, k=None, batch_size=10)
Predicts the probabilities for the given images.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
images | `List[str] \| str \| List[Image]` | A list of image file paths, a single image file path, or a list of PIL Image objects. | required |
k | `int` | The number of top probabilities to return. If not specified or if greater than the number of classes, all probabilities are returned. | `None` |
batch_size | `int` | The number of images to process in a batch. | `10` |
Returns:
Type | Description |
---|---|
`dict[str, float]` | A list of dicts with keys "file_name" and the custom class labels. The docstring gives the return type as `List[dict]`. |
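The rule for `k` can be illustrated with a plain-Python stand-in. The function `top_k` and the sample probabilities are hypothetical; the sketch only mirrors the documented semantics (no `k`, or a `k` larger than the number of classes, returns everything) and says nothing about how the library computes its scores.

```python
def top_k(probabilities, k=None):
    """Return (label, probability) pairs, highest probability first.

    Hypothetical stand-in for the documented `k` semantics only.
    """
    ranked = sorted(probabilities.items(), key=lambda item: item[1], reverse=True)
    if k is None or k > len(ranked):
        return ranked  # all probabilities are returned
    return ranked[:k]


probs = {"duck": 0.7, "fish": 0.2, "bear": 0.1}
print(top_k(probs, k=2))        # [('duck', 0.7), ('fish', 0.2)]
print(len(top_k(probs)))        # 3
print(len(top_k(probs, k=10)))  # 3
```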
bioclip.CustomLabelsBinningClassifier(cls_to_bin, **kwargs)
Bases: CustomLabelsClassifier
A classifier that creates predictions for images based on custom labels, groups the labels, and calculates probabilities for each group.
Initializes the classifier with a dictionary mapping class labels to the bins (groups) they belong to.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
cls_to_bin | `dict` | A dictionary where keys are class labels and values are the bins (groups) the labels are assigned to. | required |
**kwargs | | Additional keyword arguments passed to the superclass initializer. | `{}` |
Raises:
Type | Description |
---|---|
`ValueError` | If any value in |
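To make the grouping step concrete, here is a small plain-Python sketch. It assumes that the values of `cls_to_bin` name the group each label belongs to and that a group's probability is the sum of its members' probabilities; both are assumptions made for illustration, and `bin_probabilities` is a hypothetical function rather than the library's code.

```python
from collections import defaultdict


def bin_probabilities(label_probs, cls_to_bin):
    """Sum label probabilities per bin (assumed aggregation rule)."""
    totals = defaultdict(float)
    for label, prob in label_probs.items():
        totals[cls_to_bin[label]] += prob
    return dict(totals)


# Hypothetical labels and bins, chosen only for the example.
cls_to_bin = {"dog": "small", "fish": "small", "bear": "big"}
label_probs = {"dog": 0.25, "fish": 0.25, "bear": 0.5}
print(bin_probabilities(label_probs, cls_to_bin))  # {'small': 0.5, 'big': 0.5}
```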
bioclip.predict.BaseClassifier(model_str=BIOCLIP_MODEL_STR, pretrained_str=None, device='cpu')
Bases: Module
Initializes the prediction model.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
model_str | `str` | The string identifier for the model to be used. | `BIOCLIP_MODEL_STR` |
pretrained_str | `str` | The string identifier for the pretrained model to be loaded. | `None` |
device | `Union[str, device]` | The device on which the model will be run. | `'cpu'` |
forward(x)
Given an input tensor representing multiple images, return probabilities for each class for each image.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
x | `Tensor` | Input tensor representing the multiple images. | required |
Returns:
Type | Description |
---|---|
`Tensor` | Softmax probabilities of the logits for each class for each image. |
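For readers unfamiliar with the softmax step, the following standard-library sketch shows how one image's logits turn into probabilities. It is illustrative only: the real method operates on `torch.Tensor` batches, and the function and sample logits here are hypothetical.

```python
import math


def softmax(logits):
    """Convert one image's logits into probabilities that sum to 1."""
    peak = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(value - peak) for value in logits]
    total = sum(exps)
    return [value / total for value in exps]


probs = softmax([2.0, 1.0, 0.1])
print(round(sum(probs), 6))  # 1.0
```

Larger logits map to larger probabilities, so the ordering of classes is preserved.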