Python API
bioclip.TreeOfLifeClassifier(**kwargs)
Bases: BaseClassifier
A classifier for predicting taxonomic ranks for images.
See BaseClassifier for details on **kwargs.
Source code in src/bioclip/predict.py
predict(images, rank, min_prob=1e-09, k=5, batch_size=10)
Predicts probabilities at the supplied taxonomic rank for the given images, using the Tree of Life embeddings.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
images | List[str] \| str \| List[Image] | A list of image file paths, a single image file path, or a list of PIL Image objects. | required |
rank | Rank | The rank at which to make predictions (e.g., species, genus). | required |
min_prob | float | The minimum probability threshold for predictions. | 1e-09 |
k | int | The number of top predictions to return. | 5 |
batch_size | int | The number of images to process in a batch. | 10 |
Returns:
Type | Description |
---|---|
dict[str, dict[str, float]] | List[dict]: A list of dicts with keys "file_name", taxon ranks, "common_name", and "score". |
Source code in src/bioclip/predict.py
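The interplay of k and min_prob can be illustrated without loading a model. The sketch below is a hypothetical, dependency-free helper (top_k_predictions is not part of bioclip) that assumes a dict of label probabilities for a single image, drops labels below min_prob (treated here as an inclusive threshold, which is an assumption), and keeps the k most probable. The real method also returns "file_name", rank, and "common_name" keys, which the sketch omits.

```python
from typing import Dict, List


def top_k_predictions(
    probs: Dict[str, float], k: int = 5, min_prob: float = 1e-9
) -> List[dict]:
    """Hypothetical helper: keep up to k labels with probability >= min_prob.

    Illustrates the k/min_prob semantics only; not bioclip's internal code.
    """
    # Drop labels below the threshold (inclusive comparison is an assumption).
    kept = [(label, p) for label, p in probs.items() if p >= min_prob]
    # Highest probability first, then truncate to the k best.
    kept.sort(key=lambda item: item[1], reverse=True)
    return [{"label": label, "score": p} for label, p in kept[:k]]
```

For example, top_k_predictions({"Ursus arctos": 0.9, "Ursus americanus": 0.1}, k=1) keeps only the "Ursus arctos" entry.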
get_label_data()
Retrieves label data for the tree of life embeddings as a pandas DataFrame.
Returns:
Type | Description |
---|---|
DataFrame | pd.DataFrame: A DataFrame containing label data for TOL embeddings. |
Source code in src/bioclip/predict.py
create_taxa_filter(rank, user_values)
Creates a filter for taxa based on the specified rank and user-provided values.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
rank | Rank | The taxonomic rank to filter by. | required |
user_values | List[str] | A list of user-provided values to filter the taxa. | required |
Returns:
Type | Description |
---|---|
List[bool] | List[bool]: A list of boolean values indicating whether each entry in the label data matches any of the user-provided values. |
Raises:
Type | Description |
---|---|
ValueError | If any of the user-provided values are not found in the label data for the specified taxonomic rank. |
Source code in src/bioclip/predict.py
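A minimal sketch of the filtering logic, assuming the label data is a list of rows keyed by rank name. The real method reads a pandas DataFrame and takes a Rank member rather than a string; create_taxa_filter_sketch is a hypothetical name used only for illustration.

```python
from typing import Dict, List, Sequence


def create_taxa_filter_sketch(
    label_rows: Sequence[Dict[str, str]], rank: str, user_values: List[str]
) -> List[bool]:
    """Hypothetical stand-in: one boolean per label row, True if it matches."""
    # Every requested value must exist at this rank, otherwise fail loudly,
    # mirroring the documented ValueError.
    known = {row[rank] for row in label_rows}
    missing = [v for v in user_values if v not in known]
    if missing:
        raise ValueError(f"Values not found at rank {rank!r}: {missing}")
    wanted = set(user_values)
    # True where the row's value at `rank` is one of the requested values.
    return [row[rank] in wanted for row in label_rows]
```

Returning a plain mask, rather than a filtered copy, lets the same mask be reused or combined with other masks before it is applied.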
apply_filter(keep_labels_ary)
Filters the TOL embeddings based on the provided boolean array. See create_taxa_filter() for an easy way to create this parameter.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
keep_labels_ary | List[bool] | A list of boolean values indicating which TOL embeddings to keep. | required |
Raises:
Type | Description |
---|---|
ValueError | If the length of keep_labels_ary does not match the number of TOL embeddings (one boolean per embedding). |
Source code in src/bioclip/predict.py
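The same contract can be sketched on plain Python lists: one boolean per embedding, a ValueError on a length mismatch, and only the entries marked True kept. apply_filter_sketch is hypothetical, and representing the embeddings as a list with one entry per label is an assumption; unlike this sketch, the documented method has no return value.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def apply_filter_sketch(embeddings: Sequence[T], keep_labels_ary: List[bool]) -> List[T]:
    """Hypothetical stand-in: keep embeddings whose mask entry is True."""
    # The mask must have exactly one entry per embedding.
    if len(keep_labels_ary) != len(embeddings):
        raise ValueError(
            f"Expected {len(embeddings)} booleans, got {len(keep_labels_ary)}"
        )
    # Pair each embedding with its mask entry and keep the True ones.
    return [emb for emb, keep in zip(embeddings, keep_labels_ary) if keep]
```

With the real classifier, the mask produced by create_taxa_filter() is what would be passed as keep_labels_ary.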
bioclip.Rank
Taxonomic ranks available for Tree of Life classification.
KINGDOM
PHYLUM
CLASS
ORDER
FAMILY
GENUS
SPECIES
bioclip.CustomLabelsClassifier(cls_ary, **kwargs)
Bases: BaseClassifier
A classifier that predicts from a list of custom labels for images.
Initializes the classifier with the given class array and additional keyword arguments.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
cls_ary | List[str] | A list of class names as strings. | required |
Source code in src/bioclip/predict.py
predict(images, k=None, batch_size=10)
Predicts the probability of each custom label for the given images.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
images | List[str] \| str \| List[Image] | A list of image file paths, a single image file path, or a list of PIL Image objects. | required |
k | int | The number of top probabilities to return. If not specified or if greater than the number of classes, all probabilities are returned. | None |
batch_size | int | The number of images to process in a batch. | 10 |
Returns:
Type | Description |
---|---|
dict[str, float] | List[dict]: A list of dicts with keys "file_name" and the custom class labels. |
Source code in src/bioclip/predict.py
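The k semantics above (all probabilities when k is None or exceeds the number of classes) can be sketched in a few lines. top_k_custom_labels is a hypothetical helper that works on an assumed dict of label probabilities for one image; it is not bioclip's code.

```python
from typing import Dict, Optional


def top_k_custom_labels(probs: Dict[str, float], k: Optional[int] = None) -> Dict[str, float]:
    """Hypothetical helper: the k most probable labels, or all of them."""
    # k=None, or k larger than the number of classes, means "return everything".
    if k is None or k > len(probs):
        k = len(probs)
    # Sort by descending probability so the dict keeps the best labels first.
    ranked = sorted(probs.items(), key=lambda item: item[1], reverse=True)
    return dict(ranked[:k])
```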
bioclip.CustomLabelsBinningClassifier(cls_to_bin, **kwargs)
Bases: CustomLabelsClassifier
A classifier that creates predictions for images based on custom labels, groups the labels, and calculates probabilities for each group.
Initializes the class with a dictionary mapping class labels to bin (group) labels.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
cls_to_bin | dict | A dictionary where keys are class labels and values are the names of the bins (groups) those labels belong to. | required |
**kwargs |  | Additional keyword arguments passed to the superclass initializer. | {} |
Raises:
Type | Description |
---|---|
ValueError | If any value in |
Source code in src/bioclip/predict.py
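One plausible reading of "calculates probabilities for each group", assuming as the class description implies that each value in cls_to_bin names the group a label belongs to, is that a group's probability is the sum of its member labels' probabilities. The hypothetical bin_probabilities helper below implements that assumption; treat it as an illustration of the idea, not the library's exact arithmetic.

```python
from collections import defaultdict
from typing import Dict


def bin_probabilities(label_probs: Dict[str, float], cls_to_bin: Dict[str, str]) -> Dict[str, float]:
    """Hypothetical helper: aggregate per-label probabilities into per-bin totals."""
    # Assumption: a bin's probability is the sum over its member labels.
    totals: Dict[str, float] = defaultdict(float)
    for label, p in label_probs.items():
        totals[cls_to_bin[label]] += p
    return dict(totals)
```

Summing is natural here because the per-label probabilities come from a single softmax, so the bin totals still add up to 1.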
bioclip.predict.BaseClassifier(model_str=BIOCLIP_MODEL_STR, pretrained_str=None, device='cpu')
Bases: Module
Initializes the prediction model.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
model_str | str | The string identifier for the model to be used (defaults to BIOCLIP_MODEL_STR). | BIOCLIP_MODEL_STR |
pretrained_str | str | The string identifier for the pretrained model to be loaded. | None |
device | Union[str, device] | The device on which the model will be run. | 'cpu' |
Source code in src/bioclip/predict.py
forward(x)
Given an input tensor representing multiple images, returns the probability of each class for each image.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
x | torch.Tensor | Input tensor representing the multiple images. | required |
Returns:
Type | Description |
---|---|
torch.Tensor | Softmax probabilities of the logits for each class for each image. |
Source code in src/bioclip/predict.py
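forward() returns softmax probabilities of the logits for each class for each image. The dependency-free sketch below shows that row-wise softmax on plain lists, one row of class logits per image; softmax_rows is a hypothetical name, the real method works on torch.Tensor (where torch.softmax over the class dimension is the equivalent), and how the logits are computed from image and text embeddings is not covered here.

```python
import math
from typing import List


def softmax_rows(logits: List[List[float]]) -> List[List[float]]:
    """Hypothetical helper: row-wise softmax, one row of class logits per image."""
    out = []
    for row in logits:
        # Subtract the row maximum before exponentiating for numerical stability;
        # this does not change the result.
        m = max(row)
        exps = [math.exp(x - m) for x in row]
        total = sum(exps)
        out.append([e / total for e in exps])
    return out
```

Each output row sums to 1, and the class ordering of the logits is preserved.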
get_cached_datafile(filename)
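Downloads a datafile from the Hugging Face Hub and caches it locally.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
filename | str | The name of the file to download from the datafile repository. | required |
Returns:
Type | Description |
---|---|
str | The local path to the downloaded file. |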
Source code in src/bioclip/predict.py
get_txt_emb()
Retrieves TreeOfLife text embeddings for the current model from the associated Hugging Face dataset repo.
Returns:
Type | Description |
---|---|
torch.Tensor | A tensor containing the text embeddings for the tree of life. |
Source code in src/bioclip/predict.py
get_txt_names()
Retrieves TreeOfLife text names for the current model from the associated Hugging Face dataset repo.
Returns:
Type | Description |
---|---|
List[List[str]] | A list of lists, where each inner list contains names corresponding to the text embeddings. |
Source code in src/bioclip/predict.py