Command Line Help

bioclip predict

Use BioCLIP to generate predictions for image files.

usage: bioclip predict [-h] [--format {table,csv}] [--output OUTPUT]
                       [--rank {kingdom,phylum,class,order,family,genus,species} |
                        --cls CLS | --bins BINS | --subset SUBSET] [--k K]
                       [--device DEVICE] [--model MODEL] [--pretrained PRETRAINED]
                       [--batch-size BATCH_SIZE] [--log LOG_FILE]
                       image_file [image_file ...]

positional arguments:
  image_file            input image file(s)

options:
  -h, --help            show this help message and exit
  --format {table,csv}  format of the output, default: csv
  --output OUTPUT       print output to file, default: stdout
  --rank {kingdom,phylum,class,order,family,genus,species}
                        rank of the classification, default: species, when
                        specified the --cls, --bins, and --subset arguments
                        are not allowed.
  --cls CLS             classes to predict: either a comma separated list or a
                        path to a text file of classes (one per line), when
                        specified the --rank, --bins, and --subset arguments
                        are not allowed.
  --bins BINS           path to CSV file with two columns with the first being
                        classes and second being bin names, when specified the
                        --rank, --cls, and --subset arguments are not allowed.
  --subset SUBSET       path to CSV file used to subset the TreeOfLife taxa
                        embeddings. CSV first column must be named one of
                        kingdom,phylum,class,order,family,genus,species. When
                        specified the --rank, --bins, and --cls arguments are
                        not allowed.
  --k K                 number of top predictions to show, default: 5
  --device DEVICE       device to use (cpu or cuda or mps), default: cpu
  --model MODEL         model identifier (see command list-models);
                        default: hf-hub:imageomics/bioclip-2
  --pretrained PRETRAINED
                        pretrained model checkpoint as tag or file, depends on
                        model; needed only if more than one is available
                        (see command list-models)
  --batch-size BATCH_SIZE
                        number of images to process in a batch, default: 10
  --log LOG_FILE        path to a file for recording prediction logs.
                        If the file extension is '.json', the log is written
                        in JSON for building a provenance chain; otherwise,
                        logs are appended in a human-readable text format.
                        If not specified, no log is written.
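
The input files described for --cls and --bins can be generated programmatically. The following is a minimal sketch, not part of the tool itself: the labels, bin names, file names (classes.txt, bins.csv), and image paths are all hypothetical, and it assumes the --bins CSV begins with a header row, which the help text above does not state. The snippet only prints the resulting commands; it does not run bioclip.

```python
import csv
import shlex
import tempfile
from pathlib import Path

# Hypothetical working directory, used only for illustration.
workdir = Path(tempfile.mkdtemp())

# --cls accepts a text file with one class per line.
classes = ["cat", "dog", "fish"]  # made-up labels
cls_file = workdir / "classes.txt"
cls_file.write_text("\n".join(classes) + "\n", encoding="utf-8")

# --bins expects a two-column CSV: class first, bin name second.
# Assumption: the file starts with a header row; the column names are made up.
bins = [("cat", "mammal"), ("dog", "mammal"), ("fish", "not-mammal")]
bins_file = workdir / "bins.csv"
with bins_file.open("w", newline="", encoding="utf-8") as fh:
    writer = csv.writer(fh)
    writer.writerow(["cls", "bin"])
    writer.writerows(bins)

# --cls and --bins are mutually exclusive, so each gets its own command.
images = ["pet1.jpg", "pet2.jpg"]  # hypothetical image paths
cls_cmd = ["bioclip", "predict", "--format", "table", "--cls", str(cls_file), *images]
bins_cmd = ["bioclip", "predict", "--format", "csv", "--bins", str(bins_file), *images]

# Print shell-quoted commands that can be pasted into a terminal.
print(shlex.join(cls_cmd))
print(shlex.join(bins_cmd))
```

To execute a command from Python where bioclip is installed, the argument list can be passed to `subprocess.run`.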

bioclip embed

Use BioCLIP to generate embeddings for image files.

usage: bioclip embed [-h] [--output OUTPUT] [--device DEVICE] [--model MODEL]
                     [--pretrained PRETRAINED] image_file [image_file ...]

positional arguments:
  image_file            input image file(s)

options:
  -h, --help            show this help message and exit
  --output OUTPUT       print output to file, default: stdout
  --device DEVICE       device to use (cpu or cuda or mps), default: cpu
  --model MODEL         model identifier (see command list-models);
                        default: hf-hub:imageomics/bioclip-2
  --pretrained PRETRAINED
                        pretrained model checkpoint as tag or file, depends
                        on model; needed only if more than one is available
                        (see command list-models)
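
Because embed takes one or more image_file arguments, a whole directory can be processed in a single call. The sketch below shows one way to assemble such a command; the helper name `build_embed_command`, the set of image suffixes, and the output file name are assumptions made for illustration, and the format of the embedding output is not described here. The snippet only builds and prints the command.

```python
import shlex
import tempfile
from pathlib import Path

# Assumption: these suffixes cover the images in your directory; adjust as needed.
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}


def build_embed_command(image_dir, output, device="cpu"):
    """Return the argument list for `bioclip embed` over every image in image_dir."""
    images = sorted(
        str(p)
        for p in Path(image_dir).iterdir()
        if p.is_file() and p.suffix.lower() in IMAGE_SUFFIXES
    )
    if not images:
        raise ValueError(f"no image files found in {image_dir}")
    return ["bioclip", "embed", "--output", str(output), "--device", device, *images]


# Demo with empty placeholder files in a temporary directory.
demo_dir = Path(tempfile.mkdtemp())
for name in ("b.png", "a.jpg", "notes.txt"):
    (demo_dir / name).touch()

cmd = build_embed_command(demo_dir, demo_dir / "embeddings.out")
print(shlex.join(cmd))  # notes.txt is skipped; a.jpg is listed before b.png
```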

bioclip list-models

List available models and pretrained model checkpoints.

usage: bioclip list-models [-h] [--model MODEL]

Note that this lists only models known to open_clip; any model identifier
loadable by open_clip (for example, from hf-hub or a local file) should also be
usable for --model in the embed and predict commands.
(The default model hf-hub:imageomics/bioclip-2 is one example.)

options:
  -h, --help     show this help message and exit
  --model MODEL  list available tags for pretrained model checkpoint(s) for
                 specified model

bioclip list-tol-taxa

Outputs a CSV of the taxa embedding labels included with the selected (or default) TreeOfLife model. Other models are not supported because precomputed taxon label embeddings are not available for them.

Note that this is a very large table, so redirect the output to a file. One major use of this table is to construct or validate a table for the --subset option of the predict command. Because the TreeOfLife training datasets differ substantially between the models (TreeOfLife-10M for the original BioCLIP model, TreeOfLife-200M for BioCLIP 2), their taxon embedding labels also differ, even if their taxa overlap substantially.

usage: bioclip list-tol-taxa [-h] [--model MODEL]

options:
  -h, --help     show this help message and exit
  --model MODEL  model identifier (see command list-models);
                 default: hf-hub:imageomics/bioclip-2
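
Validating a --subset table against the list-tol-taxa output, as mentioned above, could look roughly like the following sketch. It assumes the saved taxa CSV has a header row whose column names include the rank names (kingdom through species); the helper `check_subset` and the sample rows are made up for illustration and stand in for the real, much larger table.

```python
import csv
import io

RANKS = ("kingdom", "phylum", "class", "order", "family", "genus", "species")


def check_subset(subset_csv, taxa_csv):
    """Return subset values that do not occur in the taxa table.

    Both arguments are open text files (or file-like objects). The rank
    is taken from the header of the subset file's first column.
    """
    subset_rows = list(csv.reader(subset_csv))
    if not subset_rows or not subset_rows[0]:
        raise ValueError("subset CSV is empty")
    rank = subset_rows[0][0]
    if rank not in RANKS:
        raise ValueError(f"first column must be one of {RANKS}, got {rank!r}")
    wanted = {row[0] for row in subset_rows[1:] if row}
    # Assumption: the taxa table has a column named after the rank.
    known = {row[rank] for row in csv.DictReader(taxa_csv)}
    return sorted(wanted - known)


# Tiny made-up stand-in for a file saved from `bioclip list-tol-taxa`.
taxa = io.StringIO(
    "kingdom,phylum,class,order,family,genus,species\n"
    "Animalia,Chordata,Mammalia,Carnivora,Felidae,Felis,catus\n"
    "Animalia,Chordata,Mammalia,Carnivora,Canidae,Canis,lupus\n"
)
subset = io.StringIO("family\nFelidae\nUrsidae\n")

print(check_subset(subset, taxa))  # → ['Ursidae']
```

An empty result means every entry in the subset file was found in the taxa table for that rank. Since the labels differ between models, the check is only meaningful against the table generated for the same --model that will be used with predict.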