
biobench.plantnet

Pl@ntNet is a "dataset with high label ambiguity and a long-tailed distribution" from NeurIPS 2021. We fit a ridge classifier from scikit-learn to a backbone's embeddings and evaluate on the validation split.
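Here is a minimal closed-form NumPy sketch of what fitting a ridge classifier to a backbone's embeddings involves. It uses the same basic recipe as scikit-learn's RidgeClassifier: regress onto +1/-1 one-vs-rest targets, then predict the class with the largest output. It is not scikit-learn's code, and it omits class_weight. The helper names (fit_ridge_classifier, predict) and the toy Gaussian "embeddings" are made up for illustration.

```python
import numpy as np


def fit_ridge_classifier(x: np.ndarray, y: np.ndarray, alpha: float = 1.0):
    """Illustrative closed-form one-vs-rest ridge classifier (hypothetical helper)."""
    classes = np.unique(y)
    # One column per class: +1 where the sample belongs to that class, -1 elsewhere.
    targets = np.where(y[:, None] == classes[None, :], 1.0, -1.0)
    # Center features and targets so the intercept is not penalized.
    x_mean, t_mean = x.mean(axis=0), targets.mean(axis=0)
    xc, tc = x - x_mean, targets - t_mean
    # Solve (Xc^T Xc + alpha * I) W = Xc^T Tc for the weight matrix W.
    gram = xc.T @ xc + alpha * np.eye(x.shape[1])
    weights = np.linalg.solve(gram, xc.T @ tc)
    bias = t_mean - x_mean @ weights
    return classes, weights, bias


def predict(model, x: np.ndarray) -> np.ndarray:
    classes, weights, bias = model
    # Pick the class whose regression output is largest.
    return classes[np.argmax(x @ weights + bias, axis=1)]


# Toy "embeddings": three well-separated Gaussian clusters in 16 dimensions.
rng = np.random.default_rng(0)
centers = rng.normal(size=(3, 16)) * 5.0
y_train = np.repeat(np.arange(3), 30)
y_val = np.repeat(np.arange(3), 10)
x_train = centers[y_train] + rng.normal(size=(y_train.size, 16))
x_val = centers[y_val] + rng.normal(size=(y_val.size, 16))

model = fit_ridge_classifier(x_train, y_train, alpha=1.0)
val_accuracy = float(np.mean(predict(model, x_val) == y_val))
print(f"validation accuracy: {val_accuracy:.2f}")
```

Because the model is linear in the embeddings, a ridge probe like this mainly measures how linearly separable the backbone's features are. The alpha penalty keeps the solve well-conditioned when the embedding dimension is large relative to the number of examples.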

There are two pieces that make Pl@ntNet more than a simple classification task:

  1. Because of the long tail, we use class_weight='balanced', which weights each class inversely proportional to its frequency in the training data.
  2. Because of the massive class imbalance, we use macro F1 rather than accuracy, both to choose the alpha parameter and to evaluate the final classifier.
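The following stdlib-only sketch illustrates both points on a made-up long-tailed label set. balanced_class_weights implements the formula scikit-learn documents for class_weight='balanced' (n_samples / (n_classes * count)). macro_f1 averages per-class F1 without weighting by class size, and on this example it should agree with scikit-learn's f1_score(..., average='macro'). Both helper names are hypothetical.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Weight per class: n_samples / (n_classes * count(class))."""
    counts = Counter(labels)
    n_samples, n_classes = len(labels), len(counts)
    return {c: n_samples / (n_classes * k) for c, k in counts.items()}


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 over every label in y_true or y_pred."""
    pairs = list(zip(y_true, y_pred))
    scores = []
    for c in sorted(set(y_true) | set(y_pred)):
        tp = sum(t == c and p == c for t, p in pairs)
        fp = sum(t != c and p == c for t, p in pairs)
        fn = sum(t == c and p != c for t, p in pairs)
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# A long-tailed toy label set: 90 / 9 / 1 examples.
y_true = ["common"] * 90 + ["uncommon"] * 9 + ["rare"] * 1
weights = balanced_class_weights(y_true)
# weights: common ~0.37, uncommon ~3.70, rare ~33.33

# A degenerate classifier that always predicts the majority class.
y_pred = ["common"] * len(y_true)
accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
print(accuracy, round(macro_f1(y_true, y_pred), 3))  # 0.9 0.316
```

The majority-class predictor reaches 90% accuracy while never recognizing the two tail classes. Macro F1 exposes this (about 0.32). The balanced weights make each class contribute the same total weight (100 / 3 here) to the fit.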

If you use this task, please cite the original paper:

@inproceedings{plantnet-300k, author={Garcin, Camille and Joly, Alexis and Bonnet, Pierre and Lombardo, Jean-Christophe and Affouard, Antoine and Chouet, Mathias and Servajean, Maximilien and Lorieul, Titouan and Salmon, Joseph}, booktitle={NeurIPS Datasets and Benchmarks 2021}, title={{Pl@ntNet-300K}: a plant image dataset with high label ambiguity and a long-tailed distribution}, year={2021}, }

Dataset(root, transform)

Bases: Dataset

Source code in src/biobench/plantnet/__init__.py
def __init__(self, root: str, transform):
    self.transform = transform
    self.samples = []
    if not os.path.exists(root) or not os.path.isdir(root):
        msg = f"Path '{root}' doesn't exist. Did you download the Pl@ntNet dataset? See the docstring at the top of this file for instructions. If you did download it, pass the path as --dataset-dir PATH"
        raise RuntimeError(msg)

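    # Each subdirectory of root is one class; its path relative to root is the class name.
    for dirpath, dirnames, filenames in os.walk(root):
        img_class = os.path.relpath(dirpath, root)
        for filename in filenames:
            # The image id is the file name without its .jpg extension.
            img_id = filename.removesuffix(".jpg")
            img_path = os.path.join(dirpath, filename)
            self.samples.append((img_id, img_path, img_class))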
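The directory walk above assumes root contains one subdirectory per class. The sketch below runs the same loop on a throwaway directory tree to show the (img_id, img_path, img_class) tuples it produces. The helper index_samples and the folder names species_a and species_b are hypothetical. Sorting the result is our addition, since os.walk does not promise a fixed order.

```python
import os
import tempfile


def index_samples(root):
    """Hypothetical helper: (img_id, img_path, img_class) for every file under root."""
    samples = []
    for dirpath, _dirnames, filenames in os.walk(root):
        # The class name is the directory's path relative to root.
        img_class = os.path.relpath(dirpath, root)
        for filename in filenames:
            # The id is the file name with a trailing ".jpg" removed (Python 3.9+).
            img_id = filename.removesuffix(".jpg")
            samples.append((img_id, os.path.join(dirpath, filename), img_class))
    # Sorting is our addition: os.walk does not guarantee any particular order.
    return sorted(samples)


# Build a throwaway tree: root/species_a/{a1,a2}.jpg and root/species_b/b1.jpg.
with tempfile.TemporaryDirectory() as root:
    files = [("species_a", "a1.jpg"), ("species_a", "a2.jpg"), ("species_b", "b1.jpg")]
    for img_class, filename in files:
        os.makedirs(os.path.join(root, img_class), exist_ok=True)
        open(os.path.join(root, img_class, filename), "wb").close()
    samples = index_samples(root)

for img_id, _img_path, img_class in samples:
    print(img_id, img_class)  # a1 species_a / a2 species_a / b1 species_b
```

Two edge cases follow from the loop itself. A file placed directly in root gets the class "." because os.path.relpath(root, root) returns ".". A file without a .jpg extension keeps its full name as its id, because removesuffix only strips an exact ".jpg" ending.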

samples = [] instance-attribute

List of (image id, image path, class name) tuples, one per file found under root.

transform = transform instance-attribute

Optional function that transforms an image into the format expected by a neural network.

benchmark(cfg)

Steps:

  1. Get features for all images.
  2. Select the regularization strength alpha using cross-validation splits.
  3. Report the score on the validation split, which serves as the test data.
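The benchmark code below delegates classifier construction to init_clf(cfg), whose body is not shown, so the exact alpha search is not visible here. As the introduction describes, step 2 uses a ridge classifier with class_weight='balanced' and macro F1 as the selection metric. One plausible scikit-learn implementation is GridSearchCV over a small alpha grid. The grid values, fold count, synthetic data, and the select_alpha helper are illustrative assumptions, not biobench's settings.

```python
import numpy as np
from sklearn.linear_model import RidgeClassifier
from sklearn.metrics import f1_score
from sklearn.model_selection import GridSearchCV, StratifiedKFold


def select_alpha(x_train, y_train, alphas=(0.1, 1.0, 10.0, 100.0)):
    """Hypothetical helper: pick alpha by stratified k-fold CV on macro F1."""
    search = GridSearchCV(
        RidgeClassifier(class_weight="balanced"),
        param_grid={"alpha": list(alphas)},
        scoring="f1_macro",  # macro F1, not accuracy, drives model selection
        cv=StratifiedKFold(n_splits=3, shuffle=True, random_state=0),
    )
    # With the default refit=True, the best alpha is refit on all of x_train.
    search.fit(x_train, y_train)
    return search


# Synthetic imbalanced "embeddings": class sizes 60 / 15 / 6 in training.
rng = np.random.default_rng(0)
centers = rng.normal(size=(3, 32)) * 4.0
y_train = np.repeat(np.arange(3), [60, 15, 6])
y_val = np.repeat(np.arange(3), [20, 5, 3])
x_train = centers[y_train] + rng.normal(size=(y_train.size, 32))
x_val = centers[y_val] + rng.normal(size=(y_val.size, 32))

search = select_alpha(x_train, y_train)
val_macro_f1 = f1_score(y_val, search.predict(x_val), average="macro")
print("best alpha:", search.best_params_["alpha"], "val macro F1:", round(val_macro_f1, 3))
```

Stratified folds keep every class, including the rare ones, represented in each CV split. Without stratification, macro F1 on a fold could be undefined or noisy for tail classes.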

Source code in src/biobench/plantnet/__init__.py
@beartype.beartype
def benchmark(cfg: config.Experiment) -> reporting.Report:
    """
    Steps:
    1. Get features for all images.
    2. Select alpha using cross-validation splits.
    3. Report score on the validation split.
    """
    backbone = registry.load_vision_backbone(cfg.model)

    # 1. Get features
    val_features = get_features(cfg, backbone, split="val")
    train_features = get_features(cfg, backbone, split="train")
    torch.cuda.empty_cache()

    # Fit the label encoder on both splits so they share one integer index.
    encoder = sklearn.preprocessing.OrdinalEncoder()
    all_labels = np.concatenate((val_features.labels, train_features.labels))
    encoder.fit(all_labels.reshape(-1, 1))

    # 2. Fit the classifier built by init_clf(cfg) on the training features.
    clf = init_clf(cfg)
    clf.fit(train_features.x, train_features.y(encoder))

    # 3. Predict on the validation split.
    true_labels = val_features.y(encoder)
    pred_labels = clf.predict(val_features.x)

    # One record per validation image; the score is 1.0 for a correct prediction, else 0.0.
    preds = [
        reporting.Prediction(
            str(img_id),
            float(pred == true),
            {"y_pred": pred.item(), "y_true": true.item()},
        )
        for img_id, pred, true in zip(val_features.ids, pred_labels, true_labels)
    ]

    return reporting.Report("plantnet", preds, cfg)
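In benchmark, the OrdinalEncoder is fit on the concatenation of validation and training labels before either split is encoded. This gives both splits one shared integer index, even when a class appears in only one of them. The stdlib sketch below imitates that idea with a hypothetical fit_label_index helper. To our understanding, numbering the labels in sorted order matches OrdinalEncoder's default categories='auto' behavior. The species names are made up.

```python
def fit_label_index(*label_lists):
    """Hypothetical helper: map every label seen in any split to an integer, in sorted order."""
    labels = sorted(set().union(*label_lists))
    return {label: i for i, label in enumerate(labels)}


train_labels = ["species_b", "species_a", "species_b"]
val_labels = ["species_c", "species_a"]  # species_c never appears in training

# Fit on both splits, as benchmark does, so every validation label has an index.
index = fit_label_index(val_labels, train_labels)
y_val = [index[label] for label in val_labels]
print(index)  # {'species_a': 0, 'species_b': 1, 'species_c': 2}
print(y_val)  # [2, 0]

# Fitting on training labels alone would leave species_c without an index.
train_only = fit_label_index(train_labels)
print("species_c" in train_only)  # False
```

With OrdinalEncoder's default handle_unknown='error', a validation-only species would make transform raise if the encoder were fit on training labels alone. Fitting on the union avoids that. The classifier still has no training examples for such a species, so it can never predict that species correctly.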