Quick Reference
Install
pip install taxonopy
For detailed setup instructions including GNVerifier and troubleshooting, see Installation.
Sample Input
Click one of the links below to download the sample dataset (in either parquet or CSV format), then place it in examples/input/:
Sample input: Note the divergence in kingdoms (Metazoa vs Animalia), missing interior ranks, and fully null entry.
| uuid | kingdom | phylum | class | order | family | genus | species | scientific_name |
|---|---|---|---|---|---|---|---|---|
| bc2a3f9f-c1f9-48df-9b01-d045475b9d5f | Metazoa | Chordata | Mammalia | Primates | Hominidae | Homo | Homo sapiens | Homo sapiens |
| 21ed76d8-9a3b-406e-a1a3-ef244422bf8e | Plantae | Tracheophyta | null |
Fagales | Fagaceae | Quercus | Quercus alba | Quercus alba |
| 4d166a61-b6e5-4709-91ba-b623111014e9 | Animalia | null |
null |
Hymenoptera | Apidae | Apis | Apis mellifera | Apis mellifera |
| 85b96dc2-70ab-446e-afb5-6a4b92b0a450 | null |
null |
null |
null |
null |
null |
Amanita muscaria | null |
| 38327554-ffbf-4180-b4cf-63c311a26f4e | Animalia | null |
null |
null |
null |
null |
Laelia rosea | null |
| 8f688a17-1f7a-42b2-b3dc-bd4c8fc0eee3 | Plantae | null |
null |
null |
null |
null |
Laelia rosea | null |
| a95f3e29-ed48-41f4-9577-64d4243a0396 | null |
null |
null |
null |
null |
null |
null |
null |
In the final example entry, there is no available taxonomic data, which can happen in large datasets where there may be a corresponding image but incomplete annotation.
Execute a Basic Resolution
The command below will read in the sample data from examples/input/, execute resolution, and write the results to examples/resolved/.
taxonopy resolve --input examples/input --output-dir examples/resolved
Input values
There are three kinds of values you can pass to --input:
- A single file path (CSV or Parquet).
- A flat directory of partitioned files (TaxonoPy will glob everything inside).
- A directory tree (TaxonoPy will glob recursively and preserve the folder structure in the output).
In all three cases, the base filename is preserved in the output. That is, the output keeps the original filename(s) and adds .resolved / .unsolved before the extension.
If you download both sample.csv and sample.parquet into examples/input/, resolve will fail due to mixed input formats; keep only one format per input directory.
By default, outputs are written to Parquet format, whether the input is CSV or Parquet. To set the output format to CSV, use the --output-format csv flag.
The output files consist of:
sample.resolved.parquetsample.unsolved.parquetresolution_stats.json
The sample.resolved.parquet file contains all the entries where some resolution strategy was applied. In this example, it contains:
Sample resolved output (selected columns): Green highlights show values added during resolution. Yellow highlights indicate values that changed from the input.
| uuid | kingdom | phylum | class | order | family | genus | species | scientific_name |
|---|---|---|---|---|---|---|---|---|
| bc2a3f9f-c1f9-48df-9b01-d045475b9d5f | Animalia? | Chordata | Mammalia | Primates | Hominidae | Homo | Homo sapiens | Homo sapiens |
| 21ed76d8-9a3b-406e-a1a3-ef244422bf8e | Plantae | Tracheophyta | Magnoliopsida | Fagales | Fagaceae | Quercus | Quercus alba | Quercus alba |
| 4d166a61-b6e5-4709-91ba-b623111014e9 | Animalia | Arthropoda | Insecta | Hymenoptera | Apidae | Apis | Apis mellifera | Apis mellifera |
| 85b96dc2-70ab-446e-afb5-6a4b92b0a450 | Fungi | Basidiomycota | Agaricomycetes | Agaricales | Amanitaceae | Amanita | Amanita muscaria | "" |
| 38327554-ffbf-4180-b4cf-63c311a26f4e | Animalia | Arthropoda | Insecta | Lepidoptera | Erebidae | Laelia | Laelia rosea | "" |
| 8f688a17-1f7a-42b2-b3dc-bd4c8fc0eee3 | Plantae | Tracheophyta | Liliopsida | Asparagales | Orchidaceae | Laelia | Laelia rosea | "" |
Add Common Names
You can add vernacular names to resolved outputs as a post-processing step:
taxonopy common-names \
--resolved-dir examples/resolved \
--output-dir examples/resolved/common
This command uses GBIF Backbone data only and applies deterministic fallback: species to kingdom, with English names preferred at each rank.
Sample common-name output (examples/resolved/common/sample.resolved.parquet); the last two rows (both Laelia rosea) fall back to family-level common names—none available at species or genus rank.
| uuid | common_name | kingdom | phylum | class | order | family | genus | species |
|---|---|---|---|---|---|---|---|---|
| bc2a3f9f-c1f9-48df-9b01-d045475b9d5f | Human | Animalia | Chordata | Mammalia | Primates | Hominidae | Homo | Homo sapiens |
| 21ed76d8-9a3b-406e-a1a3-ef244422bf8e | Eastern White Oak | Plantae | Tracheophyta | Magnoliopsida | Fagales | Fagaceae | Quercus | Quercus alba |
| 4d166a61-b6e5-4709-91ba-b623111014e9 | Drone-Bee | Animalia | Arthropoda | Insecta | Hymenoptera | Apidae | Apis | Apis mellifera |
| 85b96dc2-70ab-446e-afb5-6a4b92b0a450 | Fly Agaric | Fungi | Basidiomycota | Agaricomycetes | Agaricales | Amanitaceae | Amanita | Amanita muscaria |
| 38327554-ffbf-4180-b4cf-63c311a26f4e | Underwing, Tiger, Tussock, And Allied Moths | Animalia | Arthropoda | Insecta | Lepidoptera | Erebidae | Laelia | Laelia rosea |
| 8f688a17-1f7a-42b2-b3dc-bd4c8fc0eee3 | Orchid | Plantae | Tracheophyta | Liliopsida | Asparagales | Orchidaceae | Laelia | Laelia rosea |
The sample.unsolved.parquet file contains entries that could not be resolved (for example, rows with no usable taxonomy information). In this example, it contains:
Sample unsolved output: Sequestered entries with no usable taxonomy information.
| uuid | kingdom | phylum | class | order | family | genus | species | scientific_name | common_name |
|---|---|---|---|---|---|---|---|---|---|
| a95f3e29-ed48-41f4-9577-64d4243a0396 | null |
null |
null |
null |
null |
null |
null |
null |
null |
The resolution_stats.json file summarizes counts of how many entries from the input fell into each final status across the resolved and unsolved files.
TaxonoPy also writes cache data to disk (default: ~/.cache/taxonopy) so it can trace provenance and avoid reprocessing. Use --show-cache-path, --cache-stats, or --clear-cache if you want to inspect or manage it, or see the Cache guide for details.
Trace an Entry
You can trace how a single UUID was resolved. For example, let's trace one of the Laelia rosea entries:
taxonopy trace entry \
--uuid 8f688a17-1f7a-42b2-b3dc-bd4c8fc0eee3 \
--from-input examples/input/sample.csv
TaxonoPy uses whatever rank context you provide (even if sparse) to disambiguate identical names. Laelia rosea resolves differently under Animalia vs. Plantae as a hemihomonym. If higher ranks were missing, TaxonoPy would not have been able to disambiguate.
Excerpt from the trace output:
{
"query_plan": {
"term": "Laelia rosea",
"rank": "species",
"source_id": 11
},
"resolution_attempts": [
{
"status": "EXACT_MATCH_PRIMARY_SOURCE_ACCEPTED_INNER_RANK_DISAMBIGUATION",
"resolution_strategy_name": "ExactMatchPrimarySourceAcceptedInnerRankDisambiguation",
"resolved_classification": {
"kingdom": "Plantae",
"phylum": "Tracheophyta",
"class_": "Liliopsida",
"order": "Asparagales",
"family": "Orchidaceae",
"genus": "Laelia",
"species": "Laelia rosea"
}
}
]
}