Quick Reference
Install
pip install taxonopy
For detailed setup instructions including GNVerifier and troubleshooting, see Installation.
Sample Input
Download the sample dataset in either CSV or Parquet format (sample.csv or sample.parquet) and place it in examples/input/.
Sample input: Note the divergence in kingdoms (Metazoa vs. Animalia), the missing interior ranks, and the fully null entry.
| uuid | kingdom | phylum | class | order | family | genus | species | scientific_name |
|---|---|---|---|---|---|---|---|---|
| bc2a3f9f-c1f9-48df-9b01-d045475b9d5f | Metazoa | Chordata | Mammalia | Primates | Hominidae | Homo | Homo sapiens | Homo sapiens |
| 21ed76d8-9a3b-406e-a1a3-ef244422bf8e | Plantae | Tracheophyta | null | Fagales | Fagaceae | Quercus | Quercus alba | Quercus alba |
| 4d166a61-b6e5-4709-91ba-b623111014e9 | Animalia | null | null | Hymenoptera | Apidae | Apis | Apis mellifera | Apis mellifera |
| 85b96dc2-70ab-446e-afb5-6a4b92b0a450 | null | null | null | null | null | null | Amanita muscaria | null |
| 38327554-ffbf-4180-b4cf-63c311a26f4e | Animalia | null | null | null | null | null | Laelia rosea | null |
| 8f688a17-1f7a-42b2-b3dc-bd4c8fc0eee3 | Plantae | null | null | null | null | null | Laelia rosea | null |
| a95f3e29-ed48-41f4-9577-64d4243a0396 | null | null | null | null | null | null | null | null |
In the final example entry, there is no available taxonomic data, which can happen in large datasets where there may be a corresponding image but incomplete annotation.
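To make the three situations in the sample concrete, the sketch below labels a row as complete, sparse, or empty depending on how many of its taxonomic fields are filled. It is only an illustration: the `describe` helper, the `ROWS` mapping, and the use of `None` for null are assumptions made here, not part of TaxonoPy.

```python
# Column order of the sample table, minus the uuid.
FIELDS = ["kingdom", "phylum", "class", "order", "family",
          "genus", "species", "scientific_name"]

# Three rows copied from the sample input; None stands in for null.
ROWS = {
    "bc2a3f9f-c1f9-48df-9b01-d045475b9d5f": [
        "Metazoa", "Chordata", "Mammalia", "Primates", "Hominidae",
        "Homo", "Homo sapiens", "Homo sapiens"],
    "4d166a61-b6e5-4709-91ba-b623111014e9": [
        "Animalia", None, None, "Hymenoptera", "Apidae",
        "Apis", "Apis mellifera", "Apis mellifera"],
    "a95f3e29-ed48-41f4-9577-64d4243a0396": [None] * len(FIELDS),
}


def describe(values):
    """Label a row (hypothetical helper, not a TaxonoPy function)."""
    filled = [value is not None for value in values]
    if not any(filled):
        return "empty"      # nothing to resolve from
    if all(filled):
        return "complete"   # every field is present
    return "sparse"         # some fields are missing


for uuid, values in ROWS.items():
    print(uuid[:8], describe(values))
# bc2a3f9f complete
# 4d166a61 sparse
# a95f3e29 empty
```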
Execute a Basic Resolution
taxonopy resolve --input examples/input --output-dir examples/output
Input values
There are three kinds of values you can pass to --input:
- A single file path (CSV or Parquet).
- A flat directory of partitioned files (TaxonoPy will glob everything inside).
- A directory tree (TaxonoPy will glob recursively and preserve the folder structure in the output).
In all three cases, the base filename is preserved in the output. That is, the output keeps the original filename(s) and adds .resolved / .unsolved before the extension.
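The naming rule can be sketched in a few lines of Python. This is an illustration of the convention described here, not TaxonoPy's own code: `output_names` is a hypothetical helper, and it assumes the output extension follows the chosen output format (Parquet by default).

```python
from pathlib import Path


def output_names(input_file, output_format="parquet"):
    """Return the (resolved, unsolved) file names expected for one input file.

    Hypothetical helper: keeps the base filename and inserts
    .resolved / .unsolved before the extension.
    """
    stem = Path(input_file).stem  # "sample" for "examples/input/sample.csv"
    return (f"{stem}.resolved.{output_format}",
            f"{stem}.unsolved.{output_format}")


print(output_names("examples/input/sample.csv"))
# ('sample.resolved.parquet', 'sample.unsolved.parquet')
print(output_names("examples/input/sample.csv", output_format="csv"))
# ('sample.resolved.csv', 'sample.unsolved.csv')
```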
If you download both sample.csv and sample.parquet into examples/input/, resolve will fail due to mixed input formats; keep only one format per input directory.
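If you want to catch a mixed-format directory before running resolve, a small pre-flight check along these lines may help. It is a sketch under stated assumptions: `input_formats` and `check_single_format` are hypothetical helpers, and the check looks only at file extensions, which may differ from how TaxonoPy itself detects formats.

```python
import tempfile
from pathlib import Path

DATA_SUFFIXES = {".csv", ".parquet"}


def input_formats(input_dir):
    """Return the data-file suffixes found anywhere under input_dir."""
    return {
        path.suffix.lower()
        for path in Path(input_dir).rglob("*")
        if path.is_file() and path.suffix.lower() in DATA_SUFFIXES
    }


def check_single_format(input_dir):
    """Raise ValueError if the directory mixes CSV and Parquet files."""
    found = input_formats(input_dir)
    if len(found) > 1:
        raise ValueError(f"mixed input formats in {input_dir}: {sorted(found)}")
    return found


# Demo on a throwaway directory that contains both sample files.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "sample.csv").touch()
    (Path(tmp) / "sample.parquet").touch()
    mixed = input_formats(tmp)

print(sorted(mixed))
# ['.csv', '.parquet']
```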
The command above will read in the sample data from examples/input/, execute resolution, and write the results to examples/output/.
By default, output is written in Parquet format, whether the input is CSV or Parquet. To write CSV instead, use the --output-format csv flag.
The output files consist of:
- sample.resolved.parquet
- sample.unsolved.parquet
- resolution_stats.json
The sample.resolved.parquet file contains all the entries where some resolution strategy was applied. In this example, it contains:
Sample resolved output (selected columns): Green highlights show values added during resolution. Yellow highlights indicate values that changed from the input.
| uuid | kingdom | phylum | class | order | family | genus | species | scientific_name | common_name |
|---|---|---|---|---|---|---|---|---|---|
| bc2a3f9f-c1f9-48df-9b01-d045475b9d5f | Animalia | Chordata | Mammalia | Primates | Hominidae | Homo | Homo sapiens | Homo sapiens | null |
| 21ed76d8-9a3b-406e-a1a3-ef244422bf8e | Plantae | Tracheophyta | Magnoliopsida | Fagales | Fagaceae | Quercus | Quercus alba | Quercus alba | null |
| 4d166a61-b6e5-4709-91ba-b623111014e9 | Animalia | Arthropoda | Insecta | Hymenoptera | Apidae | Apis | Apis mellifera | Apis mellifera | null |
| 85b96dc2-70ab-446e-afb5-6a4b92b0a450 | Fungi | Basidiomycota | Agaricomycetes | Agaricales | Amanitaceae | Amanita | Amanita muscaria | null | null |
| 38327554-ffbf-4180-b4cf-63c311a26f4e | Animalia | Arthropoda | Insecta | Lepidoptera | Erebidae | Laelia | Laelia rosea | null | null |
| 8f688a17-1f7a-42b2-b3dc-bd4c8fc0eee3 | Plantae | Tracheophyta | Liliopsida | Asparagales | Orchidaceae | Laelia | Laelia rosea | null | null |
The sample.unsolved.parquet file contains entries that could not be resolved (for example, rows with no usable taxonomy information). In this example, it contains:
Sample unsolved output: Sequestered entries with no usable taxonomy information.
| uuid | kingdom | phylum | class | order | family | genus | species | scientific_name | common_name |
|---|---|---|---|---|---|---|---|---|---|
| a95f3e29-ed48-41f4-9577-64d4243a0396 | null | null | null | null | null | null | null | null | null |
The resolution_stats.json file summarizes counts of how many entries from the input fell into each final status across the resolved and unsolved files.
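The layout of resolution_stats.json is not described here, so the following sketch makes no assumption about its keys. It simply walks whatever JSON object it finds and lists every integer value with its dotted key path. `iter_counts` and `print_stats` are hypothetical helpers written for this example.

```python
import json
from pathlib import Path


def iter_counts(node, prefix=""):
    """Yield (dotted_key, value) for every integer leaf in nested JSON objects.

    Lists, strings, floats, and booleans are skipped.
    """
    if isinstance(node, dict):
        for key, value in node.items():
            path = f"{prefix}.{key}" if prefix else str(key)
            yield from iter_counts(value, path)
    elif isinstance(node, int) and not isinstance(node, bool):
        yield prefix, node


def print_stats(stats_path):
    """Print every integer count found in a stats file."""
    stats = json.loads(Path(stats_path).read_text())
    for key, count in iter_counts(stats):
        print(f"{key}: {count}")
```

After the basic resolution above, calling `print_stats("examples/output/resolution_stats.json")` would list whatever counts the file contains.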
TaxonoPy also writes cache data to disk (default: ~/.cache/taxonopy) so it can trace provenance and avoid reprocessing. Use --show-cache-path, --cache-stats, or --clear-cache if you want to inspect or manage it, or see the Cache guide for details.
Trace an Entry
You can trace how a single UUID was resolved. For example, let's trace one of the Laelia rosea entries:
taxonopy trace entry --uuid 8f688a17-1f7a-42b2-b3dc-bd4c8fc0eee3 --from-input examples/input/sample.csv
TaxonoPy uses whatever rank context you provide (even if sparse) to disambiguate identical names. Laelia rosea is a hemihomonym: the same name resolves to a moth (Lepidoptera) under Animalia and to an orchid (Orchidaceae) under Plantae. If the higher ranks were missing, TaxonoPy would not be able to disambiguate the two.
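The idea of using sparse rank context to pick between identical names can be sketched as follows. This is a simplified illustration, not TaxonoPy's actual resolution strategy: `consistent` and `disambiguate` are hypothetical helpers, the two candidates are the classifications from the sample resolved output, and ranks are compared by exact string match.

```python
RANKS = ("kingdom", "phylum", "class", "order", "family", "genus")

# The two classifications that share the name Laelia rosea.
CANDIDATES = [
    {"kingdom": "Animalia", "phylum": "Arthropoda", "class": "Insecta",
     "order": "Lepidoptera", "family": "Erebidae", "genus": "Laelia",
     "species": "Laelia rosea"},
    {"kingdom": "Plantae", "phylum": "Tracheophyta", "class": "Liliopsida",
     "order": "Asparagales", "family": "Orchidaceae", "genus": "Laelia",
     "species": "Laelia rosea"},
]


def consistent(candidate, entry):
    """True if every rank the entry provides agrees with the candidate."""
    return all(
        entry.get(rank) is None or entry[rank] == candidate.get(rank)
        for rank in RANKS
    )


def disambiguate(entry, candidates):
    """Return the single consistent candidate, or None if ambiguous."""
    matches = [c for c in candidates if consistent(c, entry)]
    return matches[0] if len(matches) == 1 else None


plant = disambiguate({"kingdom": "Plantae", "species": "Laelia rosea"}, CANDIDATES)
print(plant["family"])
# Orchidaceae

# With no higher ranks, both candidates remain and no choice can be made.
print(disambiguate({"species": "Laelia rosea"}, CANDIDATES))
# None
```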
Excerpt (incomplete) from the trace output:
{
  "query_plan": {
    "term": "Laelia rosea",
    "rank": "species",
    "source_id": 11
  },
  "resolution_attempts": [
    {
      "status": "EXACT_MATCH_PRIMARY_SOURCE_ACCEPTED_INNER_RANK_DISAMBIGUATION",
      "resolution_strategy_name": "ExactMatchPrimarySourceAcceptedInnerRankDisambiguation",
      "resolved_classification": {
        "kingdom": "Plantae",
        "phylum": "Tracheophyta",
        "class_": "Liliopsida",
        "order": "Asparagales",
        "family": "Orchidaceae",
        "genus": "Laelia",
        "species": "Laelia rosea"
      }
    }
  ]
}
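If you save the trace output as JSON, the fields in the excerpt can be read with the standard library. The sketch below parses the excerpt itself. It assumes the full output is valid JSON with the same keys, and how you capture it (for example, from stdout or a file) is left open because it is not specified here.

```python
import json

# The excerpt above, embedded as a string for demonstration.
TRACE_EXCERPT = """
{
  "query_plan": {"term": "Laelia rosea", "rank": "species", "source_id": 11},
  "resolution_attempts": [
    {
      "status": "EXACT_MATCH_PRIMARY_SOURCE_ACCEPTED_INNER_RANK_DISAMBIGUATION",
      "resolution_strategy_name": "ExactMatchPrimarySourceAcceptedInnerRankDisambiguation",
      "resolved_classification": {
        "kingdom": "Plantae",
        "phylum": "Tracheophyta",
        "class_": "Liliopsida",
        "order": "Asparagales",
        "family": "Orchidaceae",
        "genus": "Laelia",
        "species": "Laelia rosea"
      }
    }
  ]
}
"""

trace = json.loads(TRACE_EXCERPT)
attempt = trace["resolution_attempts"][0]

# Which term was queried, and which status the attempt ended with.
print(trace["query_plan"]["term"], "->", attempt["status"])
# Laelia rosea -> EXACT_MATCH_PRIMARY_SOURCE_ACCEPTED_INNER_RANK_DISAMBIGUATION

# Join the resolved ranks, kingdom first, into a readable lineage.
lineage = " > ".join(attempt["resolved_classification"].values())
print(lineage)
# Plantae > Tracheophyta > Liliopsida > Asparagales > Orchidaceae > Laelia > Laelia rosea
```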