FAIR Guide
This section provides information and resources to help ensure that digital products are Findable, Accessible, Interoperable, Reusable, and Reproducible.[1] A general Metadata Checklist is provided to stimulate thinking about the type of information to be collected. Additionally, we include checklists for code, data, and model repositories. The code checklist focuses on the contents of a well-documented GitHub repository, while the data and model checklists cover the content of the data and model card templates, respectively.
Each checklist was developed following the FAIR principles (as defined by the Go-FAIR Initiative). The checklists provide a detailed outline of the tasks to complete and the files to include to align with the FAIR principles, and they complement the descriptions in the GitHub and Hugging Face Guides on this site. As with those Guides, these checklists draw on a combination of existing guides (e.g., The Turing Way, the Model Card Guidebook, and the Dataset Card Creation Guide) and our team's experience. Following these checklists aligns digital products with the FAIR principles and constitutes a best-effort approach toward reproducibility.[2]
Pro tip
Use the eye icon at the top of any checklist page to view its source, then copy the Markdown for the checklist into an issue on your GitHub repository or Project so you can check off each component as you add it. When the checklist is in the main description of the issue, the issue summary shows the number of completed components out of the total for that issue.
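The progress count mentioned in the Pro tip comes from GitHub's Markdown task-list syntax, where `- [ ]` marks an open item and `- [x]` a completed one. As a rough illustration of how such a count can be derived, here is a minimal Python sketch, assuming the checklist uses standard task-list bullets. It is not GitHub's implementation, and the function name `checklist_progress` and the checklist items below are hypothetical.

```python
import re

# Matches Markdown task-list items such as "- [ ] Add README" or "- [x] Add LICENSE".
# Also accepts "*" or "+" bullets and an uppercase "X" (assumption: common variants).
TASK_ITEM = re.compile(r"^\s*[-*+]\s+\[([ xX])\]\s+", re.MULTILINE)


def checklist_progress(markdown: str) -> tuple[int, int]:
    """Return (completed, total) task-list items found in a Markdown string."""
    marks = TASK_ITEM.findall(markdown)
    # Each captured mark is a single character: " ", "x", or "X".
    completed = sum(1 for mark in marks if mark in "xX")
    return completed, len(marks)


# Hypothetical excerpt of a checklist pasted into an issue description.
issue_body = """
## Code Repository Checklist
- [x] README
- [x] LICENSE
- [ ] Citation file
- [ ] Requirements file
"""

done, total = checklist_progress(issue_body)
print(f"{done} of {total} tasks complete")  # prints "2 of 4 tasks complete"
```

Checking a box in the issue edits the Markdown from `[ ]` to `[x]`, so the count changes without any other bookkeeping.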
The last topic in this section discusses different methods of DOI Generation for digital products (code, data, and models). It focuses on our selected method for dataset publication, Hugging Face, with some guidance on using Zenodo to archive code (specifically, a GitHub repository). For information about other common data publication venues, and to see the thought process behind our selection, see the Data Archive Options Comparative Overview.[3] Generating a DOI gives a digital product a globally unique and persistent identifier that can be used to cite it and to locate it again, an important component of the FAIR and Reproducible principles.
References and Background
If you want to learn more about FAIR and Reproducible principles, explore these resources that we used when developing this guide:
- The Turing Way: an open-source, community data science handbook. It provides a strong foundation for the guiding principles of this Guide, with accessible explanations and overviews of topics ranging from reproducibility to collaboration and communication, project design, and ethical research. It is a particularly good resource for those just starting to use git and GitHub, as it builds motivation for using version control through the lens of reproducibility.
- Go-FAIR Initiative: The FAIR Principles
- Ozoani, E., Gerchick, M., & Mitchell, M. (2022). Model Card Guidebook. Hugging Face. https://huggingface.co/docs/hub/en/model-card-guidebook. The authors also provide a helpful summary of related work, including Datasheets for Datasets (Gebru et al., 2018) and The Dataset Nutrition Label (label, paper).
- Wilkinson, M., Dumontier, M., Aalbersberg, I., et al. (2016). The FAIR Guiding Principles for scientific data management and stewardship. Scientific Data, 3, 160018. 10.1038/sdata.2016.18
- Barker, M., Chue Hong, N. P., Katz, D. S., et al. (2022). Introducing the FAIR Principles for research software. Scientific Data, 9, 622. 10.1038/s41597-022-01710-x
- Balk, M. A., Bradley, J., Maruf, M., Altintaş, B., Bakiş, Y., Bart, H. L. Jr, Breen, D., Florian, C. R., Greenberg, J., Karpatne, A., Karnani, K., Mabee, P., Pepper, J., Jebbia, D., Tabarin, T., Wang, X., & Lapp, H. (2024). A FAIR and modular image-based workflow for knowledge discovery in the emerging field of imageomics. Methods in Ecology and Evolution, 15, 1129–1145. 10.1111/2041-210X.14327
- The FARR Research Coordination Network has a number of interesting resources and events.
- The Research Data Alliance for Interdisciplinary Research also provides links to resources and events, with a particular focus on considerations for interdisciplinary research.
[1] While "Reproducible" is not part of the original FAIR principles as defined by the Go-FAIR Initiative, we include it here to emphasize the importance of computational reproducibility alongside data stewardship. This extension reflects emerging practice in data-intensive science, where code, models, and workflows must be reusable and verifiable to support robust scientific claims, and it aligns with broader community goals for open and transparent research.
[2] Full reproducibility is difficult to achieve; this presentation by Odd Erik Gundersen discusses the varying degrees of reproducibility and provides useful references for assessing the level of reproducibility achieved by a given project.
[3] The Data Archive Options Comparative Overview was created in May 2023 as part of developing archive recommendations for the Institute, so it does not include information about newer features such as Hugging Face's dataset viewer, which greatly simplifies previewing datasets for downstream users.