Code Checklist
This checklist provides an overview of essential and recommended elements to include in a GitHub repository to ensure that it conforms to FAIR principles and best practices for reproducibility. Along with the generation of a DOI (see DOI Generation and Digital Products Release and Licensing Policy), following this checklist ensures compliance with the FAIR Principles for research software.1
Pro tip
Use the eye icon at the top of this page to access the source and copy the markdown for the checklist below into an issue on your GitHub Repo or Project so you can check the boxes as you add each element to your GitHub repository.
Required Files
- License: Verify and include an appropriate license (e.g., `MIT`, `CC0-1.0`, etc.). See discussion in the Repo Guide.
- README File: Following the Repo Guide, provide a detailed `README.md` with:
    - Overview of the project.
    - Installation instructions.
    - Basic usage examples.
    - Links to related/created dataset(s).
    - Links to related/created model(s).
    - Acknowledge source code dependencies and contributors.
    - Reference related datasets used in training or evaluation.
- Requirements File: Provide a file detailing software requirements, such as a `requirements.txt` or `pyproject.toml` for Python dependencies.
- Gitignore File: GitHub has premade `.gitignore` files (here) tailored to particular languages (e.g., R or Python), operating systems, etc.
- CITATION CFF: This facilitates citation of your work; follow the guidance provided in the Repo Guide.
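The required files above can be checked automatically. The following is a minimal sketch, not an official tool: the accepted file names in `REQUIRED_FILES` and the helper `find_missing` are assumptions chosen for illustration, so adjust them to your project's conventions.

```python
from pathlib import Path

# Checklist item -> file names that would satisfy it.
# These names are illustrative assumptions, not a fixed standard.
REQUIRED_FILES = {
    "License": ["LICENSE", "LICENSE.md", "LICENSE.txt"],
    "README": ["README.md"],
    "Requirements": ["requirements.txt", "pyproject.toml"],
    "Gitignore": [".gitignore"],
    "Citation": ["CITATION.cff"],
}


def find_missing(repo_root):
    """Return the checklist items with no matching file in repo_root."""
    root = Path(repo_root)
    return [
        item
        for item, names in REQUIRED_FILES.items()
        if not any((root / name).is_file() for name in names)
    ]


if __name__ == "__main__":
    missing = find_missing(".")
    if missing:
        print("Missing checklist items:", ", ".join(missing))
    else:
        print("All required files found.")
```

Running the script from the repository root prints any checklist items that still lack a file.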
Data-Related
- Preprocessing code.
- Description of dataset(s), including description of training and testing sets (with links to relevant portions of dataset card, which will have more information).
Model-Related
- Training code.
- Inference/evaluation code.
- Model weights (if not in Hugging Face model repository).
- Description of model(s)/benchmark(s).
- Explanation of training and testing (with links to relevant portions of model card, which will have more information).
Note
The bioclip GitHub repository provides an example of incorporating data- and model-related code into a GitHub repository as published open-source code for both data and model development.
General Information
- Repository Structure: Ensure the code repository follows a clear and logical directory structure. (See Repo Guide.)
- Code Comments: Include meaningful inline comments and function descriptions for clarity.
- Random Seed Control: Save seed(s) for random number generator(s) to ensure reproducible results.
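One way to approach random seed control is to set the seed in a single place and write it to a file alongside the results. This is a minimal sketch using only the standard library; the helper name `set_and_save_seed` and the JSON output format are assumptions for illustration.

```python
import json
import random
from pathlib import Path


def set_and_save_seed(seed, out_path):
    """Seed Python's built-in RNG and record the seed in a JSON file."""
    random.seed(seed)
    # Other libraries keep their own RNG state and must be seeded separately,
    # e.g. numpy.random.default_rng(seed) or torch.manual_seed(seed).
    Path(out_path).write_text(json.dumps({"seed": seed}))
```

Calling `set_and_save_seed(42, "seed.json")` at the start of a run makes later calls to `random` repeatable and leaves a record of the seed that was used.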
Security Considerations
- Sensitive Data Handling: Ensure no hardcoded sensitive information (e.g., API keys, credentials) is included in your repository. These can be shared through a config file on OSC.
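A common alternative to hardcoding credentials is to read them at runtime from outside the repository. The sketch below uses an environment variable, which is a substitute for (not a description of) the config-file approach mentioned above; the variable name `MY_SERVICE_API_KEY` and the helper `get_api_key` are hypothetical.

```python
import os


def get_api_key(var_name="MY_SERVICE_API_KEY"):
    """Read an API key from the environment instead of the source code."""
    key = os.environ.get(var_name)
    if not key:
        # Fail early with a clear message rather than sending an empty key.
        raise RuntimeError(
            f"Set the {var_name} environment variable before running."
        )
    return key
```

Because the key never appears in a tracked file, it cannot be committed by accident.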
Note
The best practices described below will help you meet the above requirements. The more advanced development practices noted further down are included for educational purposes and are highly recommended. Although these may go beyond what is expected for a given project, we advise collaborators to at least discuss the topics covered in Code Quality and whether the other practices discussed would be appropriate for their project.
Best Practices
The Repo Guide provides general guidance on repository structure, collaborative workflow, and how to make and review pull requests (PRs). Below, we highlight some best practices in checklist form to help you meet the requirements described above for a FAIR and Reproducible project.
Reproducibility
- Version Control: Use Git for version control and commit regularly.
- Modularization: Structure code into reusable and independent modules.
- Code Execution: Provide Notebooks to demonstrate how to reproduce results.
Code Review & Maintenance
- Code Reviews: Regular peer reviews for quality assurance. Refer to the GitHub PR Review Guide.
- Issue Tracking: Use GitHub issues for tracking bugs and feature requests.
- Versioning: Tag releases. Changelogs can be auto-generated, and they are informative when PRs are appropriately scoped.
Installation and Dependencies
- Environment Setup: Include setup instructions (e.g., `conda` environment file, `Dockerfile`).
- Dependency Management: Use virtual environments and the frameworks that manage them (e.g., `venv`, `conda`, `uv` for Python) to isolate dependencies.
More Advanced Development
Documentation
- API Documentation: Generate API documentation (e.g., `MkDocs` for Python or wiki pages in the repo).
- Docstrings: Add comprehensive docstrings for all functions, classes, and modules. These can be incorporated to help generate documentation. Note that generative AI tools with access to your code, such as GitHub Copilot, can be quite accurate in generating these, especially if you are using type annotations.
- Example Scripts: Include example scripts for common use cases.
- Configuration Files: Use `yaml`, `json`, or `ini` for configuration settings.
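As an illustration of the docstring and configuration-file items above, here is a small sketch of a type-annotated function with a docstring that loads settings from a JSON file. The function name `load_config` and the docstring layout (Args/Returns/Raises sections) are assumptions; use whichever docstring convention your project has adopted.

```python
import json
from pathlib import Path
from typing import Any, Dict


def load_config(path: Path) -> Dict[str, Any]:
    """Load settings from a JSON configuration file.

    Assumes the top level of the file is a JSON object.

    Args:
        path: Location of the JSON configuration file.

    Returns:
        The parsed settings as a dictionary.

    Raises:
        FileNotFoundError: If the file does not exist.
        json.JSONDecodeError: If the file is not valid JSON.
    """
    with open(path, encoding="utf-8") as f:
        return json.load(f)
```

Keeping settings such as learning rates or file paths in a configuration file means an experiment can be rerun with different values without editing the code.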
Code Quality
- Consistent Style: Follow coding style guidelines (e.g., `PEP 8` for Python).
- Linting: Ensure the code passes a linter (e.g., `Ruff` for Python).
- Logging: Use logging instead of print statements for better debugging (e.g., `logging` in Python).
- Error Handling: Implement robust exception handling to avoid crashes or bogus results from input outside of code expectations.
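The logging and error-handling items above might look like the following in practice. This is only a sketch: the function `mean_length` and its validation rules are hypothetical, chosen to show input being rejected explicitly instead of silently producing a bogus result.

```python
import logging

# A module-level logger; the application decides where messages go.
logger = logging.getLogger(__name__)


def mean_length(lengths):
    """Return the mean of a non-empty list of non-negative lengths."""
    if not lengths:
        raise ValueError("lengths must contain at least one value")
    if any(x < 0 for x in lengths):
        raise ValueError("lengths must be non-negative")
    result = sum(lengths) / len(lengths)
    logger.info("Computed mean of %d values", len(lengths))
    return result
```

A script that calls this function can enable output with `logging.basicConfig(level=logging.INFO)`, and the same messages can later be redirected to a file without changing the function.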
Testing
- Unit Tests: Write unit tests to validate core functionality.
- Integration Tests: Ensure components work together correctly.
- Test Coverage: Check test coverage, e.g., using Coverage.
- Continuous Integration (CI): Set up CI/CD pipelines (e.g., GitHub Actions) for automated testing.
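A unit test can be as small as the sketch below, which uses the standard library's `unittest`. The function under test, `normalize`, is a hypothetical stand-in for a piece of your project's core functionality.

```python
import unittest


def normalize(values):
    """Scale values so that they sum to 1."""
    total = sum(values)
    if total == 0:
        raise ValueError("values must not sum to zero")
    return [v / total for v in values]


class TestNormalize(unittest.TestCase):
    def test_sums_to_one(self):
        self.assertAlmostEqual(sum(normalize([1, 2, 3])), 1.0)

    def test_zero_total_raises(self):
        with self.assertRaises(ValueError):
            normalize([0, 0])
```

Tests like these can be run locally with `python -m unittest`, and the same command can serve as the test step in a CI workflow.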
Code Distribution & Deployment
- Packaging: Provide installation instructions (e.g., `setup.py`, `hatch`, `poetry`, `uv` for Python).
- Deployment Guide: Document deployment procedures.