Hugging Face Dataset Guide
Create a New Dataset Repository
When creating a new dataset repository, you can make it Public (accessible to anyone on the internet) or Private (accessible only to you or, for a repository owned by an organization, to that organization's members).
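A repository can also be created from the command line. The sketch below is a minimal example, assuming the `huggingface_hub` Python package is installed and you are already logged in. `create_dataset_repo` is a hypothetical helper name; the work is done by `HfApi.create_repo` with `repo_type="dataset"`, where `private=True` makes the repository private.

```shell
#!/usr/bin/env bash
# Hypothetical helper: create a dataset repository on the Hugging Face Hub.
# Assumes huggingface_hub is installed and a token is already configured
# (for example via `huggingface-cli login`).
#   $1: repository id, e.g. username/repo-name
#   $2: "private" or "public" (defaults to public)
create_dataset_repo() {
  local repo_id="$1"
  local visibility="${2:-public}"
  REPO_ID="$repo_id" VISIBILITY="$visibility" python3 - <<'EOF'
import os
from huggingface_hub import HfApi

url = HfApi().create_repo(
    repo_id=os.environ["REPO_ID"],
    repo_type="dataset",
    private=os.environ["VISIBILITY"] == "private",
    exist_ok=True,  # do not fail if the repository already exists
)
print(url)
EOF
}
```

Usage: `create_dataset_repo username/repo-name private`. The Python call is run through a heredoc so the example stays a single shell snippet.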
Upload a Dataset with the Web Interface
Open the Files and versions tab of the dataset repository page and choose Add file to upload files through the Hugging Face web interface.
Upload a Dataset with HfApi
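`HfApi` uploads files over HTTP, so this route does not require a local Git or Git LFS installation. A minimal sketch, run from the shell through a Python heredoc, assuming `huggingface_hub` is installed, you are logged in, and the dataset repository already exists. `upload_dataset_folder` is a hypothetical helper name; `HfApi.upload_folder` is the library call that performs the upload.

```shell
#!/usr/bin/env bash
# Hypothetical helper: upload every file in a local folder to a dataset repo.
# Assumes huggingface_hub is installed and you are logged in.
#   $1: repository id, e.g. username/repo-name
#   $2: local folder containing the dataset files
upload_dataset_folder() {
  local repo_id="$1"
  local folder="$2"
  REPO_ID="$repo_id" FOLDER="$folder" python3 - <<'EOF'
import os
from huggingface_hub import HfApi

HfApi().upload_folder(
    repo_id=os.environ["REPO_ID"],
    folder_path=os.environ["FOLDER"],
    repo_type="dataset",  # without this, the id is treated as a model repo
    commit_message="Upload dataset files",
)
EOF
}
```

Usage: `upload_dataset_folder username/repo-name ./data`. Files land at the root of the repository unless a `path_in_repo` argument is also passed to `upload_folder`.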
Upload a Dataset with Git
If the Dataset is Less Than 5GB
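Clone the repository, change into its folder, and copy the dataset files there. Individual files larger than 10MB still need to be tracked with Git LFS, as described in the next section.
# Clone the repository and change into its folder
git clone https://huggingface.co/datasets/username/repo-name
cd repo-name
# After copying the dataset files here, add, commit, and push them
git add .
git commit -m "Add dataset files"
git push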
If the Dataset is Larger Than 5GB
Install Git LFS
Follow the installation instructions at https://git-lfs.com/.
Install the Hugging Face CLI
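The CLI ships with the `huggingface_hub` package. A minimal sketch of installing it and logging in, wrapped in a hypothetical `setup_hf_cli` function; it assumes an access token from your account settings has been exported as `HF_TOKEN`. The `--add-to-git-credential` flag also saves the token to Git's credential helper (if one is configured) so that `git push` to huggingface.co can authenticate.

```shell
#!/usr/bin/env bash
# Hypothetical helper: install the Hugging Face CLI and authenticate.
# Assumes an access token is available in the HF_TOKEN environment variable.
setup_hf_cli() {
  # The [cli] extra pulls in optional dependencies used by huggingface-cli
  python3 -m pip install -U "huggingface_hub[cli]" || return
  # Log in non-interactively and also store the token for Git over HTTPS
  huggingface-cli login --token "$HF_TOKEN" --add-to-git-credential
}
```

Running `huggingface-cli login` without `--token` prompts for the token interactively instead.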
Enable the repository to upload files larger than 5GB
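The 5GB limit applies per file: pushing a single file above that size requires the clone to be configured for multipart LFS uploads, which is what `huggingface-cli lfs-enable-largefiles` does. A minimal sketch, wrapped in a hypothetical `enable_large_files` helper that takes the path of the cloned repository (the current directory by default); it needs to be run once in each such clone.

```shell
#!/usr/bin/env bash
# Hypothetical helper: allow pushing files larger than 5GB from a clone.
# Assumes huggingface-cli is installed (see the previous step).
#   $1: path to the cloned repository (defaults to the current directory)
enable_large_files() {
  huggingface-cli lfs-enable-largefiles "${1:-.}"
}
```

Usage: `enable_large_files repo-name`.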
Initialize Git LFS
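Git LFS has to be initialized once per user account before it manages any files. A minimal sketch that runs `git lfs install` only when the `git-lfs` binary is available; the command registers the LFS filters in your global Git configuration.

```shell
#!/usr/bin/env bash
# Initialize Git LFS for the current user. `git lfs install` registers the
# LFS clean/smudge filters in the global Git config (and installs hooks when
# run inside a repository). Skipped with a message if git-lfs is missing.
if command -v git-lfs >/dev/null 2>&1; then
  git lfs install
else
  echo "git-lfs not found; install it from https://git-lfs.com/ first" >&2
fi
```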
Track large files (e.g., .csv files)
# Adds a line to .gitattributes, which Git uses to determine files managed by LFS
git lfs track "*.csv"
git add .gitattributes
git commit -m "Track large files with Git LFS"
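Once the pattern is tracked, the large files are added, committed, and pushed like any other file, and Git LFS stores pointer files in the Git history in their place. A minimal sketch wrapped in a hypothetical `push_tracked_csvs` helper, assuming it is run from the repository root after the steps above; `git lfs ls-files` serves as a check that the CSV files were picked up by LFS before pushing.

```shell
#!/usr/bin/env bash
# Hypothetical helper: commit and push .csv files tracked by Git LFS.
# Assumes the current directory is the cloned repository root and that
# the `git lfs track "*.csv"` change has already been committed.
push_tracked_csvs() {
  # Quoted pathspec: Git matches *.csv in every subdirectory, not just here
  git add -- '*.csv' || return
  git commit -m "Add large CSV files via Git LFS" || return
  # List files managed by Git LFS in the new commit; the CSVs should appear
  git lfs ls-files || return
  git push
}
```

Each step stops the helper if the previous one fails, so nothing is pushed when, for example, no `.csv` file matched.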