How to add a dataset
Step-by-step guide to adding your dataset to the DLAB organization
1) Create a repository in the organization
- Go to the DLAB organization page on GitHub.
- Click the “Repositories” tab.
- Click the green “New” button (top right).
- Make the repo public.
- Give the repo a descriptive name and create it.
2) Create a metadata.json file that follows the schema_1.0.json template
- Go to the datalake repository.
- Locate the file schema_1.0.json. This is your metadata template.
- In your own dataset repo:
- Click the “Add file” dropdown near the green “<> Code” button.
- Select “Create new file”.
- Name it metadata.json.
- Copy the contents of schema_1.0.json into your new metadata.json.
- Replace the placeholder values with your dataset’s real information.
- Click “Commit new file” to save it to the repo.
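
If you edit metadata.json locally instead of in the browser, a quick syntax check before committing can catch a missing comma or quote. The sketch below assumes python3 is installed and uses its built-in json.tool module. The helper name check_json and the keys in the sample file are hypothetical; the real keys come from schema_1.0.json. It checks JSON syntax only, not conformance to the schema.

```shell
#!/bin/sh
# Sketch: syntax-check a metadata.json file before committing it.
# Assumes python3 is on PATH. Checks JSON syntax only, not the schema.
set -eu

# check_json is a hypothetical helper name, not part of any DLAB tooling.
check_json() {
  python3 -m json.tool "$1" > /dev/null 2>&1
}

# Stand-in file with made-up keys; the real keys come from schema_1.0.json.
workdir=$(mktemp -d)
cat > "$workdir/metadata.json" <<'EOF'
{
  "name": "example-dataset",
  "description": "Placeholder values for illustration only"
}
EOF

if check_json "$workdir/metadata.json"; then
  result="valid"
else
  result="invalid"
fi
echo "metadata.json is $result JSON"
```

To check your own file, point check_json at the metadata.json in your dataset repo instead of the temporary stand-in.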
3) Set up a Git submodule repo
- Choose a platform for running Git commands: Command Prompt (CMD), Git Bash, GitHub Desktop, or VS Code.
- This guide assumes a terminal or PowerShell session opened in VS Code.
- Clone your main repo to your local machine.
- Run these commands:
- git clone URL
- cd REPONAME
- Example:
- git clone https://github.com/sadak2004/gitsubmodulerepo.git
- cd gitsubmodulerepo
- Now add the second repo as a submodule of the main repo.
- Run this command:
- git submodule add URLSUBREPO
- Example: git submodule add https://github.com/sadak2004/subrepo.git
- To place the submodule inside a folder, pass the target path as a second argument:
- Example: git submodule add https://github.com/sadak2004/subrepo.git FOLDERNAME/REPONAME
- Commit and push the changes.
- Run these commands:
- git add .gitmodules REPONAME (use FOLDERNAME/REPONAME if you added the submodule inside a folder)
- git commit -m "COMMIT MESSAGE"
- git push origin main
- For further help or information, see https://gist.github.com/gitaarik/8735255
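
The submodule steps above can be rehearsed end to end without touching GitHub. The sketch below is a minimal dry run under stated assumptions: it creates two throwaway local repos in a temporary directory (the names mainrepo, subrepo, and datasets are hypothetical stand-ins), adds one to the other as a submodule inside a folder, and commits the result. The protocol.file.allow=always setting is needed only because the demo clones from a local path; it is not required for https URLs. The push step is left out because the demo repos have no remote.

```shell
#!/bin/sh
# Sketch: rehearse the submodule workflow with two throwaway local repos.
set -eu

demo=$(mktemp -d)
cd "$demo"

# Hypothetical stand-in for your dataset repo (the future submodule).
git init -q subrepo
git -C subrepo -c user.name="Demo" -c user.email="demo@example.com" \
    commit -q --allow-empty -m "Initial dataset commit"

# Hypothetical stand-in for the main repo.
git init -q mainrepo
cd mainrepo

# Add the dataset repo as a submodule inside the datasets/ folder.
# protocol.file.allow=always is only needed for this local-path demo.
git -c protocol.file.allow=always submodule add "$demo/subrepo" datasets/subrepo

# Stage the submodule entry and the .gitmodules file, then commit.
git add .gitmodules datasets/subrepo
git -c user.name="Demo" -c user.email="demo@example.com" \
    commit -q -m "Add subrepo as a submodule"

# List the registered submodule and the commit it points at.
git submodule status
```

With real repositories, replace the local path with the https URL of your dataset repo and finish with git push origin main.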