# A data catalog with data on GitHub

This example showcases a simple data catalog that gets its data from a list of GitHub repos that serve as datasets.

A `datasets.json` file specifies which datasets are part of the data catalog.

The application has an index page that lists all the datasets specified in `datasets.json`. Clicking the "info" button next to a dataset in that list opens a page with more information about it, such as its data files and its README.

You can read more about it in the [Data catalog with data on GitHub](https://portaljs.org/docs/examples/github-backed-catalog) blog post.

## Demo

https://example.portaljs.org/

## Deploy your own

[![Deploy with Vercel](https://vercel.com/button)](https://vercel.com/new/clone?repository-url=https%3A%2F%2Fgithub.com%2Fdatopian%2Fportaljs%2Ftree%2Fmain%2Fexamples%2Fgithub-backed-catalog)

Clicking this button takes you to a Vercel page where you can clone the example into your own GitHub, GitLab, or Bitbucket account and deploy it automatically.

## How to use

### Install

Execute `create-next-app` to bootstrap the example:

```
npx create-next-app <app-name> --example https://github.com/datopian/portaljs/tree/main/examples/github-backed-catalog
cd <app-name>
```

### Set environment variables

This project uses the GitHub API, which limits unauthenticated requests to 60 per hour. To avoid hitting that limit, you will probably want to create a [Personal Access Token](https://docs.github.com/en/authentication/keeping-your-account-and-data-secure/creating-a-personal-access-token) and add it to a `.env` file in the project folder:

```
GITHUB_PAT=<github token>
```

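As a rough illustration of what the token is for, the sketch below builds the headers a client could send with each GitHub REST API request: with a token it adds an `Authorization` header, and without one the request stays anonymous and is subject to the lower limit. The `githubHeaders` function is hypothetical and not taken from this example's source.

```typescript
// Hypothetical helper (not from this example's code): build headers for
// GitHub REST API requests, authenticating only when a token is available.
function githubHeaders(token?: string): Record<string, string> {
  const headers: Record<string, string> = {
    // Media type recommended by the GitHub REST API documentation.
    Accept: "application/vnd.github+json",
  };
  // Without a non-blank token the request is sent anonymously.
  if (token && token.trim() !== "") {
    headers["Authorization"] = `Bearer ${token.trim()}`;
  }
  return headers;
}
```

In a Next.js server-side function you would typically call it as `fetch(url, { headers: githubHeaders(process.env.GITHUB_PAT) })`, so the token never has to appear in client-side code.
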
### Change datasets

You can change which datasets are displayed in the data catalog by editing `datasets.json`. Some example datasets can be found in the [datasets](https://github.com/datasets) GitHub organization.

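When editing `datasets.json` by hand, a quick sanity check can catch a missing field before the app calls GitHub. The sketch below is a hypothetical helper, not part of this example: it assumes the file parses to a JSON array and checks each entry for the required fields described under "Structure of `datasets.json`" (`owner`, `repo`, `branch` and `readme` as non-empty strings, `files` as an array of strings), returning the indexes of the entries that fail.

```typescript
// Hypothetical validator for datasets.json (not part of this example's code).
// Returns the indexes of entries that are missing a required field.
function invalidDatasetIndexes(parsed: unknown): number[] {
  if (!Array.isArray(parsed)) {
    throw new Error("datasets.json should contain a JSON array of datasets");
  }
  const requiredStrings = ["owner", "repo", "branch", "readme"];
  const invalid: number[] = [];
  parsed.forEach((entry: unknown, index: number) => {
    if (typeof entry !== "object" || entry === null) {
      invalid.push(index);
      return;
    }
    const record = entry as Record<string, unknown>;
    // Every required string field must be present and non-empty.
    const stringsOk = requiredStrings.every(
      (key) => typeof record[key] === "string" && (record[key] as string) !== "",
    );
    // `files` must be an array whose items are all strings.
    const files = record["files"];
    const filesOk = Array.isArray(files) && files.every((f: unknown) => typeof f === "string");
    if (!stringsOk || !filesOk) {
      invalid.push(index);
    }
  });
  return invalid;
}
```

It could be run as `invalidDatasetIndexes(JSON.parse(fileContents))`, where `fileContents` is the text of `datasets.json`; an empty result means every entry has the required fields.
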
### Run in development mode

Run the app using:

```
npm run dev
```

Open http://localhost:3000 in your browser. You should see something similar to this:

If you click the `info` button for a dataset, you will see a page similar to this:

## Notes

### Structure of `datasets.json`

The `datasets.json` file is simply a list of datasets. Below is a minimal example of a single dataset:

```json
{
  "owner": "fivethirtyeight",
  "repo": "data",
  "branch": "master",
  "files": ["nba-raptor/historical_RAPTOR_by_player.csv", "nba-raptor/historical_RAPTOR_by_team.csv"],
  "readme": "nba-raptor/README.md"
}
```

It has:

- An `owner`: the owner of the GitHub repo
- A `repo`: the name of the GitHub repo
- A `branch`: the branch from which the files and the README are read
- A list of `files`: the paths of the data files you want to publish
- A `readme`: the path to your dataset description, which can also be in a subdirectory, e.g. `example/README.md`

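To make the role of `owner`, `repo`, `branch` and the file paths concrete, the sketch below turns one entry into URLs for the GitHub REST "get repository content" endpoint (`GET /repos/{owner}/{repo}/contents/{path}`, with the branch passed as the `ref` query parameter). This is an assumption about how the files could be fetched, not necessarily what this example's code does; `DatasetLocation`, `contentUrl` and `contentUrls` are hypothetical names.

```typescript
// Hypothetical shape of the required fields of one datasets.json entry.
interface DatasetLocation {
  owner: string;
  repo: string;
  branch: string;
  files: string[];
  readme: string;
}

// Build a "get repository content" URL for a single repo-relative path.
// Each path segment is percent-encoded while the "/" separators are kept.
function contentUrl(loc: DatasetLocation, path: string): string {
  const encodedPath = path.split("/").map(encodeURIComponent).join("/");
  return (
    `https://api.github.com/repos/${encodeURIComponent(loc.owner)}/${encodeURIComponent(loc.repo)}` +
    `/contents/${encodedPath}?ref=${encodeURIComponent(loc.branch)}`
  );
}

// URLs for the README and for every data file listed in the entry.
function contentUrls(loc: DatasetLocation): { readme: string; files: string[] } {
  return {
    readme: contentUrl(loc, loc.readme),
    files: loc.files.map((file) => contentUrl(loc, file)),
  };
}
```

By default this endpoint returns JSON metadata with the file content base64-encoded; sending `Accept: application/vnd.github.raw+json` asks for the raw file instead.
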
You can also add:

- A `description`, which is useful if you have more than one dataset in the same repo. If it is not provided, the repo description is used.
- A `Name`, which is useful if you want to give your dataset a nicer name. If it is not provided, the name is built by joining the `owner`, the `repo`, and the README's directory; in the example above it would be `fivethirtyeight/data/nba-raptor`.

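The fallback rules for the optional fields can be summarised in a few lines. This is a sketch inferred from the description above and the `fivethirtyeight/data/nba-raptor` example, not the example's actual code: `DatasetEntry`, `displayName` and `displayDescription` are hypothetical, the repo description is passed in rather than fetched, the optional name key is spelled `Name` as written above, and for a README at the repo root it assumes the name is just `owner/repo`.

```typescript
// Hypothetical type for one datasets.json entry, optional fields included.
interface DatasetEntry {
  owner: string;
  repo: string;
  branch: string;
  files: string[];
  readme: string;
  description?: string;
  Name?: string; // key spelled as in the list above
}

// Directory part of a repo-relative path:
// "nba-raptor/README.md" -> "nba-raptor", "README.md" -> "".
function directoryOf(path: string): string {
  const lastSlash = path.lastIndexOf("/");
  return lastSlash === -1 ? "" : path.slice(0, lastSlash);
}

// Explicit name if given, otherwise owner/repo plus the README's directory.
function displayName(entry: DatasetEntry): string {
  if (entry.Name) return entry.Name;
  const dir = directoryOf(entry.readme);
  return dir === "" ? `${entry.owner}/${entry.repo}` : `${entry.owner}/${entry.repo}/${dir}`;
}

// The entry's own description if given, otherwise the repo's description.
function displayDescription(entry: DatasetEntry, repoDescription: string): string {
  return entry.description ?? repoDescription;
}
```

Passing the repo description in as a parameter keeps the sketch free of network calls; in the app it would come from the GitHub API.
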
### Extra commands

You can also build the project for production with:

```
npm run build
```

Then run the production build with:

```
npm run start
```