title: Example: Data catalog with data on GitHub
authors: Luccas Mateus
date: 2023-04-20
filetype: blog

The github-backed example added to PortalJS gives users an easy way to set up a data catalog for displaying and sharing data stored in GitHub repositories. With this example, users can quickly set up a web-based portal that showcases their data and makes it accessible to others, all through the configuration of a simple datasets.json file.

Demo

To get a feel for the project, check out the live deployment.

Below are some screenshots:

Front page

Individual dataset page

How to use this example as a template

  • Create a new app with create-next-app:
npx create-next-app <app-name> --example https://github.com/datopian/portaljs/tree/main/examples/github-backed-catalog
cd <app-name>
  • This project uses the GitHub API, which caps unauthenticated requests at 60 per hour, so you might want to create a Personal Access Token and add it to a .env file inside the project folder, like so:
GITHUB_PAT=<github token>
  • Edit the datasets.json file to your liking; some examples can be found inside this repo
  • Run the app using:
npm run dev

Congratulations, you now have something similar to this running on http://localhost:3000
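
To illustrate why the token from the setup steps matters, here is a minimal sketch of how request headers for the GitHub REST API could be built with or without a token. The helper name githubHeaders is hypothetical and is not taken from the example's source code; in the app, the token would presumably be read from process.env.GITHUB_PAT.

```typescript
// Hypothetical helper (not part of the example's source): builds headers
// for GitHub REST API requests. Without a token, requests are anonymous
// and subject to the lower unauthenticated rate limit.
function githubHeaders(token?: string): Record<string, string> {
  const headers: Record<string, string> = {
    // Media type recommended by GitHub for REST API responses.
    Accept: "application/vnd.github+json",
  };
  // An empty or missing token is treated as "no token".
  if (token) {
    headers["Authorization"] = `Bearer ${token}`;
  }
  return headers;
}
```

In a Next.js server-side function, this could be called as githubHeaders(process.env.GITHUB_PAT) and the result passed as the headers option of fetch.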

Deployment

Deploy with Vercel

Clicking this button redirects you to a page where you can clone the content into your own GitHub, GitLab, or Bitbucket account and deploy everything automatically.

Structure of datasets.json

The datasets.json file is simply a list of datasets. Below is a minimal example of a single dataset entry:

{
  "owner": "fivethirtyeight",
  "repo": "data",
  "branch": "master",
  "files": ["nba-raptor/historical_RAPTOR_by_player.csv", "nba-raptor/historical_RAPTOR_by_team.csv"],
  "readme": "nba-raptor/README.md"
}

Each dataset has:

  • An owner, which is the GitHub repo owner
  • A repo, which is the GitHub repo name
  • A branch, which is the branch from which the files and the readme are fetched
  • A list of files, which is a list of paths to the files that you want to show to the world
  • A readme, which is the path to your data description; it can also sit in a subdirectory, e.g. example/README.md
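
To make the role of these fields concrete, the sketch below combines owner, repo, branch, and a file path into a raw-content URL. The raw.githubusercontent.com URL pattern is GitHub's standard one, but whether the example fetches files this way is an assumption, and the names DatasetRef and rawFileUrl are hypothetical.

```typescript
// Hypothetical type: the fields of a dataset entry needed to locate a file.
interface DatasetRef {
  owner: string;
  repo: string;
  branch: string;
}

// Builds the raw-content URL for one file path inside the repository.
// Each path segment is URL-encoded, while the "/" separators are preserved.
function rawFileUrl(ref: DatasetRef, path: string): string {
  const encodedPath = path
    .split("/")
    .map((segment) => encodeURIComponent(segment))
    .join("/");
  return `https://raw.githubusercontent.com/${ref.owner}/${ref.repo}/${ref.branch}/${encodedPath}`;
}
```

For the minimal entry above, calling rawFileUrl with the path nba-raptor/historical_RAPTOR_by_player.csv would point at that CSV on the master branch of fivethirtyeight/data.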

You can also add

  • A description, which is useful if you have more than one dataset per repo; if not provided, the repo description is used
  • A name, which is useful if you want to give your dataset a nice name; if not provided, the name is built by joining the owner, the repo, and the path of the README, which in the example above gives fivethirtyeight/data/nba-raptor
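
The entry structure and the name fallback described above can be sketched as follows. This is an illustration under stated assumptions, not the example's actual code: the names DatasetEntry, isDatasetEntry, and displayName are hypothetical, the optional key is assumed to be spelled name in lowercase, and the handling of a README at the repo root (falling back to owner/repo) is a guess.

```typescript
// Hypothetical type for one entry in datasets.json.
interface DatasetEntry {
  owner: string;
  repo: string;
  branch: string;
  files: string[];
  readme: string;
  description?: string; // optional: falls back to the repo description
  name?: string;        // optional: falls back to owner/repo/readme-directory
}

// Type guard: checks that the required fields are present with the right types.
function isDatasetEntry(value: unknown): value is DatasetEntry {
  if (typeof value !== "object" || value === null) return false;
  const v = value as Record<string, unknown>;
  return (
    typeof v["owner"] === "string" &&
    typeof v["repo"] === "string" &&
    typeof v["branch"] === "string" &&
    typeof v["readme"] === "string" &&
    Array.isArray(v["files"]) &&
    v["files"].every((f: unknown) => typeof f === "string")
  );
}

// Returns the explicit name if set, otherwise owner/repo/<directory of the README>.
// Assumption: a README at the repo root yields just owner/repo.
function displayName(entry: DatasetEntry): string {
  if (entry.name) return entry.name;
  const readmeDir = entry.readme.split("/").slice(0, -1).join("/");
  return [entry.owner, entry.repo, readmeDir]
    .filter((part) => part.length > 0)
    .join("/");
}
```

A whole datasets.json file could then be checked with JSON.parse followed by Array.isArray and isDatasetEntry on every element before rendering anything.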

Extra commands

You can also build the project for production with:

npm run build

Then run the production build with:

npm run start