Compare commits (61 commits): ckan_examp...styling-si
`.gitignore` (6 changes)
```
@@ -39,4 +39,8 @@ testem.log
Thumbs.db

# Next.js
.next

# Env
**/.env
```
`README.md` (815 changes)
@@ -1,5 +1,3 @@

> :warning: **This documentation is outdated**: Over the coming months this repo has undergone, and will continue to undergo, a major revamp. This is all in the effort of modernizing and expanding the framework. With that said, not everything shown in the documentation below is still applicable, so tread carefully.

<h1 align="center">
🌀 Portal.JS
<br />
@@ -9,821 +7,48 @@ Rapidly build rich data portals using a modern frontend framework

* [What is Portal.JS?](#What-is-Portal.JS)
* [Features](#Features)
* [For developers](#For-developers)
* [Installation and setup](#Installation-and-setup)
* [Getting Started](#Getting-Started)
* [Tutorial](#Tutorial)
* [Build a single Frictionless dataset portal](#Build-a-single-Frictionless-dataset-portal)
* [Build a CKAN powered dataset portal](#Build-a-CKAN-powered-dataset-portal)
* [Architecture / Reference](#Architecture--Reference)
* [Component List](#Component-List)
* [UI Components](#UI-Components)
* [Dataset Components](#Dataset-Components)
* [View Components](#View-Components)
* [Search Components](#Search-Components)
* [Blog Components](#Blog-Components)
* [Misc Components](#Misc-Components)
* [Concepts and Terms](#Concepts-and-Terms)
* [Dataset](#Dataset)
* [Resource](#Resource)
* [View Spec](#view-spec)
* [Docs](#Docs)
* [Community](#Community)
* [Appendix](#Appendix)
* [What happened to Recline?](#What-happened-to-Recline?)
# What is Portal.JS

🌀 Portal.JS is a framework for rapidly building rich data portal frontends using a modern frontend approach. Portal.JS can be used to present a single dataset or build a full-scale data catalog/portal.

Built in JavaScript and React on top of the popular [Next.js](https://nextjs.com/) framework. Portal.JS assumes a "decoupled" approach where the frontend is a separate service from the backend and interacts with backend(s) via an API. It can be used with any backend and has out of the box support for [CKAN](https://ckan.org/).
## Features

- 🗺️ Unified sites: present data and content in one seamless site, pulling datasets from a DMS (e.g. CKAN) and content from a CMS (e.g. WordPress) with a common internal API.
- 👩‍💻 Developer friendly: built with familiar frontend tech (JavaScript, React, Next.js).
- 🔋 Batteries included: full set of portal components out of the box, e.g. catalog search, dataset showcase, blog, etc.
- 🎨 Easy to theme and customize: installable themes, use standard CSS and React+CSS tooling. Add new routes quickly.
- 🧱 Extensible: quickly extend and develop/import your own React components.
- 📝 Well documented: full set of documentation plus the documentation of Next.js and Apollo.
### For developers

- 🏗 Build with modern, familiar frontend tech such as JavaScript and React.
- 🚀 Next.js framework: so everything in Next.js for free: Server Side Rendering, Static Site Generation, huge number of examples and integrations, etc.
  - Server Side Rendering (SSR) => unlimited number of pages, SEO and more, whilst still using React.
  - Static Site Generation (SSG) => ultra-simple deployment, great performance, great Lighthouse scores and more (good for small sites).
# Installation and setup

Before installation, ensure your system satisfies the following requirements:

- Node.js 10.13 or later
- Next.js 10.0.3
- macOS, Windows (including WSL), and Linux are supported

> Note: We recommend using `yarn` instead of `npm`.

Portal.JS is built with React on top of the Next.js framework, so for a quick setup you can bootstrap a Next.js app and install Portal.JS as demonstrated in the code below:
```bash
# Create a Next.js app
npx create-next-app
# or
yarn create next-app
```
After the installation is complete, follow the instructions to start the development server. Try editing `pages/index.js` and see the result in your browser.

> For more information on how to use create-next-app, you can review the [create-next-app](https://nextjs.org/docs/api-reference/create-next-app) documentation.

Once your Next.js app is created, you can install Portal.JS:

```bash
yarn add https://github.com/datopian/portal.js.git
```
You're now ready to use Portal.JS in your Next.js app. To test it, open the `index.js` file in the `pages` folder. By default it contains some autogenerated starter code, which outputs a page with the following content:



Now, we are going to do some cleanup and add a table component. In the `index.js` file, import a [Table]() component from Portal.JS as shown below:
```javascript
import Head from 'next/head'
import { Table } from 'portal' // import Table component
import styles from '../styles/Home.module.css'

export default function Home() {

  const columns = [
    { field: 'id', headerName: 'ID' },
    { field: 'firstName', headerName: 'First name' },
    { field: 'lastName', headerName: 'Last name' },
    { field: 'age', headerName: 'Age' }
  ];

  const rows = [
    { id: 1, lastName: 'Snow', firstName: 'Jon', age: 35 },
    { id: 2, lastName: 'Lannister', firstName: 'Cersei', age: 42 },
    { id: 3, lastName: 'Lannister', firstName: 'Jaime', age: 45 },
    { id: 4, lastName: 'Stark', firstName: 'Arya', age: 16 },
    { id: 7, lastName: 'Clifford', firstName: 'Ferrara', age: 44 },
    { id: 8, lastName: 'Frances', firstName: 'Rossini', age: 36 },
    { id: 9, lastName: 'Roxie', firstName: 'Harvey', age: 65 },
  ];

  return (
    <div className={styles.container}>
      <Head>
        <title>Create Portal App</title>
        <link rel="icon" href="/favicon.ico" />
      </Head>

      <h1 className={styles.title}>
        Welcome to <a href="https://nextjs.org">Portal.JS</a>
      </h1>

      {/* Use table component */}
      <Table data={rows} columns={columns} />
    </div>
  )
}
```
Now, your page should look like the following:



> **Note**: You can learn more about individual portal components, as well as their prop types, in the [components reference](#Component-List).

#### **Check out the [Portal.JS website](https://portaljs.org/) for a gallery of live portals**

___
# Getting Started

If you're new to Portal.JS we recommend that you start with the step-by-step guide below. You can also check out the following examples of projects built with Portal.JS:

* [A portal for a single Frictionless dataset](#Build-a-single-Frictionless-dataset-portal)
* [A portal with a CKAN backend](#Build-a-CKAN-powered-dataset-portal)

> The [`examples` directory](https://github.com/datopian/portal.js/tree/main/examples) is regularly updated with different portal examples.

# Docs

Access the Portal.JS documentation at:

https://portaljs.org/docs

- [Examples](https://portaljs.org/docs#examples)

# Community

If you have questions about anything related to Portal.JS, you're always welcome to ask our community on [GitHub Discussions](https://github.com/datopian/portal.js/discussions).

___
# Tutorial

## Build a single Frictionless dataset portal

This tutorial will guide you through building a portal for a single Frictionless dataset.

[Here’s](https://portal-js.vercel.app/) an example of the final result.

### Setup

The dataset should be a Frictionless dataset, i.e. it should have a [datapackage.json](https://specs.frictionlessdata.io/data-package/).

Create a Frictionless dataset portal app from the template:

```
npx create-next-app -e https://github.com/datopian/portal.js/tree/main/examples/dataset-frictionless
# choose a name for your portal when prompted, e.g. your-portal
```

Go into your portal's directory and set the path to your dataset directory that contains the `datapackage.json`:

```
cd <your-portal>
export PORTAL_DATASET_PATH=<path/to/your/dataset>
```

Start the server:

```
yarn dev
```

Visit the page to view your dataset portal.
## Build a CKAN powered dataset portal

See [the CKAN Portal.JS example](./examples/ckan).

___

# Architecture / Reference

## Component List

Portal.JS supports many components that can help you build amazing data portals similar to [this](https://catalog-portal-js.vercel.app/) and [this](https://portal-js.vercel.app/).

In this section, we'll cover all supported components in depth, and help you understand their use as well as the expected properties.

Components are grouped under the following sections:
* [UI](https://github.com/datopian/portal.js/tree/main/src/components/ui): Components like the nav bar, footer, etc.
* [Dataset](https://github.com/datopian/portal.js/tree/main/src/components/dataset): Components used for displaying a Frictionless dataset and resources
* [Search](https://github.com/datopian/portal.js/tree/main/src/components/search): Components used for building a search interface for datasets
* [Blog](https://github.com/datopian/portal.js/tree/main/src/components/blog): Components for building a simple blog for datasets
* [Views](https://github.com/datopian/portal.js/tree/main/src/components/views): Components like charts, tables and maps for generating data views
* [Misc](https://github.com/datopian/portal.js/tree/main/src/components/misc): Miscellaneous components like errors, custom links, etc., used for extra design.
### UI Components

In the UI group we include all components that can be used for building generic page sections, such as the navigation bar, footer, side pane, recent datasets, etc.

#### [Nav Component](https://github.com/datopian/portal.js/blob/main/src/components/ui/Nav.js)

To build a navigation bar, you can use the `Nav` component as demonstrated below:
```javascript
import { Nav } from 'portal'

export default function Home() {

  const navMenu = [
    { title: 'Blog', path: '/blog' },
    { title: 'Search', path: '/search' }
  ]

  return (
    <>
      <Nav logo="/images/logo.png" navMenu={navMenu} />
      ...
    </>
  )
}
```
#### Nav Component Prop Types

The `Nav` component accepts two properties:
* **logo**: A string path to an image. Can be relative or absolute.
* **navMenu**: An array of objects with `title` and `path`, e.g. `[{ title: 'Blog', path: '/blog' }, { title: 'Search', path: '/search' }]`
#### [Recent Component](https://github.com/datopian/portal.js/blob/main/src/components/ui/Recent.js)

The `Recent` component is used to display a list of recent [datasets](#Dataset) on the home page. This is useful if you want to display the most recent datasets users have interacted with on your home page.

To build a recent datasets section, you can use the `Recent` component as demonstrated below:
```javascript
import { Recent } from 'portal'

export default function Home() {

  const datasets = [
    {
      organization: {
        name: "Org1",
        title: "This is the first org",
        description: "A description of the organization 1"
      },
      title: "Data package title",
      name: "dataset1",
      description: "description of data package",
      resources: [],
    },
    {
      organization: {
        name: "Org2",
        title: "This is the second org",
        description: "A description of the organization 2"
      },
      title: "Data package title",
      name: "dataset2",
      description: "description of data package",
      resources: [],
    },
  ]

  return (
    <div>
      {/* Use Recent component */}
      <Recent datasets={datasets} />
    </div>
  )
}
```
Note: The `Recent` component links each dataset using the organization name and the dataset name, in the following format:

> `/@<org name>/<dataset name>`

For instance, using the example datasets above, the first component will link to the page:

> `/@org1/dataset1`

and the second will link to:

> `/@org2/dataset2`

This is useful to know when generating dynamic pages for each dataset.
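The link format above can be reproduced with a small helper. Note this is a sketch, not part of the Portal.JS API: the function name and the lowercasing of the organization name are assumptions based on the example links shown.

```javascript
// Hypothetical helper mirroring the `/@<org name>/<dataset name>` link format
// used by the Recent component. Lowercasing the org name is an assumption
// inferred from the example links (`Org1` -> `/@org1/dataset1`).
function datasetPath(dataset) {
  const org = dataset.organization.name.toLowerCase()
  return `/@${org}/${dataset.name}`
}

const dataset = {
  organization: { name: 'Org1' },
  name: 'dataset1',
}

console.log(datasetPath(dataset)) // → /@org1/dataset1
```

A helper like this pairs naturally with a Next.js dynamic route such as `pages/@[org]/[dataset].js` when generating one page per dataset.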
#### Recent Component Prop Types

The `Recent` component accepts the following properties:
* **datasets**: An array of [datasets](#Dataset)
### Dataset Components

The dataset group brings together components that can be used for building a dataset UI. These include components for displaying info about a dataset, the resources in a dataset, as well as the dataset README.

#### [KeyInfo Component](https://github.com/datopian/portal.js/blob/main/src/components/dataset/KeyInfo.js)

The `KeyInfo` component displays key properties of a dataset, like the number of resources, size, format and licences, in tabular form. See the example in the `Key Info` section [here](https://portal-js.vercel.app/). To use it, import the `KeyInfo` component as demonstrated below:
```javascript
import { KeyInfo } from 'portal'

export default function Home() {

  const datapackage = {
    "name": "finance-vix",
    "title": "VIX - CBOE Volatility Index",
    "homepage": "http://www.cboe.com/micro/VIX/",
    "version": "0.1.0",
    "license": "PDDL-1.0",
    "sources": [
      {
        "title": "CBOE VIX Page",
        "name": "CBOE VIX Page",
        "web": "http://www.cboe.com/micro/vix/historical.aspx"
      }
    ],
    "resources": [
      {
        "name": "vix-daily",
        "path": "vix-daily.csv",
        "format": "csv",
        "size": 20982,
        "mediatype": "text/csv",
      }
    ]
  }

  return (
    <div>
      {/* Use KeyInfo component */}
      <KeyInfo descriptor={datapackage} resources={datapackage.resources} />
    </div>
  )
}
```
#### KeyInfo Component Prop Types

The `KeyInfo` component accepts two properties:
* **descriptor**: A [Frictionless data package descriptor](https://specs.frictionlessdata.io/data-package/#descriptor)
* **resources**: An array of [Frictionless data package resources](https://specs.frictionlessdata.io/data-resource/#introduction)
#### [ResourceInfo Component](https://github.com/datopian/portal.js/blob/main/src/components/dataset/ResourceInfo.js)

The `ResourceInfo` component displays key properties of a resource object, like the name, size, format and modification date, as well as a download link. See an example of a `ResourceInfo` component in the `Data Files` section [here](https://portal-js.vercel.app/).

You can import and use the `ResourceInfo` component as demonstrated below:
```javascript
import { ResourceInfo } from 'portal'

export default function Home() {

  const resources = [
    {
      "name": "vix-daily",
      "path": "vix-daily.csv",
      "format": "csv",
      "size": 20982,
      "mediatype": "text/csv",
    },
    {
      "name": "vix-daily 2",
      "path": "vix-daily2.csv",
      "format": "csv",
      "size": 2082,
      "mediatype": "text/csv",
    }
  ]

  return (
    <div>
      {/* Use ResourceInfo component */}
      <ResourceInfo resources={resources} />
    </div>
  )
}
```
#### ResourceInfo Component Prop Types

The `ResourceInfo` component accepts a single property:
* **resources**: An array of [Frictionless data package resources](https://specs.frictionlessdata.io/data-resource/#introduction)
#### [ReadMe Component](https://github.com/datopian/portal.js/blob/main/src/components/dataset/Readme.js)

The `ReadMe` component is used for displaying a compiled dataset README in a readable format. See the example in the `README` section [here](https://portal-js.vercel.app/).

> Note: By compiled README, we mean a README that has been converted to a plain string using a package like [remark](https://www.npmjs.com/package/remark).

You can import and use the `ReadMe` component as demonstrated below:
```javascript
import { ReadMe } from 'portal'
import remark from 'remark'
import html from 'remark-html'
import { useEffect, useState } from 'react'

const readMeMarkdown = `
CBOE Volatility Index (VIX) time-series dataset including daily open, close,
high and low. The CBOE Volatility Index (VIX) is a key measure of market
expectations of near-term volatility conveyed by S&P 500 stock index option
prices introduced in 1993.

## Data

From the [VIX FAQ][faq]:

> In 1993, the Chicago Board Options Exchange® (CBOE®) introduced the CBOE
> Volatility Index®, VIX®, and it quickly became the benchmark for stock market
> volatility. It is widely followed and has been cited in hundreds of news
> articles in the Wall Street Journal, Barron's and other leading financial
> publications. Since volatility often signifies financial turmoil, VIX is
> often referred to as the "investor fear gauge".

[faq]: http://www.cboe.com/micro/vix/faq.aspx

## License

No obvious statement on [historical data page][historical]. Given size and
factual nature of the data and its source from a US company would imagine this
was public domain and as such have licensed the Data Package under the Public
Domain Dedication and License (PDDL).

[historical]: http://www.cboe.com/micro/vix/historical.aspx
`

export default function Home() {
  const [readMe, setReadMe] = useState("")

  useEffect(() => {
    async function processReadMe() {
      const processed = await remark()
        .use(html)
        .process(readMeMarkdown)
      setReadMe(processed.toString())
    }
    processReadMe()
  }, [])

  return (
    <div>
      <ReadMe readme={readMe} />
    </div>
  )
}
```
#### ReadMe Component Prop Types

The `ReadMe` component accepts a single property:
* **readme**: A string of a compiled README in HTML format.
### [View Components](https://github.com/datopian/portal.js/tree/main/src/components/views)

View components are a set of components that can be used for displaying dataset views like charts, tables, maps, etc.

#### [Chart Component](https://github.com/datopian/portal.js/blob/main/src/components/views/Chart.js)

The `Chart` component exposes different chart components, like Plotly and Vega charts, which can be used for showing graphs. See the example in the `Graph` section [here](https://portal-js.vercel.app/).

To use a chart component, you need to compile a view spec and pass it as props to the chart component. Each chart type has its own specific spec, as explained in this [doc](https://specs.frictionlessdata.io/views/#graph-spec).

In the example below, we assume there's a compiled Plotly spec:
```javascript
import { PlotlyChart } from 'portal'

export default function Home({ plotlySpec }) {

  return (
    <div>
      <PlotlyChart spec={plotlySpec} />
    </div>
  )
}
```
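For a rough idea of what such a compiled spec contains: a Plotly figure is a plain object with `data` traces and a `layout`. The field values below are hypothetical, for illustration only; in practice the spec is produced by a compile step rather than written by hand.

```javascript
// A minimal, hypothetical compiled Plotly spec: `data` holds the traces,
// `layout` holds titles and axis configuration. Values are made up.
const plotlySpec = {
  data: [
    {
      x: ['2020-01-01', '2020-01-02', '2020-01-03'], // dates (hypothetical)
      y: [13.1, 12.8, 14.2],                         // close values (hypothetical)
      type: 'scatter',
      mode: 'lines',
      name: 'VIX Daily',
    },
  ],
  layout: {
    title: 'VIX - CBOE Volatility Index',
    xaxis: { title: 'Date' },
    yaxis: { title: 'Close' },
  },
}

console.log(plotlySpec.data.length) // → 1
```

An object of this shape is what would be passed as the `spec` prop in the example above.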
> Note: You can compile views using the [datapackage-render](https://github.com/datopian/datapackage-views-js) library, as demonstrated in [this example](https://github.com/datopian/portal.js/blob/main/examples/dataset-frictionless/lib/utils.js).

#### Chart Component Prop Types

The `Chart` component accepts a single property:
* **spec**: A compiled view spec, depending on the chart type.
#### [Table Component](https://github.com/datopian/portal.js/blob/main/examples/dataset-frictionless/components/Table.js)

The `Table` component is used for displaying dataset resources as a tabular grid. See the example in the `Data Preview` section [here](https://portal-js.vercel.app/).

To use the `Table` component, you have to pass an array of data and columns as demonstrated below:
```javascript
import { Table } from 'portal' // import Table component

export default function Home() {

  const columns = [
    { field: 'id', headerName: 'ID' },
    { field: 'firstName', headerName: 'First name' },
    { field: 'lastName', headerName: 'Last name' },
    { field: 'age', headerName: 'Age' }
  ];

  const data = [
    { id: 1, lastName: 'Snow', firstName: 'Jon', age: 35 },
    { id: 2, lastName: 'Lannister', firstName: 'Cersei', age: 42 },
    { id: 3, lastName: 'Lannister', firstName: 'Jaime', age: 45 },
    { id: 4, lastName: 'Stark', firstName: 'Arya', age: 16 },
    { id: 7, lastName: 'Clifford', firstName: 'Ferrara', age: 44 },
    { id: 8, lastName: 'Frances', firstName: 'Rossini', age: 36 },
    { id: 9, lastName: 'Roxie', firstName: 'Harvey', age: 65 },
  ];

  return (
    <Table data={data} columns={columns} />
  )
}
```
> Note: Under the hood, the `Table` component uses the [DataGrid Material UI table](https://material-ui.com/components/data-grid/), and as such all params supported there for data and columns are supported here too.

#### Table Component Prop Types

The `Table` component accepts two properties:
* **data**: An array of data objects, e.g. `[ {col1: 1, col2: 2}, {col1: 5, col2: 7} ]`
* **columns**: An array of column definitions with `field` and `headerName` properties, e.g. `[{field: "col1", headerName: "col1"}, {field: "col2", headerName: "col2"}]`
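When your rows all share the same shape, the `columns` array can be derived from the keys of the first row. The helper below is a hypothetical convenience, not part of Portal.JS:

```javascript
// Hypothetical helper: derive a `columns` array (field/headerName pairs,
// the shape the Table component expects) from the keys of the first row.
function columnsFromData(data) {
  if (data.length === 0) return []
  return Object.keys(data[0]).map((key) => ({
    field: key,
    // Capitalize the first letter for a simple display name
    headerName: key.charAt(0).toUpperCase() + key.slice(1),
  }))
}

const data = [
  { id: 1, lastName: 'Snow', firstName: 'Jon', age: 35 },
  { id: 2, lastName: 'Lannister', firstName: 'Cersei', age: 42 },
]

console.log(columnsFromData(data)[0]) // → { field: 'id', headerName: 'Id' }
```

The result can be passed straight to `<Table data={data} columns={columnsFromData(data)} />`.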
### [Search Components](https://github.com/datopian/portal.js/tree/main/src/components/search)

The search group brings together components that can be used for creating a search interface. This includes a search form, a search item, as well as a search result list.

#### [Form Component](https://github.com/datopian/portal.js/blob/main/src/components/search/Form.js)

The search `Form` component is a simple search input and submit button. See an example of a search form [here](https://catalog-portal-js.vercel.app/search).

The search `Form` requires a submit handler (`handleSubmit`). This handler function receives the search term and handles the actual search.

In the example below, we demonstrate how to use the `Form` component.
```javascript
import { Form } from 'portal'

export default function Home() {

  const handleSearchSubmit = (searchQuery) => {
    // Write your custom code to perform search in db
    console.log(searchQuery);
  }

  return (
    <Form handleSubmit={handleSearchSubmit} />
  )
}
```
#### Form Component Prop Types

The `Form` component accepts a single property:
* **handleSubmit**: A function that receives the search text and can be customized to perform the actual search.
#### [Item Component](https://github.com/datopian/portal.js/blob/main/src/components/search/Item.js)

The search `Item` component can be used to display a single search result.

In the example below, we demonstrate how to use the `Item` component.
```javascript
import { Item } from 'portal'

export default function Home() {
  const datapackage = {
    "name": "finance-vix",
    "title": "VIX - CBOE Volatility Index",
    "homepage": "http://www.cboe.com/micro/VIX/",
    "version": "0.1.0",
    "description": "This is a test organization description",
    "resources": [
      {
        "name": "vix-daily",
        "path": "vix-daily.csv",
        "format": "csv",
        "size": 20982,
        "mediatype": "text/csv",
      }
    ]
  }

  return (
    <Item dataset={datapackage} />
  )
}
```
#### Item Component Prop Types

The `Item` component accepts a single property:
* **dataset**: A [Frictionless data package descriptor](https://specs.frictionlessdata.io/data-package/#descriptor)
#### [ItemTotal Component](https://github.com/datopian/portal.js/blob/main/src/components/search/Item.js)

The search `ItemTotal` is a simple component for displaying the total number of search results.

In the example below, we demonstrate how to use the `ItemTotal` component.
```javascript
import { ItemTotal } from 'portal'

export default function Home() {
  // do some custom search to get results
  const search = (text) => {
    return [{ name: "data1" }, { name: "data2" }]
  }
  // get the total result count
  const searchTotal = search("some text").length

  return (
    <ItemTotal count={searchTotal} />
  )
}
```
#### ItemTotal Component Prop Types

The `ItemTotal` component accepts a single property:
* **count**: An integer: the total number of results.
### [Blog Components](https://github.com/datopian/portal.js/tree/main/src/components/blog)

This is a group of components for building a portal blog. See an example of a portal blog [here](https://catalog-portal-js.vercel.app/blog).

#### [PostList Component](https://github.com/datopian/portal.js/tree/main/src/components/misc)

The `PostList` component is used to display a list of blog posts with the title and a short excerpt from the content.

In the example below, we demonstrate how to use the `PostList` component.
```javascript
import { PostList } from 'portal'

export default function Home() {

  const posts = [
    { title: "Blog post 1", excerpt: "This is the first blog excerpt in this list." },
    { title: "Blog post 2", excerpt: "This is the second blog excerpt in this list." },
    { title: "Blog post 3", excerpt: "This is the third blog excerpt in this list." },
  ]

  return (
    <PostList posts={posts} />
  )
}
```
#### PostList Component Prop Types

The `PostList` component accepts a single property:
* **posts**: An array of post objects with the following properties:
```javascript
[
  {
    title: "The title of the blog post",
    excerpt: "A short excerpt from the post content",
  },
]
```
#### [Post Component](https://github.com/datopian/portal.js/tree/main/src/components/misc)

The `Post` component is used to display a blog post. See an example of a blog post [here](https://catalog-portal-js.vercel.app/blog/nyt-pa-platformen-opdateringsfrekvens-og-andres-data).

In the example below, we demonstrate how to use the `Post` component.
```javascript
import { Post } from 'portal'
import * as dayjs from 'dayjs' // For converting UTC time to relative format
import relativeTime from 'dayjs/plugin/relativeTime'

dayjs.extend(relativeTime)

export default function Home() {

  const post = {
    title: "This is a sample blog post",
    content: `<h1>A simple header</h1>
The PostList component is used to display a list of blog posts
with the title and a short excerpts from the content.
In the example below, we demonstrate how to use the PostList component.`,
    createdAt: dayjs().to(dayjs(1620649596902)),
    featuredImage: "https://pixabay.com/get/ge9a766d1f7b5fe0eccbf0f439501a2cf2b191997290e7ab15e6a402574acc2fdba48a82d278dca3547030e0202b7906d_640.jpg"
  }

  return (
    <Post post={post} />
  )
}
```
#### Post Component Prop Types
|
||||
|
||||
The `Post` component accepts a single property:
|
||||
* **post**: An object with the following properties:
|
||||
```javascript
|
||||
{
|
||||
title: <The title of the blog post>
|
||||
content: <The body of the blog post. Can be plain text or html>
|
||||
createdAt: <The utc date when the post was last modified>
|
||||
featuredImage: < Url/relative url to post cover image>
|
||||
}
|
||||
```
|
||||
|
||||
|
||||
### [Misc Components](https://github.com/datopian/portal.js/tree/main/src/components/misc)

These are a group of miscellaneous/extra components for extending your portal. They include components such as errors, custom links, etc.

#### [Error Component](https://github.com/datopian/portal.js/blob/main/src/components/misc/Error.js)

The `Error` component is used to display a custom error message.

In the example below, we demonstrate how to use the `Error` component.

```javascript
import { Error } from 'portal'

export default function Home() {

  return (
    <Error message="An error occurred when loading the file!" />
  )
}
```

#### Error Component Prop Types

The `Error` component accepts a single property:

* **message**: A string with the error message to display.

#### [CustomLink Component](https://github.com/datopian/portal.js/blob/main/src/components/misc/Error.js)

The `CustomLink` component is used to create a link styled consistently with other portal components.

In the example below, we demonstrate how to use the `CustomLink` component.

```javascript
import { CustomLink } from 'portal'

export default function Home() {

  return (
    <CustomLink url="/blog" title="Go to Blog" />
  )
}
```

#### CustomLink Component Prop Types

The `CustomLink` component accepts the following properties:

* **url**: A string. The relative or absolute URL of the link.
* **title**: A string. The title of the link.

___

## Concepts and Terms

In this section, we explain some of the terms and concepts used throughout the portal.js documentation.

> Some of these concepts are part of official specs, and when appropriate, we'll link to the sources where you can get more details.

### Dataset

A dataset extends the [Frictionless data package](https://specs.frictionlessdata.io/data-package/#metadata) with an extra organization property. The organization property describes the organization the dataset belongs to, and it should have the following properties:

```javascript
organization = {
  name: "some org name",
  title: "Some optional org title",
  description: "A description of the organization"
}
```

An example of a dataset with an organization property is given below:

```javascript
datasets = [{
  organization: {
    name: "some org name",
    title: "Some optional org title",
    description: "A description of the organization"
  },
  title: "Data package title",
  name: "Data package name",
  description: "description of data package",
  resources: [...],
  licenses: [...],
  sources: [...]
}]
```

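To illustrate how a portal component might consume this extra property, here is a minimal sketch (the `orgLabel` helper is ours for illustration, not part of Portal.JS):

```javascript
// Hypothetical helper (not part of Portal.JS): pick a display label for a
// dataset's organization, preferring the optional title over the name.
function orgLabel(dataset) {
  const org = dataset.organization;
  return org ? (org.title || org.name) : "";
}

const dataset = {
  organization: { name: "some org name", title: "Some optional org title" },
  title: "Data package title",
};

console.log(orgLabel(dataset)); // "Some optional org title"
```

The same `title || name` fallback appears later in the `Org` component example under "Access data from a component".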
### Resource

TODO

### view spec

---

## Deploying a portal build to GitHub Pages

[Deploying a single frictionless dataset to GitHub](https://portaljs.org/publish)

## Showcases

### Single Dataset with Default Theme



---

If you have questions about anything related to Portal.JS, you're always welcome to ask our community on [GitHub Discussions](https://github.com/datopian/portal.js/discussions) or on our [Discord server](https://discord.gg/An7Bu5x8).

# Appendix

@@ -1,262 +1,8 @@
<h1 align="center">
# 🌀 PortalJS example with CKAN and Apollo

🌀 Portal.JS<br/>
The javascript framework for<br/>
data portals
**🚩 UPDATE April 2023: This example is now deprecated, though it still works! Please use the [new CKAN examples](https://github.com/datopian/portaljs/tree/main/examples)**

</h1>

🌀 `Portal` is a framework for rapidly building rich data portal frontends using a modern frontend approach (javascript, React, SSR).

`Portal` assumes a "decoupled" approach where the frontend is a separate service from the backend and interacts with backend(s) via an API. It can be used with any backend and has out-of-the-box support for [CKAN][]. `portal` is built in Javascript and React on top of the popular [Next.js][] framework.

[ckan]: https://ckan.org/
[next.js]: https://nextjs.com/

Live DEMO: https://catalog-portal-js.vercel.app

## Features

- 🗺️ Unified sites: present data and content in one seamless site, pulling datasets from a DMS (e.g. CKAN) and content from a CMS (e.g. WordPress) with a common internal API.
- 👩💻 Developer friendly: built with familiar frontend tech (Javascript, React, etc.)
- 🔋 Batteries included: full set of portal components out of the box, e.g. catalog search, dataset showcase, blog, etc.
- 🎨 Easy to theme and customize: installable themes, use standard CSS and React+CSS tooling. Add new routes quickly.
- 🧱 Extensible: quickly extend and develop/import your own React components
- 📝 Well documented: full set of documentation plus the documentation of NextJS and Apollo.

### For developers

- 🏗 Build with modern, familiar frontend tech such as Javascript and React.
- 🚀 NextJS framework: everything in NextJS for free: React, SSR, static site generation, a huge number of examples and integrations, etc.
- SSR => unlimited number of pages, SEO, etc., whilst still using React.
- Static Site Generation (SSG) (good for small sites) => ultra-simple deployment, great performance and lighthouse scores, etc.
- 📋 Typescript support

## Getting Started

### Setup

Install a recent version of Node. You'll need Node 10.13 or later.

### Create a Portal app

To create a Portal app, open your terminal, `cd` into the directory you'd like to create the app in, and run the following command:

```console
npm init portal-app my-data-portal
```

> NB: Under the hood, this uses the tool called create-next-app, which bootstraps a Next.js app for you. It uses this template through the --example flag.
>
> If it doesn’t work, please open an issue.

## Guide

### Styling 🎨

We use Tailwind as a CSS framework. Take a look at `/styles/index.css` to see what we're importing from the Tailwind bundle. You can also configure Tailwind using the `tailwind.config.js` file.

Have a look at Next.js support of CSS and ways of writing CSS:

https://nextjs.org/docs/basic-features/built-in-css-support

### Backend

So far the app runs against mocked data. You can connect CMS and DMS backends easily via environment variables:

```console
$ export DMS=http://ckan:5000
$ export CMS=http://myblog.wordpress.com
```

> Note that we don't yet have implementations for the following CKAN features:
>
> - Activities
> - Auth
> - Groups
> - Facets

### Routes

These are the default routes set up in the "starter" app.

- Home `/`
- Search `/search`
- Dataset `/@org/dataset`
- Resource `/@org/dataset/r/resource`
- Organization `/@org`
- Collection (aka group in CKAN) (?) - suggest to merge into org
- Static pages, e.g. `/about` etc. from the CMS, or without an external CMS, e.g. in Next.js

### New Routes

TODO

### Data fetching

We use Apollo client, which allows us to query data with GraphQL. We have set up the CKAN API for the demo (it uses demo.ckan.org as DMS):

http://portal.datopian1.now.sh/

Note that we don't have an Apollo Server; instead we connect to the CKAN API using the [`apollo-link-rest`](https://www.apollographql.com/docs/link/links/rest/) module. You can see how it works in [lib/apolloClient.ts](https://github.com/datopian/portal/blob/master/lib/apolloClient.ts) and then have a look at [pages/\_app.tsx](https://github.com/datopian/portal/blob/master/pages/_app.tsx).

For development/debugging purposes, we suggest installing the Chrome extension - https://chrome.google.com/webstore/detail/apollo-client-developer-t/jdkknkkbebbapilgoeccciglkfbmbnfm.

#### i18n configuration

Portal.js is configured by default to support both `English` and `French` subpaths for language translation. The following steps can be used to configure i18n for other languages:

1. Update `next.config.js` to add more languages to the i18n locales:

```js
i18n: {
  locales: ['en', 'fr', 'nl-NL'], // add more languages to the list
  defaultLocale: 'en', // set the default language to use
},
```

2. Create a folder for the language in `locales` --> `locales/en-US`

3. In the language folder, different namespace files (JSON) can be created for each translation. For the `index.js` use-case, we named it `common.json`:

```json
// locales/en/common.json
{
  "title": "Portal js in English"
}

// locales/fr/common.json
{
  "title": "Portal js in French"
}
```

4. To use on pages using server-side props:

```js
import { loadNamespaces } from './_app';
import useTranslation from 'next-translate/useTranslation';

const Home: React.FC = () => {
  const { t } = useTranslation();
  return (
    <div>{t(`common:title`)}</div> // we use `common` and `title` based on the common.json data
  );
};

export const getServerSideProps: GetServerSideProps = async ({ locale }) => {
  // ...
  return {
    props: {
      _ns: await loadNamespaces(['common'], locale),
    }
  };
};
```

5. Go to the browser and view the changes using a language subpath like `http://localhost:3000` and `http://localhost:3000/fr`. **Note**: the subpath also activates the Chrome language translator.

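As a simplified model of the lookup performed by `t(`common:title`)` in step 4 (this mirrors, but is not, the actual next-translate implementation), the part of the key before the colon selects the namespace file and the rest is the key inside it:

```javascript
// Simplified model (not the real next-translate implementation) of how a
// "namespace:key" string resolves against the loaded namespace JSON files.
const namespaces = {
  common: { title: "Portal js in English" },
};

function t(key) {
  const [ns, path] = key.split(":");
  // Walk dot-separated key paths (e.g. "nav.home") inside the namespace.
  return path
    .split(".")
    .reduce((obj, k) => (obj ? obj[k] : undefined), namespaces[ns]);
}

console.log(t("common:title")); // "Portal js in English"
```

Missing keys simply resolve to `undefined` in this sketch; the real library has its own fallback behaviour.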
#### Pre-fetch data in the server-side

When visiting a dataset page, you may want to fetch the dataset metadata on the server side. To do so, you can use the `getServerSideProps` function from NextJS:

```javascript
import { GetServerSideProps } from 'next';
import { initializeApollo } from '../lib/apolloClient';
import gql from 'graphql-tag';

const QUERY = gql`
  query dataset($id: String) {
    dataset(id: $id) @rest(type: "Response", path: "package_show?{args}") {
      result
    }
  }
`;

...

export const getServerSideProps: GetServerSideProps = async (context) => {
  const apolloClient = initializeApollo();

  await apolloClient.query({
    query: QUERY,
    variables: {
      id: 'my-dataset'
    },
  });

  return {
    props: {
      initialApolloState: apolloClient.cache.extract(),
    },
  };
};
```

This fetches the data from the DMS and saves it in the Apollo cache so that we can query it again from the components.

#### Access data from a component

Consider the situation when rendering a component for org info on the dataset page. We already have pre-fetched dataset metadata that includes an `organization` property with attributes such as `name`, `title`, etc. We can now query only the organization part for our `Org` component:

```javascript
import { useQuery } from '@apollo/react-hooks';
import gql from 'graphql-tag';

export const GET_ORG_QUERY = gql`
  query dataset($id: String) {
    dataset(id: $id) @rest(type: "Response", path: "package_show?{args}") {
      result {
        organization {
          name
          title
          image_url
        }
      }
    }
  }
`;

export default function Org({ variables }) {
  const { loading, error, data } = useQuery(
    GET_ORG_QUERY,
    {
      variables: { id: 'my-dataset' }
    }
  );

  ...

  const { organization } = data.dataset.result;

  return (
    <>
      {organization ? (
        <>
          <img
            src={organization.image_url}
            className="h-5 w-5 mr-2 inline-block"
          />
          <Link href={`/@${organization.name}`}>
            <a className="font-semibold text-primary underline">
              {organization.title || organization.name}
            </a>
          </Link>
        </>
      ) : (
        ''
      )}
    </>
  );
}
```

#### Add a new data source

TODO

This example shows how you can build a full data portal using a CKAN backend with a Next.JS frontend powered by Apollo. A full-fledged guide is available as a [blog post](https://portaljs.org/blog/example-ckan-2021)

## Developers

@@ -303,4 +49,5 @@ yarn run e2e

### Key Pages

See https://tech.datopian.com/frontend/
See https://datahub.io/docs/dms/frontend/

@@ -1,6 +1,5 @@
{
  "extends": [
    "plugin:@nrwl/nx/react-typescript",
    "next",
    "next/core-web-vitals",
    "../../.eslintrc.json"

@@ -1,17 +1,46 @@
This is a repo intended to serve as an example of a data catalog that gets its data from a CKAN instance.

- Creating a new file inside of `examples` with `create-next-app` like so:
```
npx create-next-app <app-name> --example https://github.com/datopian/portaljs/tree/main/ --example-path examples/ckan-example
npx create-next-app <app-name> --example https://github.com/datopian/portaljs/tree/main/examples/ckan-example
cd <app-name>
```
- Inside `<app-name>`, go to the `project.json` file and replace all instances of `ckan-example` with `<app-name>`
- Set the `DMS` env variable to the URL of the CKAN instance, e.g. `export DMS=https://demo.dev.datopian.com`

- This project uses CKAN as a backend, so you need to point the project to the desired CKAN URL. You can do so by setting the `DMS` env variable in your terminal or adding a `.env` file with the following content:

```
DMS=<ckan url>
```

- Run the app using:

```
nx serve <app-name>
npm run dev
```

Congratulations, you now have something similar to this running on `http://localhost:4200`

If you go to any one of those pages by clicking on `More info`, you will see something similar to this


## Deployment

[](https://vercel.com/new/clone?repository-url=https%3A%2F%2Fgithub.com%2Fdatopian%2Fportaljs%2Ftree%2Fmain%2Fexamples%2Fckan-example&env=DMS&envDescription=URL%20For%20the%20CKAN%20Backend%20Ex%3A%20https%3A%2F%2Fdemo.dev.datopian.com)

By clicking on this button, you will be redirected to a page which will allow you to clone the content into your own github/gitlab/bitbucket account and automatically deploy everything.

## Extra commands

You can also build the project for production with

```
npm run build
```

And run using the production build like so:

```
npm run start
```

@@ -1,9 +1,3 @@
// eslint-disable-next-line @typescript-eslint/no-var-requires
const { withNx } = require('@nrwl/next/plugins/with-nx');

/**
 * @type {import('@nrwl/next/plugins/with-nx').WithNxOptions}
 **/
const nextConfig = {
  publicRuntimeConfig: {
    DMS: process.env.DMS ? process.env.DMS : '',
@@ -18,11 +12,6 @@ const nextConfig = {
  ],
};
},
nx: {
  // Set this to true if you would like to use SVGR
  // See: https://github.com/gregberge/svgr
  svgr: false,
},
};

module.exports = withNx(nextConfig);
module.exports = nextConfig;

5842 examples/ckan-example/package-lock.json (generated)
File diff suppressed because it is too large
33 examples/ckan-example/package.json
@@ -0,0 +1,33 @@
{
  "name": "my-app",
  "version": "0.1.0",
  "private": true,
  "scripts": {
    "dev": "next dev",
    "build": "next build",
    "start": "next start",
    "lint": "next lint"
  },
  "dependencies": {
    "@heroicons/react": "^2.0.17",
    "@types/node": "18.16.0",
    "@types/react": "18.0.38",
    "@types/react-dom": "18.0.11",
    "eslint": "8.39.0",
    "eslint-config-next": "13.3.1",
    "next": "13.3.1",
    "next-seo": "^6.0.0",
    "octokit": "^2.0.14",
    "react": "18.2.0",
    "react-dom": "18.2.0",
    "react-markdown": "^8.0.7",
    "remark-gfm": "^3.0.1",
    "typescript": "5.0.4"
  },
  "devDependencies": {
    "@tailwindcss/typography": "^0.5.9",
    "autoprefixer": "^10.4.14",
    "postcss": "^8.4.23",
    "tailwindcss": "^3.3.1"
  }
}

@@ -1,15 +1,6 @@
const { join } = require('path');

// Note: If you use library-specific PostCSS/Tailwind configuration then you should remove the `postcssConfig` build
// option from your application's configuration (i.e. project.json).
//
// See: https://nx.dev/guides/using-tailwind-css-in-react#step-4:-applying-configuration-to-libraries

module.exports = {
  plugins: {
    tailwindcss: {
      config: join(__dirname, 'tailwind.config.js'),
    },
    tailwindcss: {},
    autoprefixer: {},
  },
};
}

@@ -1,17 +1,15 @@
const { createGlobPatternsForDependencies } = require('@nrwl/react/tailwind');
const { join } = require('path');

/** @type {import('tailwindcss').Config} */
module.exports = {
  content: [
    join(
      __dirname,
      '{src,pages,components}/**/*!(*.stories|*.spec).{ts,tsx,html}'
    ),
    ...createGlobPatternsForDependencies(__dirname),
    "./app/**/*.{js,ts,jsx,tsx,mdx}",
    "./pages/**/*.{js,ts,jsx,tsx,mdx}",
    "./components/**/*.{js,ts,jsx,tsx,mdx}",
  ],
  theme: {
    extend: {},
  },
  plugins: [],
};
  plugins: [
    require('@tailwindcss/typography')
  ],
}

@@ -1,23 +1,20 @@
{
  "extends": "../../tsconfig.base.json",
  "compilerOptions": {
    "jsx": "preserve",
    "target": "es5",
    "lib": ["dom", "dom.iterable", "esnext"],
    "allowJs": true,
    "esModuleInterop": true,
    "allowSyntheticDefaultImports": true,
    "skipLibCheck": true,
    "strict": false,
    "forceConsistentCasingInFileNames": true,
    "noEmit": true,
    "esModuleInterop": true,
    "module": "esnext",
    "moduleResolution": "node",
    "resolveJsonModule": true,
    "isolatedModules": true,
    "incremental": true,
    "types": ["jest", "node"]
    "jsx": "preserve",
    "incremental": true
  },
  "include": ["**/*.ts", "**/*.tsx", "**/*.js", "**/*.jsx", "next-env.d.ts"],
  "exclude": [
    "node_modules",
    "jest.config.ts",
    "src/**/*.spec.ts",
    "src/**/*.test.ts"
  ]
  "include": ["next-env.d.ts", "**/*.ts", "**/*.tsx"],
  "exclude": ["node_modules"]
}

@@ -1,9 +1,7 @@
{
  "extends": [
    "plugin:@nrwl/nx/react-typescript",
    "next",
    "next/core-web-vitals",
    "../../.eslintrc.json"
    "next/core-web-vitals"
  ],
  "ignorePatterns": ["!**/*", ".next/**/*"],
  "overrides": [

@@ -1,17 +1,75 @@
This is a repo intended to serve as a simple example of a data catalog that gets its data from a series of GitHub repos. You can init an example just like this one by:

- Creating a new file inside of `examples` with `create-next-app` like so:
- Creating a new project with `create-next-app` like so:

```
npx create-next-app <app-name> --example https://github.com/datopian/portaljs/tree/main/ --example-path examples/simple-example
npx create-next-app <app-name> --example https://github.com/datopian/portaljs/tree/main/examples/simple-example
cd <app-name>
```
- Inside `<app-name>`, go to the `project.json` file and replace all instances of `simple-example` with `<app-name>`

- This project uses the GitHub API, which for anonymous users is capped at 50 requests per hour, so you might want to get a [Personal Access Token](https://docs.github.com/en/authentication/keeping-your-account-and-data-secure/creating-a-personal-access-token) and add it to a `.env` file inside the folder like so

```
GITHUB_PAT=<github token>
```

- Edit the file `datasets.json` to your liking; some examples can be found inside this [repo](https://github.com/datasets)
- Run the app using:
```
nx serve <app-name>
```
Congratulations, you now have something similar to this running on `http://localhost:4200`

If you go to any one of those pages by clicking on `More info`, you will see something similar to this


```
npm run dev
```

Congratulations, you now have something similar to this running on `http://localhost:3000`

If you go to any one of those pages by clicking on `More info`, you will see something similar to this


## Deployment

[](https://vercel.com/new/clone?repository-url=https%3A%2F%2Fgithub.com%2Fdatopian%2Fportaljs%2Ftree%2Fmain%2Fexamples%2Fsimple-example)

By clicking on this button, you will be redirected to a page which will allow you to clone the content into your own github/gitlab/bitbucket account and automatically deploy everything.

## Structure of `datasets.json`

The `datasets.json` file is simply a list of datasets; below you can see a minimal example of a dataset:

```json
{
  "owner": "fivethirtyeight",
  "repo": "data",
  "branch": "master",
  "files": ["nba-raptor/historical_RAPTOR_by_player.csv", "nba-raptor/historical_RAPTOR_by_team.csv"],
  "readme": "nba-raptor/README.md"
}
```

It has

- An `owner`, which is the GitHub repo owner
- A `repo`, which is the GitHub repo name
- A `branch`, which is the branch from which we get the files and the readme
- A list of `files`, which is a list of paths to the files that you want to show to the world
- A `readme`, which is the path to your data description; it can also be a subpath, e.g. `example/README.md`

You can also add

- A `description`, which is useful if you have more than one dataset per repo; if not provided, we just use the repo description
- A `Name`, which is useful if you want to give your dataset a nice name; if not provided, we use the junction of the `owner`, the `repo` and the path of the README, so in the example above it would be `fivethirtyeight/data/nba-raptor`

## Extra commands

You can also build the project for production with

```
npm run build
```

And run using the production build like so:

```
npm run start
```

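Putting the optional fields together, a fuller entry might look like this (the `description` value here is illustrative, and we assume the optional name key is lowercase `name` like the other fields):

```json
{
  "owner": "fivethirtyeight",
  "repo": "data",
  "branch": "master",
  "files": ["nba-raptor/historical_RAPTOR_by_player.csv"],
  "readme": "nba-raptor/README.md",
  "description": "Historical RAPTOR ratings for NBA players",
  "name": "fivethirtyeight/data/nba-raptor"
}
```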
16 examples/simple-example/components/Footer.tsx
@@ -0,0 +1,16 @@
export default function Footer() {
  return (
    <footer className="bg-white">
      <div className="mx-auto max-w-7xl overflow-hidden px-6 py-8 sm:py-12 lg:px-8">
        <div className="mt-10 flex justify-center space-x-10">
          <span className="text-gray-400 hover:text-gray-500 flex gap-4 items-center">
            <span className="mt-2">Powered by</span>
            <a href="https://datopian.com">
              <img src="/logo.png" className="w-32 h-10" />
            </a>
          </span>
        </div>
      </div>
    </footer>
  );
}
@@ -1,85 +0,0 @@
import FrictionlessViewFactory from "./drd/FrictionlessView";
import Table from "./drd/Table";

/* eslint import/no-default-export: off */
function DatapackageLayout({ children, project, excerpt }) {
  const { metadata } = project;

  const title = metadata.title;
  const resources = metadata.resources;
  const views = metadata.views;

  const FrictionlessView = FrictionlessViewFactory({ views, resources });

  return (
    <article className="docs prose text-primary dark:text-primary-dark dark:prose-invert prose-headings:font-headings prose-a:break-words mx-auto p-6">
      <header>
        {title && <h1 className="mb-4">{title}</h1>}
        <a
          className="font-semibold mb-4"
          target="_blank"
          href={project.github_repo}
        >
          @{project.owner} / {project.name}
        </a>
        {excerpt && <p className="text-md">{excerpt}</p>}
      </header>
      <section className="mt-10">
        {views.map((view, i) => {
          return (
            <div key={`visualization-${i}`}>
              <FrictionlessView viewId={i} />
            </div>
          );
        })}
      </section>
      <section className="mt-10">
        <h2>Data files</h2>
        <table className="table-auto">
          <thead>
            <tr>
              <th>File</th>
              <th>Title</th>
              <th>Format</th>
            </tr>
          </thead>
          <tbody>
            {resources.map((r) => {
              return (
                <tr key={`resources-list-${r.name}`}>
                  <td>
                    <a
                      target="_blank"
                      href={`https://github.com/${project.owner}/${project.name}/blob/main/${r.path}`}
                    >
                      {r.path}
                    </a>
                  </td>
                  <td>{r.title}</td>
                  <td>{r.format.toUpperCase()}</td>
                </tr>
              );
            })}
          </tbody>
        </table>
        {resources.slice(0, 5).map((resource) => {
          return (
            <div key={`resource-preview-${resource.name}`} className="mt-10">
              <h3>{resource.title || resource.name || resource.path}</h3>
              <Table url={resource.path} />
            </div>
          );
        })}
      </section>
      <hr />
      <section>
        <h2>Read me</h2>
        {children}
      </section>
    </article>
  );
}

export default function MDLayout({ children, layout, ...props }) {
  return <DatapackageLayout project={props.project} excerpt={props.excerpt}>{children}</DatapackageLayout>;
}
31 examples/simple-example/components/NavBar.tsx
@@ -0,0 +1,31 @@
export default function NavBar() {
  return (
    <div className="mx-auto max-w-7xl px-4 sm:px-6 lg:px-8">
      <div className="flex h-16 justify-between">
        <div className="flex">
          <div className="flex flex-shrink-0 items-center">
            <img
              className="block h-8 w-auto lg:hidden"
              src="/logo.png"
              alt="Your Company"
            />
            <img
              className="hidden h-8 w-auto lg:block mt-4"
              src="/logo.png"
              alt="Your Company"
            />
          </div>
          <div className="hidden sm:ml-6 sm:flex sm:space-x-8">
            {/* Current: "border-indigo-500 text-gray-900", Default: "border-transparent text-gray-500 hover:border-gray-300 hover:text-gray-700" */}
            <a
              href="/"
              className="inline-flex items-center border-b-2 border-indigo-500 px-1 pt-1 text-sm font-medium text-gray-900"
            >
              Home
            </a>
          </div>
        </div>
      </div>
    </div>
  );
}
@@ -1,40 +0,0 @@
import { MDXRemote } from "next-mdx-remote";
import dynamic from "next/dynamic";
import { Mermaid } from "@flowershow/core";

import FrictionlessViewFactory from "./FrictionlessView";

// Custom components/renderers to pass to MDX.
// Since the MDX files aren't loaded by webpack, they have no knowledge of how
// to handle import statements. Instead, you must include components in scope
// here.
const components = {
  Table: dynamic(() => import("./Table")),
  mermaid: Mermaid,
  // Excel: dynamic(() => import('../components/Excel')),
  // TODO: try and make these dynamic ...
  Vega: dynamic(() => import("./Vega")),
  VegaLite: dynamic(() => import("./VegaLite")),
  LineChart: dynamic(() => import("./LineChart")),
} as any;

export default function DRD({
  source,
  frictionless = {
    views: [],
    resources: [],
  },
}: {
  source: any;
  frictionless?: any;
}) {
  // dynamic() can't be used inside of React rendering
  // as it needs to be marked in the top level of the
  // module for preloading to work
  components.FrictionlessView = FrictionlessViewFactory({
    views: frictionless.views,
    resources: frictionless.resources,
  });

  return <MDXRemote {...source} components={components} />;
}
@@ -1,33 +0,0 @@
import { useEffect, useState } from "react";

const DebouncedInput = ({
  value: initialValue,
  onChange,
  debounce = 500,
  ...props
}) => {
  const [value, setValue] = useState(initialValue);

  useEffect(() => {
    setValue(initialValue);
  }, [initialValue]);

  useEffect(() => {
    const timeout = setTimeout(() => {
      onChange(value);
    }, debounce);

    return () => clearTimeout(timeout);
  }, [value]);

  return (
    <input
      {...props}
      value={value}
      onChange={(e) => setValue(e.target.value)}
    />
  );
};

export default DebouncedInput;

@@ -1,55 +0,0 @@
// FrictionlessView is a factory because we have to
// set the views and resources lists before using it

import { convertSimpleToVegaLite } from "../../lib/viewSpecConversion";
import VegaLite from "./VegaLite";

export default function FrictionlessViewFactory({
  views = [],
  resources = [],
}): ({
  viewId,
  fullWidth,
}: {
  viewId: number;
  fullWidth?: boolean;
}) => JSX.Element {
  return ({ viewId, fullWidth = false }) => {
    if (!(viewId in views)) {
      console.error(`View ${viewId} not found`);
      return <></>;
    }
    const view = views[viewId];

    let resource;
    if (resources.length > 1) {
      resource = resources.find((r) => r.name === view.resourceName);
    } else {
      resource = resources[0];
    }

    if (!resource) {
      console.error(`Resource not found for view id ${viewId}`);
      return <></>;
    }

    let vegaSpec;
    switch (view.specType) {
      case "simple":
        vegaSpec = convertSimpleToVegaLite(view, resource);
        break;
      // ... other conversions
    }

    vegaSpec.data = { url: resource.path };

    return (
      <VegaLite
        fullWidth={fullWidth}
        spec={vegaSpec}
        actions={{ editor: false }}
        downloadFileName={resource.name}
      />
    );
  };
}
@@ -1,49 +0,0 @@
import { VegaLite } from "react-vega";

export default function LineChart({
  data = [],
  fullWidth = false,
  title = "",
}) {
  var tmp = data;
  if (Array.isArray(data)) {
    tmp = data.map((r, i) => {
      return { x: r[0], y: r[1] };
    });
  }
  const vegaData = { table: tmp };
  const spec = {
    $schema: "https://vega.github.io/schema/vega-lite/v5.json",
    title,
    width: "container" as "container",
    height: 300,
    mark: {
      type: "line" as "line",
      color: "black",
      strokeWidth: 1,
      tooltip: true,
    },
    data: {
      name: "table",
    },
    selection: {
      grid: {
        type: "interval" as "interval",
        bind: "scales",
      },
    },
    encoding: {
      x: {
        field: "x",
        timeUnit: "year",
        type: "temporal" as "temporal",
      },
      y: {
        field: "y",
        type: "quantitative" as "quantitative",
      },
    },
  };

  return <VegaLite data={vegaData} spec={spec} />;
}
@@ -1,188 +0,0 @@
import {
  createColumnHelper,
  FilterFn,
  flexRender,
  getCoreRowModel,
  getFilteredRowModel,
  getPaginationRowModel,
  getSortedRowModel,
  useReactTable,
} from "@tanstack/react-table";

import {
  ArrowDownIcon,
  ArrowUpIcon,
  ChevronDoubleLeftIcon,
  ChevronDoubleRightIcon,
  ChevronLeftIcon,
  ChevronRightIcon,
} from "@heroicons/react/24/solid";

import React, { useEffect, useMemo, useState } from "react";

import loadUrlProxied from "../../lib/loadUrlProxied";
import parseCsv from "../../lib/parseCsv";
import DebouncedInput from "./DebouncedInput";

const Table = ({
  data: ogData = [],
  cols: ogCols = [],
  csv = "",
  url = "",
}) => {
  if (csv) {
    const out = parseCsv(csv);
    ogData = out.rows;
    ogCols = out.fields;
  }

  const [data, setData] = React.useState(ogData);
  const [cols, setCols] = React.useState(ogCols);
  const [error, setError] = React.useState(""); // TODO: add error handling

  const tableCols = useMemo(() => {
    const columnHelper = createColumnHelper();
    return cols.map((c) =>
      columnHelper.accessor(c.key, {
        header: () => c.name,
        cell: (info) => info.getValue(),
      })
    );
  }, [data, cols]);

  const [globalFilter, setGlobalFilter] = useState("");

  const table = useReactTable({
    data,
    columns: tableCols,
    getCoreRowModel: getCoreRowModel(),
    state: {
      globalFilter,
    },
    globalFilterFn: globalFilterFn,
    onGlobalFilterChange: setGlobalFilter,
    getFilteredRowModel: getFilteredRowModel(),
    getPaginationRowModel: getPaginationRowModel(),
    getSortedRowModel: getSortedRowModel(),
  });

  useEffect(() => {
    if (url) {
      loadUrlProxied(url).then((data) => {
        const { rows, fields } = parseCsv(data);
        setData(rows);
        setCols(fields);
      });
    }
  }, [url]);

  return (
    <div>
      <DebouncedInput
        value={globalFilter ?? ""}
        onChange={(value) => setGlobalFilter(String(value))}
        className="p-2 text-sm shadow border border-block"
        placeholder="Search all columns..."
      />
      <table>
        <thead>
          {table.getHeaderGroups().map((hg) => (
            <tr key={hg.id}>
              {hg.headers.map((h) => (
                <th key={h.id}>
                  <div
                    {...{
                      className: h.column.getCanSort()
                        ? "cursor-pointer select-none"
                        : "",
                      onClick: h.column.getToggleSortingHandler(),
                    }}
                  >
                    {flexRender(h.column.columnDef.header, h.getContext())}
                    {{
                      asc: (
                        <ArrowUpIcon className="inline-block ml-2 h-4 w-4" />
                      ),
                      desc: (
                        <ArrowDownIcon className="inline-block ml-2 h-4 w-4" />
                      ),
                    }[h.column.getIsSorted() as string] ?? (
                      <div className="inline-block ml-2 h-4 w-4" />
                    )}
                  </div>
                </th>
              ))}
            </tr>
          ))}
        </thead>
        <tbody>
          {table.getRowModel().rows.map((r) => (
            <tr key={r.id}>
              {r.getVisibleCells().map((c) => (
                <td key={c.id}>
                  {flexRender(c.column.columnDef.cell, c.getContext())}
                </td>
              ))}
            </tr>
          ))}
        </tbody>
      </table>
      <div className="flex gap-2 items-center justify-center">
        <button
          className={`w-6 h-6 ${
            !table.getCanPreviousPage() ? "opacity-25" : "opacity-100"
          }`}
          onClick={() => table.setPageIndex(0)}
          disabled={!table.getCanPreviousPage()}
        >
          <ChevronDoubleLeftIcon />
        </button>
        <button
          className={`w-6 h-6 ${
            !table.getCanPreviousPage() ? "opacity-25" : "opacity-100"
          }`}
          onClick={() => table.previousPage()}
          disabled={!table.getCanPreviousPage()}
        >
          <ChevronLeftIcon />
        </button>
        <span className="flex items-center gap-1">
          <div>Page</div>
          <strong>
            {table.getState().pagination.pageIndex + 1} of{" "}
            {table.getPageCount()}
          </strong>
        </span>
        <button
          className={`w-6 h-6 ${
            !table.getCanNextPage() ? "opacity-25" : "opacity-100"
          }`}
          onClick={() => table.nextPage()}
          disabled={!table.getCanNextPage()}
        >
          <ChevronRightIcon />
        </button>
        <button
          className={`w-6 h-6 ${
            !table.getCanNextPage() ? "opacity-25" : "opacity-100"
          }`}
          onClick={() => table.setPageIndex(table.getPageCount() - 1)}
          disabled={!table.getCanNextPage()}
        >
          <ChevronDoubleRightIcon />
        </button>
      </div>
    </div>
  );
};

const globalFilterFn: FilterFn<any> = (row, columnId, filterValue: string) => {
  const search = filterValue.toLowerCase();

  let value = row.getValue(columnId) as string;
  if (typeof value === "number") value = String(value);

  return value?.toLowerCase().includes(search);
};

export default Table;
@@ -1,4 +0,0 @@
import { Vega as VegaOg } from "react-vega";
export default function Vega(props) {
  return <VegaOg className="w-full" {...props} />;
}
@@ -1,4 +0,0 @@
import { VegaLite as VegaOg } from "react-vega";
export default function VegaLite(props) {
  return <VegaOg className="w-full" {...props} />;
}
@@ -1,7 +1,44 @@
[
  { "owner": "datasets", "repo": "oil-prices"},
  { "owner": "datasets", "repo": "investor-flow-of-funds-us"},
  { "owner": "datasets", "repo": "browser-stats"},
  { "owner": "datasets", "repo": "glacier-mass-balance"},
  { "owner": "datasets", "repo": "bond-yields-us-10y"}
  {
    "owner": "datasets",
    "branch": "main",
    "repo": "oil-prices",
    "files": [
      "data/brent-daily.csv",
      "data/brent-monthly.csv",
      "data/brent-weekly.csv",
      "data/brent-year.csv",
      "data/wti-daily.csv",
      "data/wti-monthly.csv",
      "data/wti-weekly.csv",
      "data/wti-year.csv"
    ],
    "readme": "README.md"
  },
  {
    "owner": "datasets",
    "branch": "main",
    "repo": "investor-flow-of-funds-us",
    "files": [
      "data/monthly.csv",
      "data/weekly.csv"
    ],
    "readme": "README.md"
  },
  {
    "owner": "fivethirtyeight",
    "repo": "data",
    "branch": "master",
    "description": "Data about bad drivers",
    "name": "Bad Drivers",
    "files": ["bad-drivers/bad-drivers.csv"],
    "readme": "bad-drivers/README.md"
  },
  {
    "owner": "fivethirtyeight",
    "repo": "data",
    "branch": "master",
    "files": ["nba-raptor/historical_RAPTOR_by_player.csv", "nba-raptor/historical_RAPTOR_by_team.csv"],
    "readme": "nba-raptor/README.md"
  }
]
@@ -1,11 +0,0 @@
/* eslint-disable */
export default {
  displayName: 'simple-example',
  preset: '../../jest.preset.js',
  transform: {
    '^(?!.*\\.(js|jsx|ts|tsx|css|json)$)': '@nrwl/react/plugins/jest',
    '^.+\\.[tj]sx?$': ['babel-jest', { presets: ['@nrwl/next/babel'] }],
  },
  moduleFileExtensions: ['ts', 'tsx', 'js', 'jsx'],
  coverageDirectory: '../../coverage/examples/simple-example',
};
@@ -1,11 +0,0 @@
import axios from "axios";

export default function loadUrlProxied(url: string) {
  // HACK: duplicate of Excel code - maybe refactor
  // if url is external may have CORS issue so we proxy it ...
  if (url.startsWith("http")) {
    const PROXY_URL = "/api/proxy";
    url = PROXY_URL + "?url=" + encodeURIComponent(url);
  }
  return axios.get(url).then((res) => res.data);
}
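The proxying in `loadUrlProxied` above is just URL rewriting before the `axios.get` call. As a standalone sketch of that step (the `toProxiedUrl` name is ours, not from the repo; the `/api/proxy` path is the one used above):

```javascript
// Sketch of the URL-rewriting step from loadUrlProxied: external
// (http/https) URLs are routed through the /api/proxy endpoint to
// sidestep CORS; relative URLs pass through untouched.
function toProxiedUrl(url) {
  if (url.startsWith("http")) {
    return "/api/proxy?url=" + encodeURIComponent(url);
  }
  return url;
}

console.log(toProxiedUrl("https://example.com/data.csv"));
// -> /api/proxy?url=https%3A%2F%2Fexample.com%2Fdata.csv
console.log(toProxiedUrl("/local/data.csv"));
// -> /local/data.csv
```

Encoding the target with `encodeURIComponent` matters: without it, query parameters in the proxied URL would be swallowed by the proxy request itself.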
@@ -1,105 +0,0 @@
import matter from "gray-matter";
import mdxmermaid from "mdx-mermaid";
import { h } from "hastscript";
import remarkCallouts from "@flowershow/remark-callouts";
import remarkEmbed from "@flowershow/remark-embed";
import remarkGfm from "remark-gfm";
import remarkMath from "remark-math";
import remarkSmartypants from "remark-smartypants";
import remarkToc from "remark-toc";
import remarkWikiLink from "@flowershow/remark-wiki-link";
import rehypeAutolinkHeadings from "rehype-autolink-headings";
import rehypeKatex from "rehype-katex";
import rehypeSlug from "rehype-slug";
import rehypePrismPlus from "rehype-prism-plus";

import { serialize } from "next-mdx-remote/serialize";

/**
 * Parse a markdown or MDX file to an MDX source form + front matter data
 *
 * @source: the contents of a markdown or mdx file
 * @format: used to indicate to next-mdx-remote which format to use (md or mdx)
 * @returns: { mdxSource: mdxSource, frontMatter: ...}
 */
const parse = async function (source, format) {
  const { content, data, excerpt } = matter(source, {
    excerpt: (file, options) => {
      // Generate an excerpt for the file
      file.excerpt = file.content.split("\n\n")[0];
    },
  });

  const mdxSource = await serialize(
    { value: content, path: format },
    {
      // Optionally pass remark/rehype plugins
      mdxOptions: {
        remarkPlugins: [
          remarkEmbed,
          remarkGfm,
          [remarkSmartypants, { quotes: false, dashes: "oldschool" }],
          remarkMath,
          remarkCallouts,
          remarkWikiLink,
          [
            remarkToc,
            {
              heading: "Table of contents",
              tight: true,
            },
          ],
          [mdxmermaid, {}],
        ],
        rehypePlugins: [
          rehypeSlug,
          [
            rehypeAutolinkHeadings,
            {
              properties: { className: 'heading-link' },
              test(element) {
                return (
                  ["h2", "h3", "h4", "h5", "h6"].includes(element.tagName) &&
                  element.properties?.id !== "table-of-contents" &&
                  element.properties?.className !== "blockquote-heading"
                );
              },
              content() {
                return [
                  h(
                    "svg",
                    {
                      xmlns: "http://www.w3.org/2000/svg",
                      fill: "#ab2b65",
                      viewBox: "0 0 20 20",
                      className: "w-5 h-5",
                    },
                    [
                      h("path", {
                        fillRule: "evenodd",
                        clipRule: "evenodd",
                        d: "M9.493 2.853a.75.75 0 00-1.486-.205L7.545 6H4.198a.75.75 0 000 1.5h3.14l-.69 5H3.302a.75.75 0 000 1.5h3.14l-.435 3.148a.75.75 0 001.486.205L7.955 14h2.986l-.434 3.148a.75.75 0 001.486.205L12.456 14h3.346a.75.75 0 000-1.5h-3.14l.69-5h3.346a.75.75 0 000-1.5h-3.14l.435-3.147a.75.75 0 00-1.486-.205L12.045 6H9.059l.434-3.147zM8.852 7.5l-.69 5h2.986l.69-5H8.852z",
                      }),
                    ]
                  ),
                ];
              },
            },
          ],
          [rehypeKatex, { output: "mathml" }],
          [rehypePrismPlus, { ignoreMissing: true }],
        ],
        format,
      },
      scope: data,
    }
  );

  return {
    mdxSource: mdxSource,
    frontMatter: data,
    excerpt,
  };
};

export default parse;
147
examples/simple-example/lib/octokit.ts
Normal file
@@ -0,0 +1,147 @@
import { Octokit } from 'octokit';

export interface GithubProject {
  owner: string;
  repo: string;
  branch: string;
  files: string[];
  readme: string;
  description?: string;
  name?: string;
}

export async function getProjectReadme(
  owner: string,
  repo: string,
  branch: string,
  readme: string,
  github_pat?: string
) {
  const octokit = new Octokit({ auth: github_pat });
  try {
    const response = await octokit.rest.repos.getContent({
      owner,
      repo,
      path: readme,
      ref: branch,
    });
    const data = response.data as { content?: string };
    const fileContent = data.content ? data.content : '';
    if (fileContent === '') {
      return null;
    }
    const decodedContent = Buffer.from(fileContent, 'base64').toString();
    return decodedContent;
  } catch (error) {
    console.log(error);
    return null;
  }
}

export async function getLastUpdated(
  owner: string,
  repo: string,
  branch: string,
  readme: string,
  github_pat?: string
) {
  const octokit = new Octokit({ auth: github_pat });
  try {
    const response = await octokit.rest.repos.listCommits({
      owner,
      repo,
      path: readme,
      ref: branch,
    });
    return response.data[0].commit.committer.date;
  } catch (error) {
    console.log(error);
    return null;
  }
}

export async function getProjectMetadata(
  owner: string,
  repo: string,
  github_pat?: string
) {
  const octokit = new Octokit({ auth: github_pat });
  try {
    const response = await octokit.rest.repos.get({
      owner,
      repo,
    });
    return response.data;
  } catch (error) {
    console.log(error);
    return null;
  }
}

export async function getRepoContents(
  owner: string,
  repo: string,
  branch: string,
  files: string[],
  github_pat?: string
) {
  const octokit = new Octokit({ auth: github_pat });
  try {
    const contents = [];
    for (const path of files) {
      const response = await octokit.rest.repos.getContent({
        owner,
        repo,
        ref: branch,
        path: path,
      });
      const data = response.data as { download_url?: string, name: string, size: number };
      contents.push({ download_url: data.download_url, name: data.name, size: data.size});
    }
    return contents;
  } catch (error) {
    console.log(error);
    return null;
  }
}

export async function getProject(project: GithubProject, github_pat?: string) {
  const projectMetadata = await getProjectMetadata(
    project.owner,
    project.repo,
    github_pat
  );
  if (!projectMetadata) {
    return null;
  }
  const projectReadme = await getProjectReadme(
    project.owner,
    project.repo,
    project.branch,
    project.readme,
    github_pat
  );
  if (!projectReadme) {
    return null;
  }
  const projectData = await getRepoContents(
    project.owner,
    project.repo,
    project.branch,
    project.files,
    github_pat
  );
  if (!projectData) {
    return null;
  }
  const projectBase = project.readme.split('/').length > 1
    ? project.readme.split('/').slice(0, -1).join('/')
    : '/'
  const last_updated = await getLastUpdated(
    project.owner,
    project.repo,
    project.branch,
    projectBase,
    github_pat
  );
  return { ...projectMetadata, files: projectData, readmeContent: projectReadme, last_updated, base_path: projectBase };
}
@@ -1,16 +0,0 @@
import papa from "papaparse";

const parseCsv = (csv) => {
  csv = csv.trim();
  const rawdata = papa.parse(csv, { header: true });
  const cols = rawdata.meta.fields.map((r, i) => {
    return { key: r, name: r };
  });

  return {
    rows: rawdata.data,
    fields: cols,
  };
};

export default parseCsv;
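The `{ rows, fields }` shape that `parseCsv` returns can be reproduced without papaparse for simple, quote-free CSV. A minimal sketch (the `parseSimpleCsv` name is ours; this is not a replacement for papaparse, which also handles quoting and escaping):

```javascript
// Minimal header-row CSV parse producing the same { rows, fields }
// shape as parseCsv above. Handles only simple, quote-free CSV.
function parseSimpleCsv(csv) {
  const lines = csv.trim().split("\n");
  const header = lines[0].split(",");
  const rows = lines.slice(1).map((line) => {
    const cells = line.split(",");
    // One object per row, keyed by the header names
    return Object.fromEntries(header.map((h, i) => [h, cells[i]]));
  });
  const fields = header.map((h) => ({ key: h, name: h }));
  return { rows, fields };
}

const out = parseSimpleCsv("a,b\n1,2\n3,4");
console.log(out.fields); // -> [ { key: 'a', name: 'a' }, { key: 'b', name: 'b' } ]
console.log(out.rows[0]); // -> { a: '1', b: '2' }
```

The `fields` entries mirror what the `Table` component expects from `parseCsv`: a `key` for the column accessor and a `name` for the header label.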
@@ -1,60 +0,0 @@
import * as crypto from "crypto";
import axios from "axios";
import { Octokit } from "octokit"

export default class Project {
  id: string;
  name: string;
  owner: string;
  github_repo: string;
  readme: string;
  metadata: any;
  repo_metadata: any;

  constructor(owner: string, name: string) {
    this.name = name;
    this.owner = owner;
    this.github_repo = `https://github.com/${owner}/${name}`;

    // TODO: using the GitHub repo to set the id is not a good idea
    // since repos can be renamed and then we are going to end up with
    // a duplicate
    const encodedGHRepo = Buffer.from(this.github_repo, "utf-8").toString();
    this.id = crypto.createHash("sha1").update(encodedGHRepo).digest("hex");
  }

  initFromGitHub = async () => {
    const octokit = new Octokit()
    // TODO: what if the repo doesn't exist?
    await this.getFileContent("README.md")
      .then((content) => (this.readme = content))
      .catch((e) => (this.readme = null));

    await this.getFileContent("datapackage.json")
      .then((content) => (this.metadata = content))
      .catch((e) => (this.metadata = {}));

    const github_metadata = await octokit.rest.repos.get({ owner: this.owner, repo: this.name })
    this.repo_metadata = github_metadata.data ? github_metadata.data : null
  };

  getFileContent = (path, branch = "main") => {
    return axios
      .get(
        `https://raw.githubusercontent.com/${this.owner}/${this.name}/${branch}/${path}`
      )
      .then((res) => res.data);
  };

  serialize() {
    return JSON.parse(JSON.stringify(this));
  }

  static async getFromGitHub(owner: string, name: string) {
    const project = new Project(owner, name);
    await project.initFromGitHub();

    return project;
  }
}
@@ -1,47 +0,0 @@
export function convertSimpleToVegaLite(view, resource) {
  const x = resource.schema.fields.find((f) => f.name === view.spec.group);
  const y = resource.schema.fields.find((f) => f.name === view.spec.series[0]);

  const xType = inferVegaType(x.type);
  const yType = inferVegaType(y.type);

  let vegaLiteSpec = {
    $schema: "https://vega.github.io/schema/vega-lite/v5.json",
    mark: {
      type: view.spec.type,
      color: "black",
      strokeWidth: 1,
      tooltip: true,
    },
    title: view.title,
    width: 500,
    height: 300,
    selection: {
      grid: {
        type: "interval",
        bind: "scales",
      },
    },
    encoding: {
      x: {
        field: x.name,
        type: xType,
      },
      y: {
        field: y.name,
        type: yType,
      },
    },
  };

  return vegaLiteSpec;
}

const inferVegaType = (fieldType) => {
  switch (fieldType) {
    case "date":
      // Vega-Lite expects lowercase encoding type names
      return "temporal";
    case "number":
      return "quantitative";
  }
};
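The field-type mapping in `inferVegaType` is small enough to check in isolation. A standalone mirror of the same date/number mapping (note that Vega-Lite encoding types must be lowercase, e.g. `"temporal"`, not `"Temporal"`):

```javascript
// Standalone mirror of the inferVegaType mapping used above:
// Frictionless field types -> Vega-Lite encoding types.
function inferVegaType(fieldType) {
  switch (fieldType) {
    case "date":
      return "temporal";
    case "number":
      return "quantitative";
    default:
      // Other field types (string, boolean, ...) are not handled
      // by the conversion above and fall through to undefined.
      return undefined;
  }
}

console.log(inferVegaType("date")); // -> temporal
console.log(inferVegaType("number")); // -> quantitative
```

An unmapped type leaves the encoding's `type` undefined, which Vega-Lite rejects at validation time, so callers should guard against it.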
@@ -1,23 +1,7 @@
// eslint-disable-next-line @typescript-eslint/no-var-requires
const { withNx } = require('@nrwl/next/plugins/with-nx');

/**
 * @type {import('@nrwl/next/plugins/with-nx').WithNxOptions}
 **/
const nextConfig = {
  async rewrites() {
    return {
      beforeFiles: [
        {
          source: "/@org/:org/:project/:file(\.\+\\\.\.\+\$)",
          destination:
            '/api/proxy?url=https://raw.githubusercontent.com/:org/:project/main/:file',
        },
        {
          source: "/@:org/:project/:file(\.\+\\\.\.\+\$)",
          destination:
            '/api/proxy?url=https://raw.githubusercontent.com/:org/:project/main/:file',
        },
        {
          source: '/@:org/:project*',
          destination: '/@org/:org/:project*',
@@ -25,11 +9,9 @@ const nextConfig = {
      ],
    };
  },
  nx: {
    // Set this to true if you would like to use SVGR
    // See: https://github.com/gregberge/svgr
    svgr: true,
  serverRuntimeConfig: {
    github_pat: process.env.GITHUB_PAT ? process.env.GITHUB_PAT : null,
  },
};

module.exports = withNx(nextConfig);
module.exports = nextConfig;
5833
examples/simple-example/package-lock.json
generated
Normal file
File diff suppressed because it is too large
@@ -1,7 +1,32 @@
{
  "name": "simple-example",
  "version": "1.0.0",
  "description": "",
  "author": "",
  "license": "ISC"
  "name": "my-app",
  "version": "0.1.0",
  "private": true,
  "scripts": {
    "dev": "next dev",
    "build": "next build",
    "start": "next start",
    "lint": "next lint"
  },
  "dependencies": {
    "@types/node": "18.16.0",
    "@types/react": "18.0.38",
    "@types/react-dom": "18.0.11",
    "eslint": "8.39.0",
    "eslint-config-next": "13.3.1",
    "next": "13.3.1",
    "next-seo": "^6.0.0",
    "octokit": "^2.0.14",
    "react": "18.2.0",
    "react-dom": "18.2.0",
    "react-markdown": "^8.0.7",
    "remark-gfm": "^3.0.1",
    "typescript": "5.0.4"
  },
  "devDependencies": {
    "@tailwindcss/typography": "^0.5.9",
    "autoprefixer": "^10.4.14",
    "postcss": "^8.4.23",
    "tailwindcss": "^3.3.1"
  }
}
141
examples/simple-example/pages/@org/[org]/[...path].tsx
Normal file
@@ -0,0 +1,141 @@
import Head from 'next/head';
import { useRouter } from 'next/router';

import { NextSeo } from 'next-seo';
import { promises as fs } from 'fs';
import path from 'path';
import getConfig from 'next/config';
import { getProject, GithubProject } from '../../../lib/octokit';
import ReactMarkdown from 'react-markdown';
import remarkGfm from 'remark-gfm';
import Link from 'next/link';
import Footer from '../../../components/Footer';
import NavBar from '../../../components/NavBar';

export default function ProjectPage({ project }) {
  return (
    <>
      <NextSeo
        title={`PortalJS - @${project.repo_config.owner}/${
          project.repo_config.repo
        }${project.base_path !== '/' ? '/' + project.base_path : ''}`}
      />
      <NavBar />
      <main className="mx-auto my-8 max-w-7xl sm:px-6 lg:px-8 px-4">
        <div className="prose">
          <h1 className="mb-0">Data</h1>
        </div>
        <div className="inline-block min-w-full py-4 align-middle">
          <div className="overflow-hidden shadow ring-1 ring-black ring-opacity-5 sm:rounded-lg">
            <table className="min-w-full divide-y divide-gray-300">
              <thead className="bg-gray-50">
                <tr>
                  <th
                    scope="col"
                    className="py-3.5 pl-4 pr-3 text-left text-sm font-semibold text-gray-900 sm:pl-6"
                  >
                    Name
                  </th>
                  <th
                    scope="col"
                    className="px-3 py-3.5 text-left text-sm font-semibold text-gray-900"
                  >
                    Size
                  </th>
                  <th
                    scope="col"
                    className="px-3 py-3.5 text-left text-sm font-semibold text-gray-900"
                  >
                    Download
                  </th>
                </tr>
              </thead>
              <tbody className="divide-y divide-gray-200 bg-white">
                {project.files.map((file) => (
                  <tr key={file.download_url}>
                    <td className="py-4 pl-4 pr-3 text-sm font-medium text-gray-900 sm:pl-6">
                      {file.name}
                    </td>
                    <td className="whitespace-nowrap px-3 py-4 text-sm text-gray-500">
                      {file.size} Bytes
                    </td>
                    <td className="px-3 py-4 text-sm text-gray-500">
                      <a
                        className="rounded-md bg-indigo-600 no-underline px-2.5 py-1.5 text-sm font-semibold text-white shadow-sm hover:bg-indigo-500 focus-visible:outline focus-visible:outline-2 focus-visible:outline-offset-2 focus-visible:outline-indigo-600"
                        href={file.download_url}
                      >
                        Download file
                      </a>
                    </td>
                  </tr>
                ))}
              </tbody>
            </table>
          </div>
        </div>

        <div className="prose py-4 max-w-7xl">
          <h1>Readme</h1>
          <ReactMarkdown remarkPlugins={[remarkGfm]}>
            {project.readmeContent}
          </ReactMarkdown>
        </div>
      </main>
      <Footer />
    </>
  );
}

// Generates `/posts/1` and `/posts/2`
export async function getStaticPaths() {
  const jsonDirectory = path.join(process.cwd(), 'datasets.json');
  const repos = await fs.readFile(jsonDirectory, 'utf8');

  return {
    paths: JSON.parse(repos).map((repo) => {
      const projectPath =
        repo.readme.split('/').length > 1
          ? repo.readme.split('/').slice(0, -1)
          : null;
      let path = [repo.repo];
      if (projectPath) {
        projectPath.forEach((element) => {
          path.push(element);
        });
      }
      return {
        params: { org: repo.owner, path },
      };
    }),
    fallback: false, // can also be true or 'blocking'
  };
}

export async function getStaticProps({ params }) {
  const jsonDirectory = path.join(process.cwd(), 'datasets.json');
  const reposFile = await fs.readFile(jsonDirectory, 'utf8');
  const repos: GithubProject[] = JSON.parse(reposFile);
  const repo = repos.find((_repo) => {
    const projectPath =
      _repo.readme.split('/').length > 1
        ? _repo.readme.split('/').slice(0, -1)
        : null;
    let path = [_repo.repo];
    if (projectPath) {
      projectPath.forEach((element) => {
        path.push(element);
      });
    }
    return (
      _repo.owner == params.org &&
      JSON.stringify(path) === JSON.stringify(params.path)
    );
  });
  const github_pat = getConfig().serverRuntimeConfig.github_pat;
  const project = await getProject(repo, github_pat);
  return {
    props: {
      project: { ...project, repo_config: repo },
    },
  };
}
@@ -1,110 +0,0 @@
import Head from 'next/head';
import { useRouter } from 'next/router';

import DRD from '../../../../components/drd/DRD';
import parse from '../../../../lib/markdown';
import Project from '../../../../lib/project';
import { NextSeo } from 'next-seo';
import MDLayout from 'examples/simple-example/components/MDLayout';
import { promises as fs } from 'fs';
import path from 'path';

function CollectionsLayout({ children, ...frontMatter }) {
  const { title, date, description } = frontMatter;

  return (
    <article className="docs prose text-primary dark:text-primary-dark dark:prose-invert prose-headings:font-headings prose-a:break-words mx-auto p-6">
      <header>
        <div className="mb-6">
          {date && (
            <p className="text-sm text-zinc-400 dark:text-zinc-500">
              <time dateTime={date}>{date}</time>
            </p>
          )}
          {title && <h1 className="mb-2">{title}</h1>}
          {description && <p className="text-xl mt-0">{description}</p>}
        </div>
      </header>
      <section>{children}</section>
    </article>
  );
}

export default function ProjectPage({
  mdxSource,
  frontMatter,
  excerpt,
  project,
}) {
  const router = useRouter();

  return (
    <>
      <NextSeo title={`PortalJS - @${project.owner}/${project.name}`} />
      <Head>
        {/*
          On index files, add trailing slash to the base path
          see notes: https://github.com/datopian/datahub-next/issues/69
        */}
        <base href={router.asPath.split('#')[0] + '/'} />
      </Head>
      <main>
        <MDLayout
          layout={frontMatter.layout}
          excerpt={excerpt}
          project={project}
          {...frontMatter}
        >
          <DRD
            source={mdxSource}
            frictionless={{
              views: project.metadata?.views,
              resources: project.metadata?.resources,
            }}
          />
        </MDLayout>
      </main>
    </>
  );
}

// Generates `/posts/1` and `/posts/2`
export async function getStaticPaths() {
  const jsonDirectory = path.join(process.cwd(), '/examples/simple-example/datasets.json');
  const repos = await fs.readFile(jsonDirectory, 'utf8');

  return {
    paths: JSON.parse(repos).map(repo => ({ params: { org: repo.owner, project: repo.repo } })),
    fallback: false, // can also be true or 'blocking'
  }
}

export async function getStaticProps({ params }) {
  const { org: orgName, project: projectName } = params;

  const project = await Project.getFromGitHub(orgName, projectName);

  // Defaults to README
  let content = project.readme;

  if (content === null) {
    return {
      notFound: true,
    };
  }

  let { mdxSource, frontMatter, excerpt } = await parse(content, '.mdx');

  if (project.metadata?.resources) {
    frontMatter.layout = 'datapackage';
  }

  return {
    props: {
      mdxSource,
      frontMatter,
      excerpt,
      project: project.serialize(),
    },
  };
}
@@ -1,7 +1,6 @@
import { AppProps } from 'next/app';
import Head from 'next/head';
import './styles.css';
import "../styles/global.css";

function CustomApp({ Component, pageProps }: AppProps) {
  return (
@@ -1,26 +0,0 @@
import axios from "axios";

export default function handler(req, res) {
  if (!req.query.url) {
    res.status(200).send({
      error: true,
      info: "No url to proxy in query string i.e. ?url=...",
    });
    return;
  }
  axios({
    method: "get",
    url: req.query.url,
    responseType: "stream",
  })
    .then((resp) => {
      resp.data.pipe(res);
    })
    .catch((err) => {
      res.status(400).send({
        error: true,
        info: err.message,
        detailed: err,
      });
    });
}
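The handler deleted above is a small streaming proxy: it pipes the upstream response body straight through to the API response instead of buffering it. Callers pass the remote resource URL-encoded in the `url` query parameter; a sketch of that client-side convention (`/api/proxy` is an assumed mount point for the handler, not stated in the diff):

```typescript
// Builds the query-string URL the proxy handler expects (?url=...).
// '/api/proxy' is an assumption about where the handler is mounted.
function buildProxyUrl(target: string, base = '/api/proxy'): string {
  return `${base}?url=${encodeURIComponent(target)}`;
}

// Usage: fetch(buildProxyUrl('https://example.com/data.csv'))
```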
@@ -1,35 +1,19 @@
import parse from '../lib/markdown';
import Project from '../lib/project';
import { promises as fs } from 'fs';
import path from 'path';
import Link from 'next/link';
import { getProject } from '../lib/octokit';
import getConfig from 'next/config';
import NavBar from '../components/NavBar';
import Footer from '../components/Footer';

export async function getStaticProps() {
const jsonDirectory = path.join(
process.cwd(),
'/examples/simple-example/datasets.json'
);
const jsonDirectory = path.join(process.cwd(), '/datasets.json');
const repos = await fs.readFile(jsonDirectory, 'utf8');
const github_pat = getConfig().serverRuntimeConfig.github_pat;

const projects = await Promise.all(
JSON.parse(repos).map(async (repo) => {
const project = await Project.getFromGitHub(repo.owner, repo.repo);

// Defaults to README
const content = project.readme ? project.readme : '';

let { mdxSource, frontMatter, excerpt } = await parse(content, '.mdx');

if (project.metadata?.resources) {
frontMatter.layout = 'datapackage';
}

return {
mdxSource,
frontMatter,
excerpt,
project: project.serialize(),
};
const project = await getProject(repo, github_pat);
return { ...project, repo_config: repo };
})
);
return {
@@ -44,109 +28,118 @@ const formatter = new Intl.DateTimeFormat('en-US', {
month: 'long',
day: 'numeric',
hour: 'numeric',
minute: 'numeric',
second: 'numeric',
timeZone: 'UTC',
});

export function Datasets({ projects }) {
export default function Datasets({ projects }) {
return (
<div className="bg-white">
<div className="mx-auto max-w-7xl px-6 py-16 sm:py-24 lg:px-8">
<h2 className="text-2xl font-bold leading-10 tracking-tight text-indigo-500">
My Datasets
</h2>
<p className="mt-6 max-w-2xl text-base leading-7 text-gray-600">
Here is a list of all my datasets for easy access and sharing
</p>
<div className="mt-20">
{/*
<dl className="space-y-16 sm:grid sm:grid-cols-2 sm:gap-x-6 sm:gap-y-16 sm:space-y-0 lg:grid-cols-3 lg:gap-x-10">
{projects.map((project) => (
<div>
<dt className="text-base font-semibold leading-7 text-gray-900">
<Link
href={`@${project.project.owner}/${project.project.name}`}
>
{project.project.owner}/{project.project.name}
</Link>
</dt>
<dt className="text-base font-semibold leading-7 text-indigo-600">
<a
href={`https://github.com/${project.project.owner}/${project.project.name}`}
>
Github repo
</a>
</dt>
<dd className="mt-2 text-base leading-7 text-gray-600">
{project.excerpt !== '' ? project.excerpt : 'No description'}
</dd>
</div>
))}
</dl> */}
<>
<NavBar />
<div className="px-4 sm:px-6 lg:px-8 py-16 max-w-7xl mx-auto">
<div className="sm:flex sm:items-center">
<div className="sm:flex-auto">
<h1 className="text-xl font-semibold leading-6 text-indigo-600">
Datasets
</h1>
<p className="mt-2 text-sm text-gray-700 py-8">
Here is a list of all my datasets for easy access and sharing all
stored on multiple github accounts and repos and joined together
here
</p>
</div>
</div>
<div className="mt-8 flow-root">
<div className="-mx-4 -my-2 overflow-x-auto sm:-mx-6 lg:-mx-8">
<div className="inline-block min-w-full py-2 align-middle sm:px-6 lg:px-8">
<table className="min-w-full divide-y divide-gray-300">
<thead>
<tr>
<th
scope="col"
className="px-3 py-3.5 text-left text-sm font-semibold text-gray-900"
>
Dataset name
</th>
<th
scope="col"
className="px-3 py-3.5 text-left text-sm font-semibold text-gray-900"
>
Description
</th>
<th
scope="col"
className="px-3 py-3.5 text-left text-sm font-semibold text-gray-900"
>
Last updated
</th>
<th
scope="col"
className="relative py-3.5 pl-3 pr-4 sm:pr-0"
></th>
</tr>
</thead>
<tbody className="divide-y divide-gray-200">
{projects.map((project) => (
<div className="overflow-hidden shadow ring-1 ring-black ring-opacity-5 sm:rounded-lg">
<table className="min-w-full divide-y divide-gray-300">
<thead className="bg-gray-50">
<tr>
<td className="whitespace-nowrap px-3 py-4 text-sm text-gray-500">
<a href={project.project.repo_metadata.html_url}>
{project.project.owner}/{project.project.name}
</a>
</td>
<td className="px-3 py-4 text-sm text-gray-500">
{project.project.repo_metadata.description}
</td>
<td className="whitespace-nowrap px-3 py-4 text-sm text-gray-500">
{formatter.format(
new Date(project.project.repo_metadata.updated_at)
)}
</td>
<td className="relative whitespace-nowrap py-4 pl-3 pr-4 text-right text-sm font-medium sm:pr-0">
<a
href={`/@${project.project.owner}/${project.project.name}`}
className="text-indigo-600 hover:text-indigo-900"
>
More info
</a>
</td>
<th
scope="col"
className="py-3.5 pl-4 pr-3 text-left text-sm font-semibold text-gray-900 sm:pl-6"
>
Name
</th>
<th
scope="col"
className="px-3 py-3.5 text-left text-sm font-semibold text-gray-900"
>
Repo
</th>
<th
scope="col"
className="px-3 py-3.5 text-left text-sm font-semibold text-gray-900"
>
Description
</th>
<th
scope="col"
className="px-3 py-3.5 text-left text-sm font-semibold text-gray-900"
>
Last updated
</th>
<th
scope="col"
className="relative py-3.5 pl-3 pr-4 sm:pr-6"
>
<span className="sr-only">More info</span>
</th>
</tr>
))}
</tbody>
</table>
</thead>
<tbody className="divide-y divide-gray-200 bg-white">
{projects.map((project) => (
<tr key={project.id}>
<td className="py-4 pl-4 pr-3 text-sm font-medium text-gray-900 sm:pl-6">
{project.repo_config.name
? project.repo_config.name
: project.full_name +
(project.base_path === '/'
? ''
: '/' + project.base_path)}
</td>
<td className="whitespace-nowrap px-3 py-4 text-sm text-gray-500">
<a href={project.html_url}>{project.full_name}</a>
</td>
<td className="px-3 py-4 text-sm text-gray-500">
{project.repo_config.description
? project.repo_config.description
: project.description}
</td>
<td className="whitespace-nowrap px-3 py-4 text-sm text-gray-500">
{formatter.format(new Date(project.last_updated))}
</td>
<td className="relative whitespace-nowrap py-4 pl-3 pr-4 text-right text-sm font-medium sm:pr-6">
<a
href={`/@${project.repo_config.owner}/${
project.repo_config.repo
}/${
project.base_path === '/' ? '' : project.base_path
}`}
className="text-indigo-600 hover:text-indigo-900"
>
More info
<span className="sr-only">
on,
{project.repo_config.name
? project.repo_config.name
: project.full_name +
(project.base_path === '/'
? ''
: '/' + project.base_path)}
</span>
</a>
</td>
</tr>
))}
</tbody>
</table>
</div>
</div>
</div>
</div>
</div>
</div>
<Footer />
</>
);
}

export default Datasets;
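Both versions of the table above render timestamps through the `Intl.DateTimeFormat` instance defined at the top of this file (long month, numeric time, pinned to UTC). A quick sketch of that formatter in isolation:

```typescript
// Mirrors the formatter options used by the Datasets page in this diff.
const formatter = new Intl.DateTimeFormat('en-US', {
  month: 'long',
  day: 'numeric',
  hour: 'numeric',
  minute: 'numeric',
  second: 'numeric',
  timeZone: 'UTC',
});

// Pinning timeZone to 'UTC' keeps server- and client-rendered output
// identical, which avoids hydration mismatches in statically built pages.
const label = formatter.format(new Date('2023-04-01T12:30:00Z'));
```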
@@ -1,6 +1,11 @@
@tailwind base;
@tailwind components;
@tailwind utilities;

.prose {
--tw-prose-headings: #4F46E5 !important;
}

html {
-webkit-text-size-adjust: 100%;
font-family: ui-sans-serif, system-ui, -apple-system, BlinkMacSystemFont,
@@ -62,9 +67,6 @@ pre {
box-shadow: 0 0 #0000, 0 0 #0000, 0 10px 15px -3px rgba(0, 0, 0, 0.1),
0 4px 6px -2px rgba(0, 0, 0, 0.05);
}
.rounded {
border-radius: 1.5rem;
}
.wrapper {
width: 100%;
}
@@ -1,15 +1,6 @@
const { join } = require('path');

// Note: If you use library-specific PostCSS/Tailwind configuration then you should remove the `postcssConfig` build
// option from your application's configuration (i.e. project.json).
//
// See: https://nx.dev/guides/using-tailwind-css-in-react#step-4:-applying-configuration-to-libraries

module.exports = {
plugins: {
tailwindcss: {
config: join(__dirname, 'tailwind.config.js'),
},
tailwindcss: {},
autoprefixer: {},
},
};
}
@@ -1,69 +0,0 @@
{
  "name": "simple-example",
  "$schema": "../../node_modules/nx/schemas/project-schema.json",
  "sourceRoot": "examples/simple-example",
  "projectType": "application",
  "targets": {
    "build": {
      "executor": "@nrwl/next:build",
      "outputs": ["{options.outputPath}"],
      "defaultConfiguration": "production",
      "options": {
        "root": "examples/simple-example",
        "outputPath": "dist/examples/simple-example"
      },
      "configurations": {
        "development": {
          "outputPath": "examples/simple-example"
        },
        "production": {}
      }
    },
    "serve": {
      "executor": "@nrwl/next:server",
      "defaultConfiguration": "development",
      "options": {
        "buildTarget": "simple-example:build",
        "dev": true
      },
      "configurations": {
        "development": {
          "buildTarget": "simple-example:build:development",
          "dev": true
        },
        "production": {
          "buildTarget": "simple-example:build:production",
          "dev": false
        }
      }
    },
    "export": {
      "executor": "@nrwl/next:export",
      "options": {
        "buildTarget": "simple-example:build:production"
      }
    },
    "test": {
      "executor": "@nrwl/jest:jest",
      "outputs": ["{workspaceRoot}/coverage/{projectRoot}"],
      "options": {
        "jestConfig": "examples/simple-example/jest.config.ts",
        "passWithNoTests": true
      },
      "configurations": {
        "ci": {
          "ci": true,
          "codeCoverage": true
        }
      }
    },
    "lint": {
      "executor": "@nrwl/linter:eslint",
      "outputs": ["{options.outputFile}"],
      "options": {
        "lintFilePatterns": ["examples/simple-example/**/*.{ts,tsx,js,jsx}"]
      }
    }
  },
  "tags": []
}
BIN examples/simple-example/public/logo.png (new file, 33 KiB)
Binary file not shown.
@@ -1,11 +0,0 @@
import React from 'react';
import { render } from '@testing-library/react';

import Index from '../pages/index';

describe('Index', () => {
  it('should render successfully', () => {
    const { baseElement } = render(<Index />);
    expect(baseElement).toBeTruthy();
  });
});
@@ -1,67 +0,0 @@
@import "@flowershow/remark-callouts/styles.css";

/* mathjax */
.math-inline > mjx-container > svg {
  display: inline;
  align-items: center;
}

/* smooth scrolling in modern browsers */
html {
  scroll-behavior: smooth !important;
}

/* tooltip fade-out clip */
.tooltip-body::after {
  content: "";
  position: absolute;
  right: 0;
  top: 3.6rem; /* multiple of $line-height used on the tooltip body (defined in tooltipBodyStyle) */
  height: 1.2rem; /* ($top + $height)/$line-height is the number of lines we want to clip tooltip text at */
  width: 10rem;
  background: linear-gradient(
    to right,
    rgba(255, 255, 255, 0),
    rgba(255, 255, 255, 1) 100%
  );
}

:is(h2, h3, h4, h5, h6):not(.blogitem-title) {
  margin-left: -2rem !important;
  padding-left: 2rem !important;
  scroll-margin-top: 4.5rem;
  position: relative;
}

.heading-link {
  padding: 1px;
  position: absolute;
  left: 0;
  top: 50%;
  transform: translateY(-50%);
  margin: auto 0;
  border-radius: 5px;
  background: #1e293b;
  opacity: 0;
  transition: opacity 0.2s;
}

.light .heading-link {
  /* border: 1px solid #ab2b65; */
  /* background: none; */
  background: #e2e8f0;
}

:is(h2, h3, h4, h5, h6):not(.blogitem-title):hover .heading-link {
  opacity: 100;
}

.heading-link svg {
  transform: scale(0.75);
}

@media screen and (max-width: 640px) {
  .heading-link {
    visibility: hidden;
  }
}
@@ -1,21 +1,15 @@
const { createGlobPatternsForDependencies } = require('@nrwl/react/tailwind');
const { join } = require('path');

/** @type {import('tailwindcss').Config} */
module.exports = {
content: [
"node_modules/@flowershow/core/dist/*.js",
"node_modules/@flowershow/core/*.js",
join(
__dirname,
'{src,pages,components}/**/*!(*.stories|*.spec).{ts,tsx,html}'
),
...createGlobPatternsForDependencies(__dirname),
"./app/**/*.{js,ts,jsx,tsx,mdx}",
"./pages/**/*.{js,ts,jsx,tsx,mdx}",
"./components/**/*.{js,ts,jsx,tsx,mdx}",
],
theme: {
extend: {},
},
plugins: [
require('@tailwindcss/typography'),
require('@tailwindcss/typography')
],
};
}
@@ -1,50 +1,20 @@
{
"extends": "../../tsconfig.base.json",
"compilerOptions": {
"jsx": "preserve",
"target": "es5",
"lib": ["dom", "dom.iterable", "esnext"],
"allowJs": true,
"esModuleInterop": true,
"allowSyntheticDefaultImports": true,
"skipLibCheck": true,
"strict": false,
"forceConsistentCasingInFileNames": true,
"noEmit": true,
"esModuleInterop": true,
"module": "esnext",
"moduleResolution": "node",
"resolveJsonModule": true,
"isolatedModules": true,
"incremental": true,
"types": [
"jest",
"node"
]
"jsx": "preserve",
"incremental": true
},
"target": "es2020",
"lib": [
"dom",
"dom.iterable",
"esnext"
],
"allowJs": true,
"skipLibCheck": true,
"strict": false,
"forceConsistentCasingInFileNames": true,
"noEmit": true,
"incremental": true,
"esModuleInterop": true,
"module": "esnext",
"moduleResolution": "node",
"resolveJsonModule": true,
"isolatedModules": true,
"jsx": "preserve",
"include": [
"**/*.ts",
"**/*.tsx",
"**/*.js",
"**/*.jsx",
"next-env.d.ts"
],
"exclude": [
"node_modules",
"jest.config.ts",
"src/**/*.spec.ts",
"src/**/*.test.ts"
]
"include": ["next-env.d.ts", "**/*.ts", "**/*.tsx"],
"exclude": ["node_modules"]
}
@@ -1,24 +0,0 @@
{
  "extends": "./tsconfig.json",
  "compilerOptions": {
    "outDir": "../../dist/out-tsc",
    "module": "commonjs",
    "types": ["jest", "node"],
    "jsx": "react"
  },
  "paths": {
    "@/*": ["./*"]
  },
  "include": [
    "jest.config.ts",
    "src/**/*.test.ts",
    "src/**/*.spec.ts",
    "src/**/*.test.tsx",
    "src/**/*.spec.tsx",
    "src/**/*.test.js",
    "src/**/*.spec.js",
    "src/**/*.test.jsx",
    "src/**/*.spec.jsx",
    "src/**/*.d.ts"
  ]
}
65 package-lock.json (generated)
@@ -44,6 +44,7 @@
"prop-types": "^15.8.1",
"react": "18.2.0",
"react-dom": "18.2.0",
"react-markdown": "^8.0.7",
"react-next-github-btn": "^1.2.1",
"react-plotly.js": "^2.6.0",
"react-plotlyjs": "^0.4.4",
@@ -29695,6 +29696,41 @@
"integrity": "sha512-w2GsyukL62IJnlaff/nRegPQR94C/XXamvMWmSHRJ4y7Ts/4ocGRmTHvOs8PSE6pB3dWOrD/nueuU5sduBsQ4w==",
"dev": true
},
"node_modules/react-markdown": {
"version": "8.0.7",
"resolved": "https://registry.npmjs.org/react-markdown/-/react-markdown-8.0.7.tgz",
"integrity": "sha512-bvWbzG4MtOU62XqBx3Xx+zB2raaFFsq4mYiAzfjXJMEz2sixgeAfraA3tvzULF02ZdOMUOKTBFFaZJDDrq+BJQ==",
"dependencies": {
"@types/hast": "^2.0.0",
"@types/prop-types": "^15.0.0",
"@types/unist": "^2.0.0",
"comma-separated-tokens": "^2.0.0",
"hast-util-whitespace": "^2.0.0",
"prop-types": "^15.0.0",
"property-information": "^6.0.0",
"react-is": "^18.0.0",
"remark-parse": "^10.0.0",
"remark-rehype": "^10.0.0",
"space-separated-tokens": "^2.0.0",
"style-to-object": "^0.4.0",
"unified": "^10.0.0",
"unist-util-visit": "^4.0.0",
"vfile": "^5.0.0"
},
"funding": {
"type": "opencollective",
"url": "https://opencollective.com/unified"
},
"peerDependencies": {
"@types/react": ">=16",
"react": ">=16"
}
},
"node_modules/react-markdown/node_modules/react-is": {
"version": "18.2.0",
"resolved": "https://registry.npmjs.org/react-is/-/react-is-18.2.0.tgz",
"integrity": "sha512-xWGDIW6x921xtzPkhiULtthJHoJvBbF3q26fzloPCK0hsvxtPVelvftw3zjbHWSkR2km9Z+4uxbDDK/6Zw9B8w=="
},
"node_modules/react-next-github-btn": {
"version": "1.2.1",
"resolved": "https://registry.npmjs.org/react-next-github-btn/-/react-next-github-btn-1.2.1.tgz",
@@ -57219,6 +57255,35 @@
"integrity": "sha512-w2GsyukL62IJnlaff/nRegPQR94C/XXamvMWmSHRJ4y7Ts/4ocGRmTHvOs8PSE6pB3dWOrD/nueuU5sduBsQ4w==",
"dev": true
},
"react-markdown": {
"version": "8.0.7",
"resolved": "https://registry.npmjs.org/react-markdown/-/react-markdown-8.0.7.tgz",
"integrity": "sha512-bvWbzG4MtOU62XqBx3Xx+zB2raaFFsq4mYiAzfjXJMEz2sixgeAfraA3tvzULF02ZdOMUOKTBFFaZJDDrq+BJQ==",
"requires": {
"@types/hast": "^2.0.0",
"@types/prop-types": "^15.0.0",
"@types/unist": "^2.0.0",
"comma-separated-tokens": "^2.0.0",
"hast-util-whitespace": "^2.0.0",
"prop-types": "^15.0.0",
"property-information": "^6.0.0",
"react-is": "^18.0.0",
"remark-parse": "^10.0.0",
"remark-rehype": "^10.0.0",
"space-separated-tokens": "^2.0.0",
"style-to-object": "^0.4.0",
"unified": "^10.0.0",
"unist-util-visit": "^4.0.0",
"vfile": "^5.0.0"
},
"dependencies": {
"react-is": {
"version": "18.2.0",
"resolved": "https://registry.npmjs.org/react-is/-/react-is-18.2.0.tgz",
"integrity": "sha512-xWGDIW6x921xtzPkhiULtthJHoJvBbF3q26fzloPCK0hsvxtPVelvftw3zjbHWSkR2km9Z+4uxbDDK/6Zw9B8w=="
}
}
},
"react-next-github-btn": {
"version": "1.2.1",
"resolved": "https://registry.npmjs.org/react-next-github-btn/-/react-next-github-btn-1.2.1.tgz",
60 package.json
@@ -4,62 +4,7 @@
"license": "MIT",
"scripts": {},
"private": true,
"dependencies": {
"@apollo/client": "^3.7.11",
"@apollo/react-hooks": "^4.0.0",
"@emotion/react": "^11.10.6",
"@emotion/styled": "^11.10.6",
"@flowershow/core": "^0.4.9",
"@flowershow/markdowndb": "^0.1.0",
"@flowershow/remark-callouts": "^1.0.0",
"@flowershow/remark-embed": "^1.0.0",
"@flowershow/remark-wiki-link": "^1.0.1",
"@headlessui/react": "^1.7.13",
"@heroicons/react": "^2.0.17",
"@mui/icons-material": "^5.11.16",
"@mui/material": "^5.11.16",
"@mui/x-data-grid": "^6.1.0",
"@opentelemetry/api": "^1.4.0",
"@tailwindcss/typography": "^0.5.9",
"@tanstack/react-table": "^8.8.5",
"apollo-cache-inmemory": "^1.6.6",
"apollo-link": "^1.2.14",
"apollo-link-rest": "^0.9.0",
"filesize": "^10.0.7",
"gray-matter": "^4.0.3",
"html-react-parser": "^3.0.15",
"markdown-it": "^13.0.1",
"next": "^13.2.1",
"next-mdx-remote": "^4.4.1",
"next-seo": "^6.0.0",
"next-translate": "^2.0.5",
"nock": "^13.3.0",
"octokit": "^2.0.14",
"papaparse": "^5.4.1",
"plotly.js-basic-dist": "^2.20.0",
"prop-types": "^15.8.1",
"react": "18.2.0",
"react-dom": "18.2.0",
"react-next-github-btn": "^1.2.1",
"react-plotly.js": "^2.6.0",
"react-plotlyjs": "^0.4.4",
"react-vega": "^7.6.0",
"rehype-autolink-headings": "^6.1.1",
"rehype-katex": "^6.0.2",
"rehype-prism-plus": "^1.5.1",
"rehype-slug": "^5.1.0",
"remark-footnotes": "^4.0.1",
"remark-gfm": "^3.0.1",
"remark-math": "^5.1.1",
"remark-slug": "^7.0.1",
"remark-smartypants": "^2.0.0",
"remark-toc": "^8.0.1",
"slugify": "^1.6.6",
"timeago.js": "^4.0.2",
"tslib": "^2.3.0",
"vega": "^5.24.0",
"xlsx": "^0.18.5"
},
"dependencies": {},
"devDependencies": {
"@babel/preset-react": "^7.14.5",
"@nrwl/cypress": "15.9.2",
@@ -83,7 +28,6 @@
"@types/react-dom": "18.0.11",
"@typescript-eslint/eslint-plugin": "^5.36.1",
"@typescript-eslint/parser": "^5.36.1",
"autoprefixer": "10.4.13",
"babel-jest": "^29.4.1",
"cypress": "^12.2.0",
"eslint": "~8.15.0",
@@ -97,11 +41,9 @@
"jest": "^29.4.1",
"jest-environment-jsdom": "^29.4.1",
"nx": "15.9.2",
"postcss": "8.4.21",
"prettier": "^2.6.2",
"react-test-renderer": "18.2.0",
"swc-loader": "0.1.15",
"tailwindcss": "3.2.7",
"ts-jest": "^29.0.5",
"ts-node": "10.9.1",
"typescript": "~4.9.5"
@@ -1,4 +1,4 @@
This the Portal.JS website.
This the PortalJS website.

It is built on [Next.js](https://nextjs.org/).
26 site/components/ButtonLink.tsx (new file)
@@ -0,0 +1,26 @@
import Link from 'next/link';

export default function ButtonLink({
  style = 'primary',
  className = '',
  href = '',
  children,
}) {
  let styleClassName = '';

  if (style == 'primary') {
    styleClassName = 'text-primary bg-blue-400 hover:bg-blue-300';
  } else if (style == 'secondary') {
    styleClassName =
      'text-secondary border !border-secondary hover:text-primary hover:bg-blue-300';
  }

  return (
    <Link
      href={href}
      className={`inline-block h-12 px-6 py-3 border border-transparent text-base font-medium rounded-md focus:outline-none focus-visible:outline-2 focus-visible:outline-offset-2 focus-visible:outline-sky-300/50 active:bg-sky-500 ${styleClassName} ${className}`}
    >
      {children}
    </Link>
  );
}
99 site/components/Community.tsx (new file)
@@ -0,0 +1,99 @@
import Container from './Container';
import DiscordIcon from './icons/DiscordIcon';
import EmailIcon from './icons/EmailIcon';
import GitHubIcon from './icons/GitHubIcon';

import { siteConfig } from '@/config/siteConfig';
import { getContributorsCount, getRepoInfo } from '@/lib/getGitHubData';
import { useEffect, useState } from 'react';

const Stat = ({ title, value, ...props }) => {
  return (
    <div {...props}>
      <span className="text-6xl font-bold text-secondary">{value}</span>
      <p className="text-lg font-medium">{title}</p>
    </div>
  );
};

const IconButton = ({ Icon, text, href, ...props }) => {
  return (
    <div {...props}>
      <a
        className="rounded border border-secondary px-5 py-3 text-primary dark:text-primary-dark flex items-center hover:bg-secondary hover:text-primary dark:hover:text-primary transition-all duration-200"
        href={href}
      >
        <Icon className="w-6 h-6 mr-2" />
        {text}
      </a>
    </div>
  );
};

export default function Community() {
  const [repoInfo, setRepoInfo] = useState<any>();
  const [contributorsCount, setContributorsCount] = useState('');

  useEffect(() => {
    // This runs on client side and it's unlikely that users
    // will exceed the GitHub API usage limit, but added a
    // handling for that just in case.

    getRepoInfo().then((res) => {
      if (res.success) {
        res.info.then((data) => setRepoInfo(data));
      } else {
        // If the request fail e.g API usage limit, use
        // a placeholder
        setRepoInfo({ stargazers_count: '+2k' });
      }
    });

    getContributorsCount().then((res) => {
      if (res.success) {
        setContributorsCount(res.count);
      } else {
        setContributorsCount('+70');
      }
    });
  }, []);

  return (
    <Container>
      <h2 className="text-3xl font-bold text-primary dark:text-primary-dark ">
        Community
      </h2>
      <p className="text-lg mt-8 ">
        We are growing. Get in touch or become a contributor!
      </p>
      <div className="flex justify-center mt-12">
        <Stat
          title="Stars on GitHub"
          value={repoInfo?.stargazers_count}
          className="mr-10"
        />
        <Stat title="Contributors" value={contributorsCount} />
      </div>
      <div className="flex flex-wrap justify-center mt-12">
        <IconButton
          Icon={GitHubIcon}
          text="Star PortalJS on GitHub"
          className="sm:mr-4 mb-4 w-full sm:w-auto"
          href={siteConfig.github}
        />
        <IconButton
          Icon={DiscordIcon}
          text="Join the Discord server"
          className="sm:mr-4 mb-4 w-full sm:w-auto"
          href={siteConfig.discord}
        />
        <IconButton
          Icon={EmailIcon}
          text="Subscribe to the PortalJS newsletter"
          className="w-full sm:w-auto"
          href="#hero"
        />
      </div>
    </Container>
  );
}
7 site/components/Container.tsx (new file)
@@ -0,0 +1,7 @@
export default function Container({ children }) {
  return (
    <div className="lg:max-w-8xl mx-auto px-4 lg:px-8 xl:px-12 mb-32">
      {children}
    </div>
  );
}
@@ -1,3 +1,5 @@
import Container from './Container';

const features: { title: string; description: string; icon: string }[] = [
{
title: 'Unified sites',
@@ -37,10 +39,12 @@ const features: { title: string; description: string; icon: string }[] = [

export default function Features() {
return (
<div className="lg:max-w-8xl mx-auto px-4 lg:px-8 xl:px-12">
<h2 className="text-3xl font-bold">How Portal.JS works?</h2>
<Container>
<h2 className="text-3xl font-bold text-primary dark:text-primary-dark">
How PortalJS works?
</h2>
<p className="text-lg mt-8">
Portal.JS is built in JavaScript and React on top of the popular Next.js
PortalJS is built in JavaScript and React on top of the popular Next.js
framework, assuming a "decoupled" approach where the frontend is a
separate service from the backend and interacts with backend(s) via an
API. It can be used with any backend and has out of the box support for
@@ -55,7 +59,7 @@ export default function Features() {
<div className="absolute -inset-px rounded-xl border-2 border-transparent opacity-0 [background:linear-gradient(var(--quick-links-hover-bg,theme(colors.sky.50)),var(--quick-links-hover-bg,theme(colors.sky.50)))_padding-box,linear-gradient(to_top,theme(colors.blue.300),theme(colors.blue.400),theme(colors.blue.500))_border-box] group-hover:opacity-100 dark:[--quick-links-hover-bg:theme(colors.slate.800)]" />
<div className="relative overflow-hidden rounded-xl p-6">
<img src={feature.icon} alt="" className="h-24 w-auto" />
<h2 className="mt-4 font-display text-base text-slate-900 dark:text-white">
<h2 className="mt-4 font-display text-base text-primary dark:text-primary-dark">
<span className="absolute -inset-px rounded-xl" />
{feature.title}
</h2>
@@ -66,6 +70,6 @@ export default function Features() {
</div>
))}
</div>
</div>
</Container>
);
}
73 site/components/Gallery.tsx (new file)
@@ -0,0 +1,73 @@
import Container from './Container';
import GalleryItem from './GalleryItem';

const items = [
  {
    title: 'Open Data Northern Ireland',
    href: 'https://www.opendatani.gov.uk/',
    image: '/images/showcases/odni.png',
    description: 'Government Open Data Portal',
  },
  {
    title: 'Birmingham City Observatory',
    href: 'https://www.cityobservatory.birmingham.gov.uk/',
    image: '/images/showcases/birmingham.png',
    description: 'Government Open Data Portal',
  },
  {
    title: 'UAE Open Data',
    href: 'https://opendata.fcsc.gov.ae/',
    image: '/images/showcases/uae.png',
    description: 'Government Open Data Portal',
    sourceUrl: 'https://github.com/FCSCOpendata/frontend',
  },
  {
    title: 'Brazil Open Data',
    href: 'https://dados.gov.br/',
    image: '/images/showcases/brazil.png',
    description: 'Government Open Data Portal',
  },
  {
    title: 'Datahub Open Data',
    href: 'https://opendata.datahub.io/',
    image: '/images/showcases/datahub.png',
    description: 'Demo Data Portal by DataHub',
  },
  {
    title: 'Example: Simple Data Catalog',
    href: 'https://example.portaljs.org/',
    image: '/images/showcases/example-simple-catalog.png',
    description: 'Simple data catalog',
    sourceUrl:
      'https://github.com/datopian/portaljs/tree/main/examples/simple-example',
    docsUrl: '/docs/example-data-catalog',
  },
  {
    title: 'Example: Portal with CKAN',
    href: 'https://ckan-example.portaljs.org/',
    image: '/images/showcases/example-ckan.png',
    description: 'Simple portal with data coming from CKAN',
    sourceUrl:
      'https://github.com/datopian/portaljs/tree/main/examples/ckan-example',
    docsUrl: '/docs/example-ckan',
  },
];

export default function Gallery() {
  return (
    <Container>
      <h2
        className="text-3xl font-bold text-primary dark:text-primary-dark"
        id="gallery"
      >
        Gallery
      </h2>
      <p className="text-lg mt-8">Discover what's being powered by PortalJS</p>
      <div className="not-prose my-12 grid grid-cols-1 gap-6 md:grid-cols-2 lg:grid-cols-3">
        {items.map((item) => {
          return <GalleryItem item={item} />;
        })}
      </div>
    </Container>
  );
}
site/components/GalleryItem.tsx (new file, 106 lines)
@@ -0,0 +1,106 @@
const IconBeaker = () => (
  <svg
    xmlns="http://www.w3.org/2000/svg"
    fill="none"
    viewBox="0 0 24 24"
    stroke-width="1.5"
    stroke="currentColor"
    className="w-6 h-6"
  >
    <path
      stroke-linecap="round"
      stroke-linejoin="round"
      d="M9.75 3.104v5.714a2.25 2.25 0 01-.659 1.591L5 14.5M9.75 3.104c-.251.023-.501.05-.75.082m.75-.082a24.301 24.301 0 014.5 0m0 0v5.714c0 .597.237 1.17.659 1.591L19.8 15.3M14.25 3.104c.251.023.501.05.75.082M19.8 15.3l-1.57.393A9.065 9.065 0 0112 15a9.065 9.065 0 00-6.23-.693L5 14.5m14.8.8l1.402 1.402c1.232 1.232.65 3.318-1.067 3.611A48.309 48.309 0 0112 21c-2.773 0-5.491-.235-8.135-.687-1.718-.293-2.3-2.379-1.067-3.61L5 14.5"
    />
  </svg>
);

const IconDocs = () => (
  <svg
    xmlns="http://www.w3.org/2000/svg"
    fill="none"
    viewBox="0 0 24 24"
    strokeWidth={1.5}
    stroke="currentColor"
    className="w-6 h-6"
  >
    <path
      strokeLinecap="round"
      strokeLinejoin="round"
      d="M12 6.042A8.967 8.967 0 006 3.75c-1.052 0-2.062.18-3 .512v14.25A8.987 8.987 0 016 18c2.305 0 4.408.867 6 2.292m0-14.25a8.966 8.966 0 016-2.292c1.052 0 2.062.18 3 .512v14.25A8.987 8.987 0 0018 18a8.967 8.967 0 00-6 2.292m0-14.25v14.25"
    />
  </svg>
);

const IconCode = () => (
  <svg
    xmlns="http://www.w3.org/2000/svg"
    fill="none"
    viewBox="0 0 24 24"
    strokeWidth={1.5}
    stroke="currentColor"
    className="w-6 h-6"
  >
    <path
      strokeLinecap="round"
      strokeLinejoin="round"
      d="M17.25 6.75L22.5 12l-5.25 5.25m-10.5 0L1.5 12l5.25-5.25m7.5-3l-4.5 16.5"
    />
  </svg>
);

const ActionButton = ({ title, Icon, href, className = '' }) => (
  <a
    title={title}
    target="_blank"
    href={href}
    className={`rounded-full p-2 hover:bg-secondary transition-all duration-250 ${className}`}
  >
    <Icon />
  </a>
);

export default function GalleryItem({ item }) {
  return (
    <a
      className="rounded overflow-hidden group relative border-1 shadow-lg"
      target="_blank"
      href={item.href}
    >
      <div
        className="bg-cover bg-no-repeat bg-top aspect-video w-full group-hover:blur-sm group-hover:scale-105 transition-all duration-200"
        style={{ backgroundImage: `url(${item.image})` }}
      >
        <div className="w-full h-full bg-black opacity-0 group-hover:opacity-50 transition-all duration-200"></div>
      </div>
      <div>
        <div className="opacity-0 group-hover:opacity-100 absolute top-0 bottom-0 right-0 left-0 transition-all duration-200 px-2 flex items-center justify-center">
          <div className="text-center text-primary-dark">
            <span className="text-xl font-semibold">{item.title}</span>
            <p className="text-base font-medium">{item.description}</p>
            <div className="flex justify-center mt-2">
              <ActionButton Icon={IconBeaker} title="Demo" href={item.href} />
              {item.docsUrl && (
                <ActionButton
                  Icon={IconDocs}
                  title="Documentation"
                  href={item.docsUrl}
                  className="mx-5"
                />
              )}
              {item.sourceUrl && (
                <ActionButton
                  Icon={IconCode}
                  title="Source code"
                  href={item.sourceUrl}
                />
              )}

              {/* Maybe: Blog post */}
            </div>
          </div>
        </div>
      </div>
    </a>
  );
}
@@ -1,6 +1,7 @@
 import clsx from 'clsx';
 import Highlight, { defaultProps } from 'prism-react-renderer';
 import { Fragment, useRef } from 'react';
+import ButtonLink from './ButtonLink';
 import NewsletterForm from './NewsletterForm';

 const codeLanguage = 'javascript';
@@ -32,7 +33,10 @@ export function Hero() {
   const el = useRef(null);

   return (
-    <div className="overflow-hidden -mb-32 mt-[-4.5rem] pb-32 pt-[4.5rem] lg:mt-[-4.75rem] lg:pt-[4.75rem]">
+    <div
+      className="overflow-hidden -mb-32 mt-[-4.5rem] pb-32 pt-[4.5rem] lg:mt-[-4.75rem] lg:pt-[4.75rem]"
+      id="hero"
+    >
       <div className="py-16 sm:px-2 lg:relative lg:py-20 lg:px-0">
         {/* Commented code on line 37, 39 and 113 will reenable the two columns hero */}
         {/* <div className="mx-auto grid max-w-2xl grid-cols-1 items-center gap-y-16 gap-x-8 px-4 lg:max-w-8xl lg:grid-cols-2 lg:px-8 xl:gap-x-16 xl:px-12"> */}
@@ -45,12 +49,20 @@ export function Hero() {
           </h1>
         </div>
         <p className="mt-4 text-xl tracking-tight text-slate-400">
-          Portal.JS is a framework for rapidly building rich data portal
-          frontends using a modern frontend approach. It can be used to
-          present a single dataset or build a full-scale data
-          catalog/portal.
+          Rapidly build rich data portals using a modern frontend framework.
         </p>
-        <NewsletterForm />
+
+        <ButtonLink className="mt-8" href="/docs">
+          Get started
+        </ButtonLink>
+
+        <ButtonLink className="ml-3" href="#gallery" style="secondary">
+          Gallery
+        </ButtonLink>
+
+        <div className="md:max-w-md mx-auto">
+          <NewsletterForm />
+        </div>
         <p className="my-10 text-l tracking-wide">
           <span>A project of</span>
           <a
@@ -1,5 +1,6 @@
 import { siteConfig } from '@/config/siteConfig';
 import { NextSeo } from 'next-seo';
+import { useTheme } from 'next-themes';
 import Link from 'next/link';
 import { useCallback, useEffect, useState } from 'react';

@@ -58,6 +59,7 @@ export default function Layout({
   tableOfContents?;
 }) {
   // const { toc } = children.props;
+  const { theme, setTheme } = useTheme();

   const currentSection = useTableOfContents(tableOfContents);

@@ -87,17 +89,21 @@ export default function Layout({
         >
           Built by{' '}
           <img
-            src="/datopian-logo.png"
+            src={
+              theme === 'dark'
+                ? '/images/datopian-light-logotype.svg'
+                : '/images/datopian-dark-logotype.svg'
+            }
             alt="Datopian Logo"
             className="h-6 ml-2"
           />
         </a>
       </footer>
       {/** TABLE OF CONTENTS */}
-      {tableOfContents.length > 0 && (siteConfig.tableOfContents) && (
+      {tableOfContents.length > 0 && siteConfig.tableOfContents && (
         <div className="hidden xl:fixed xl:right-0 xl:top-[4.5rem] xl:block xl:w-1/5 xl:h-[calc(100vh-4.5rem)] xl:flex-none xl:overflow-y-auto xl:py-16 xl:pr-6 xl:mb-16">
           <nav aria-labelledby="on-this-page-title" className="w-56">
-            <h2 className="font-display text-md font-medium text-slate-900 dark:text-white">
+            <h2 className="font-display text-md font-medium text-primary dark:text-primary-dark">
               On this page
             </h2>
             <ol className="mt-4 space-y-3 text-sm">
@@ -108,7 +114,7 @@ export default function Layout({
               href={`#${section.id}`}
               className={
                 isActive(section)
-                  ? 'text-sky-500'
+                  ? 'text-secondary'
                   : 'font-normal text-slate-500 hover:text-slate-700 dark:text-slate-400 dark:hover:text-slate-300'
               }
             >
@@ -11,7 +11,7 @@ export default function MDXPage({ source, frontMatter }) {
   };

   return (
-    <div className="prose mx-auto prose-a:text-primary dark:prose-a:text-primary-dark prose-strong:text-primary dark:prose-strong:text-primary-dark prose-code:text-primary dark:prose-code:text-primary-dark prose-headings:text-primary dark:prose-headings:text-primary-dark prose text-primary dark:text-primary-dark prose-headings:font-headings dark:prose-invert prose-a:break-words">
+    <div className="prose mx-auto prose-a:text-primary dark:prose-a:text-primary-dark prose-strong:text-primary dark:prose-strong:text-primary-dark prose-headings:text-primary dark:prose-headings:text-primary-dark text-primary dark:text-primary-dark prose-headings:font-headings dark:prose-invert prose-a:break-words">
       <header>
         <div className="mb-6">
           {/* Default layout */}
@@ -29,12 +29,12 @@ export default function NavItem({ item }) {
   {Object.prototype.hasOwnProperty.call(item, "href") ? (
     <Link
       href={item.href}
-      className="text-slate-500 inline-flex items-center mr-2 px-1 pt-1 text-sm font-medium hover:text-slate-600"
+      className="text-slate-600 dark:text-slate-400 inline-flex items-center mr-2 px-1 pt-1 text-sm font-medium hover:text-slate-500"
     >
       {item.name}
     </Link>
   ) : (
-    <div className="text-slate-500 inline-flex items-center mr-2 px-1 pt-1 text-sm font-medium hover:text-slate-600 fill-slate-500 hover:fill-slate-600">
+    <div className="text-slate-600 dark:text-slate-400 inline-flex items-center mr-2 px-1 pt-1 text-sm font-medium hover:text-slate-500 fill-slate-500 hover:fill-slate-600">
       {item.name}
     </div>
   )}
||||
@@ -22,24 +22,7 @@ export default function NewsletterForm() {
|
||||
data-type="subscription"
|
||||
className="mt-3 sm:flex"
|
||||
>
|
||||
<div className="sib-input sib-form-block !p-0 block w-full sm:flex-auto sm:w-32">
|
||||
<div className="form__entry entry_block w-full">
|
||||
<label htmlFor="name" className="sr-only entry__label">
|
||||
Name
|
||||
</label>
|
||||
<input
|
||||
id="NAME"
|
||||
name="NAME"
|
||||
type="text"
|
||||
required
|
||||
placeholder="Your name"
|
||||
className="input entry__field !w-full px-2 py-3 text-base rounded-md bg-slate-200 dark:bg-slate-800 placeholder-gray-500 focus:outline-none focus:ring-2 focus:ring-offset-2 focus:ring-blue-400 focus:ring-offset-gray-900"
|
||||
/>
|
||||
|
||||
<label className="entry__error entry__error--primary text-red-400 text-sm"></label>
|
||||
</div>
|
||||
</div>
|
||||
<div className="sib-input sib-form-block !p-0 block w-full sm:flex-auto sm:w-64 mt-3 sm:mt-0 sm:ml-3">
|
||||
<div className="sib-input sib-form-block !p-0 block w-full sm:flex-auto sm:w-64 mt-3 sm:mt-0">
|
||||
<div className="form__entry entry_block w-full">
|
||||
<label htmlFor="email" className="sr-only entry__label">
|
||||
Email address
|
||||
@@ -68,7 +51,7 @@ export default function NewsletterForm() {
|
||||
>
|
||||
<path d="M460.116 373.846l-20.823-12.022c-5.541-3.199-7.54-10.159-4.663-15.874 30.137-59.886 28.343-131.652-5.386-189.946-33.641-58.394-94.896-95.833-161.827-99.676C261.028 55.961 256 50.751 256 44.352V20.309c0-6.904 5.808-12.337 12.703-11.982 83.556 4.306 160.163 50.864 202.11 123.677 42.063 72.696 44.079 162.316 6.031 236.832-3.14 6.148-10.75 8.461-16.728 5.01z" />
|
||||
</svg>
|
||||
Notify me
|
||||
Notify Me
|
||||
</button>
|
||||
<input
|
||||
type="text"
|
||||
@@ -79,10 +62,7 @@ export default function NewsletterForm() {
|
||||
<input type="hidden" name="locale" value="en" />
|
||||
</form>
|
||||
</div>
|
||||
<div
|
||||
id="error-message"
|
||||
className="sib-form-message-panel !border-none"
|
||||
>
|
||||
<div id="error-message" className="sib-form-message-panel !border-none">
|
||||
<div className="sib-form-message-panel__text sib-form-message-panel__text--center !text-red-400 justify-center">
|
||||
<svg
|
||||
viewBox="0 0 512 512"
|
||||
|
||||
site/components/icons/DiscordIcon.tsx (new file, 14 lines)
@@ -0,0 +1,14 @@
export default function DiscordIcon(props) {
  return (
    <svg
      xmlns="http://www.w3.org/2000/svg"
      width="16"
      height="16"
      fill="currentColor"
      viewBox="0 0 16 16"
      {...props}
    >
      <path d="M13.545 2.907a13.227 13.227 0 0 0-3.257-1.011.05.05 0 0 0-.052.025c-.141.25-.297.577-.406.833a12.19 12.19 0 0 0-3.658 0 8.258 8.258 0 0 0-.412-.833.051.051 0 0 0-.052-.025c-1.125.194-2.22.534-3.257 1.011a.041.041 0 0 0-.021.018C.356 6.024-.213 9.047.066 12.032c.001.014.01.028.021.037a13.276 13.276 0 0 0 3.995 2.02.05.05 0 0 0 .056-.019c.308-.42.582-.863.818-1.329a.05.05 0 0 0-.01-.059.051.051 0 0 0-.018-.011 8.875 8.875 0 0 1-1.248-.595.05.05 0 0 1-.02-.066.051.051 0 0 1 .015-.019c.084-.063.168-.129.248-.195a.05.05 0 0 1 .051-.007c2.619 1.196 5.454 1.196 8.041 0a.052.052 0 0 1 .053.007c.08.066.164.132.248.195a.051.051 0 0 1-.004.085 8.254 8.254 0 0 1-1.249.594.05.05 0 0 0-.03.03.052.052 0 0 0 .003.041c.24.465.515.909.817 1.329a.05.05 0 0 0 .056.019 13.235 13.235 0 0 0 4.001-2.02.049.049 0 0 0 .021-.037c.334-3.451-.559-6.449-2.366-9.106a.034.034 0 0 0-.02-.019Zm-8.198 7.307c-.789 0-1.438-.724-1.438-1.612 0-.889.637-1.613 1.438-1.613.807 0 1.45.73 1.438 1.613 0 .888-.637 1.612-1.438 1.612Zm5.316 0c-.788 0-1.438-.724-1.438-1.612 0-.889.637-1.613 1.438-1.613.807 0 1.451.73 1.438 1.613 0 .888-.631 1.612-1.438 1.612Z" />
    </svg>
  );
}
site/components/icons/EmailIcon.tsx (new file, 14 lines)
@@ -0,0 +1,14 @@
export default function EmailIcon(props) {
  return (
    <svg
      fill="currentColor"
      viewBox="0 0 2150 2150"
      xmlns="http://www.w3.org/2000/svg"
      {...props}
    >
      <path
        d="M1920 428.266v1189.54l-464.16-580.146-88.203 70.585 468.679 585.904H83.684l468.679-585.904-88.202-70.585L0 1617.805V428.265l959.944 832.441L1920 428.266ZM1919.932 226v52.627l-959.943 832.44L.045 278.628V226h1919.887Z"
      />
    </svg>
  );
}
site/components/icons/GitHubIcon.tsx (new file, 7 lines)
@@ -0,0 +1,7 @@
export default function GitHubIcon(props) {
  return (
    <svg aria-hidden="true" viewBox="0 0 16 16" fill="currentColor" {...props}>
      <path d="M8 0C3.58 0 0 3.58 0 8C0 11.54 2.29 14.53 5.47 15.59C5.87 15.66 6.02 15.42 6.02 15.21C6.02 15.02 6.01 14.39 6.01 13.72C4 14.09 3.48 13.23 3.32 12.78C3.23 12.55 2.84 11.84 2.5 11.65C2.22 11.5 1.82 11.13 2.49 11.12C3.12 11.11 3.57 11.7 3.72 11.94C4.44 13.15 5.59 12.81 6.05 12.6C6.12 12.08 6.33 11.73 6.56 11.53C4.78 11.33 2.92 10.64 2.92 7.58C2.92 6.71 3.23 5.99 3.74 5.43C3.66 5.23 3.38 4.41 3.82 3.31C3.82 3.31 4.49 3.1 6.02 4.13C6.66 3.95 7.34 3.86 8.02 3.86C8.7 3.86 9.38 3.95 10.02 4.13C11.55 3.09 12.22 3.31 12.22 3.31C12.66 4.41 12.38 5.23 12.3 5.43C12.81 5.99 13.12 6.7 13.12 7.58C13.12 10.65 11.25 11.33 9.47 11.53C9.76 11.78 10.01 12.26 10.01 13.01C10.01 14.08 10 14.94 10 15.21C10 15.42 10.15 15.67 10.55 15.59C13.71 14.53 16 11.53 16 8C16 3.58 12.42 0 8 0Z" />
    </svg>
  );
}
site/content/assets/examples/frictionless-dataset-demo.gif (new binary file, 8.6 MiB)
Binary file not shown.
@@ -1,14 +1,20 @@
-Live DEMOs:
+---
+title: "PortalJS example 1: Create a full-featured custom data portal frontend for CKAN with PortalJS"
+authors: ['Luccas Mateus']
+date: 2021-04-20
+---

-- https://catalog-portal-js.vercel.app
-- https://ckan-enterprise-frontend.vercel.app/
+We have created a full data portal demo using PortalJS all backed by a CKAN instance storing data and metadata, you can see below a screenshot of the homepage and of an individual dataset page.
+
+[screenshot: homepage]
+[screenshot: individual dataset page]

 ## Create a Portal app for CKAN

 To create a Portal app, run the following command in your terminal:

 ```console
-npx create-next-app -e https://github.com/datopian/portal.js/tree/main/examples/ckan
+npx create-next-app -e https://github.com/datopian/portaljs/tree/main/examples/ckan
 ```

 > NB: Under the hood, this uses the tool called create-next-app, which bootstraps an app for you based on our CKAN example.
@@ -69,7 +75,7 @@ For development/debugging purposes, we suggest installing the Chrome extension -

 ### I18n configuration

-Portal.js is configured by default to support both `English` and `French` subpath for language translation. But for subsequent users, this following steps can be used to configure i18n for other languages;
+PortalJS is configured by default to support both `English` and `French` subpath for language translation. But for subsequent users, this following steps can be used to configure i18n for other languages;

 1. Update `next.config.js`, to add more languages to the i18n locales
@@ -1,8 +1,8 @@
 const config = {
   title:
-    "Portal.JS",
+    "PortalJS",
   description:
-    "Portal.JS is a framework for rapidly building rich data portal frontends using a modern frontend approach. portal.js can be used to present a single dataset or build a full-scale data catalog/portal.",
+    "PortalJS is a framework for rapidly building rich data portal frontends using a modern frontend approach. PortalJS can be used to present a single dataset or build a full-scale data catalog/portal.",
   theme: {
     default: "dark",
     toggleIcon: "/images/theme-button.svg",
@@ -12,34 +12,33 @@ const config = {
   authorUrl: "https://datopian.com/",
   navbarTitle: {
     // logo: "/images/logo.svg",
-    text: "🌀 Portal.JS",
+    text: "🌀 PortalJS",
     // version: "Alpha",
   },
   navLinks: [
     { name: "Docs", href: "/docs" },
-    { name: "Components", href: "/docs/components" },
     { name: "Learn", href: "/learn" },
-    { name: "Gallery", href: "/gallery" },
-    { name: "Data Literate", href: "/data-literate" },
-    { name: "DL Demo", href: "/data-literate/demo" },
-    { name: "Excel Viewer", href: "/excel-viewer" },
-    { name: "GitHub", href: "https://github.com/datopian/portal.js" },
+    // { name: "Components", href: "/docs/components" },
+    { name: "Blog", href: "/blog" },
+    // { name: "Gallery", href: "/gallery" },
+    // { name: "Data Literate", href: "/data-literate" },
+    // { name: "DL Demo", href: "/data-literate/demo" },
+    // { name: "Excel Viewer", href: "/excel-viewer" },
   ],
   footerLinks: [],
   nextSeo: {
     openGraph: {
       type: "website",
       title:
-        "Portal.JS - Rapidly build rich data portals using a modern frontend framework",
+        "PortalJS - rapidly build rich data portals using a modern frontend framework.",
       description:
-        "Portal.JS is a framework for rapidly building rich data portal frontends using a modern frontend approach. portal.js can be used to present a single dataset or build a full-scale data catalog/portal.",
+        "PortalJS is a framework for rapidly building rich data portal frontends using a modern frontend approach. PortalJS can be used to present a single dataset or build a full-scale data catalog and portal.",
       locale: "en_US",
       images: [
         {
-          url: "https://datahub.io/static/img/opendata/product.png", // TODO
-          alt: "Portal.JS - Rapidly build rich data portals using a modern frontend framework",
-          width: 1200,
-          height: 627,
+          url: "/homepage-screenshot.png", // TODO
+          alt: "PortalJS - rapidly build rich data portals using a modern frontend framework.",
+          width: 1280,
+          height: 720,
           type: "image/jpg",
         },
       ],
@@ -51,7 +50,7 @@ const config = {
     },
   },
   github: "https://github.com/datopian/portaljs",
-  discord: "https://discord.gg/An7Bu5x8",
+  discord: "https://discord.gg/EeyfGrGu4U",
   tableOfContents: true,
   // analytics: "xxxxxx",
   // editLinkShow: true,
||||
@@ -13,7 +13,7 @@ You can see the raw source of this page here: https://raw.githubusercontent.com/
|
||||
We can have github-flavored markdown including markdown tables, auto-linked links and checklists:
|
||||
|
||||
```
|
||||
https://github.com/datopian/portal.js
|
||||
https://github.com/datopian/portaljs
|
||||
|
||||
| a | b |
|
||||
|---|---|
|
||||
@@ -23,7 +23,7 @@ https://github.com/datopian/portal.js
|
||||
* [ ] a second thing to do
|
||||
```
|
||||
|
||||
https://github.com/datopian/portal.js
|
||||
https://github.com/datopian/portaljs
|
||||
|
||||
| a | b |
|
||||
|---|---|
|
||||
|
||||
@@ -1,132 +0,0 @@ (file deleted)
# 🌀 Portal.JS: The JavaScript framework for data portals

🌀 `portal.js` is a framework for rapidly building rich data portal frontends using a modern frontend approach. `portal.js` can be used to present a single dataset or build a full-scale data catalog/portal.

`portal.js` is built in Javascript and React on top of the popular [Next.js](https://nextjs.com/) framework. `portal` assumes a "decoupled" approach where the frontend is a separate service from the backend and interacts with backend(s) via an API. It can be used with any backend and has out of the box support for [CKAN](https://ckan.org/).

## Features

- 🗺️ Unified sites: present data and content in one seamless site, pulling datasets from a DMS (e.g. CKAN) and content from a CMS (e.g. wordpress) with a common internal API.
- 👩💻 Developer friendly: built with familiar frontend tech Javascript, React etc
- 🔋 Batteries included: Full set of portal components out of the box e.g. catalog search, dataset showcase, blog etc.
- 🎨 Easy to theme and customize: installable themes, use standard CSS and React+CSS tooling. Add new routes quickly.
- 🧱 Extensible: quickly extend and develop/import your own React components
- 📝 Well documented: full set of documentation plus the documentation of NextJS and Apollo.

### For developers

- 🏗 Build with modern, familiar frontend tech such as Javascript and React.
- 🚀 NextJS framework: so everything in NextJS for free React, SSR, static site generation, huge number of examples and integrations etc.
  - SSR => unlimited number of pages, SEO etc whilst still using React.
  - Static Site Generation (SSG) (good for small sites) => ultra-simple deployment, great performance and lighthouse scores etc

## Installation and setup

Before installation, ensure your system satisfies the following requirements:

- Node.js 10.13 or later
- Nextjs 10.0.3
- MacOS, Windows (including WSL), and Linux are supported

> Note: We also recommend instead of npm using `yarn` instead of `npm`.
>
Portal.js is built with React on top of Nextjs framework, so for a quick setup, you can bootstrap a Nextjs app and install portal.js as demonstrated in the code below:

```bash=
## Create a react app
npx create-next-app
# or
yarn create next-app
```
After the installation is complete, follow the instructions to start the development server. Try editing pages/index.js and see the result on your browser.

> For more information on how to use create-next-app, you can review the [create-next-app](https://nextjs.org/docs/api-reference/create-next-app) documentation.

Once you have Nextjs created, you can install portal.js:

```bash=
yarn add https://github.com/datopian/portal.js.git
```

You're now ready to use portal.js in your next app. To test portal.js, open your `index.js` file in the pages folder. By default you should have some autogenerated code in the `index.js` file:

Which outputs a page with the following content:

[screenshot: default Next.js start page]

Now, we are going to do some clean up and add a table component. In the `index.js` file, import a [Table]() component from portal as shown below:

```javascript
import Head from 'next/head'
import { Table } from 'portal' //import Table component
import styles from '../styles/Home.module.css'

export default function Home() {

  const columns = [
    { field: 'id', headerName: 'ID' },
    { field: 'firstName', headerName: 'First name' },
    { field: 'lastName', headerName: 'Last name' },
    { field: 'age', headerName: 'Age' }
  ];

  const rows = [
    { id: 1, lastName: 'Snow', firstName: 'Jon', age: 35 },
    { id: 2, lastName: 'Lannister', firstName: 'Cersei', age: 42 },
    { id: 3, lastName: 'Lannister', firstName: 'Jaime', age: 45 },
    { id: 4, lastName: 'Stark', firstName: 'Arya', age: 16 },
    { id: 7, lastName: 'Clifford', firstName: 'Ferrara', age: 44 },
    { id: 8, lastName: 'Frances', firstName: 'Rossini', age: 36 },
    { id: 9, lastName: 'Roxie', firstName: 'Harvey', age: 65 },
  ];

  return (
    <div className={styles.container}>
      <Head>
        <title>Create Portal App</title>
        <link rel="icon" href="/favicon.ico" />
      </Head>

      <h1 className={styles.title}>
        Welcome to <a href="https://nextjs.org">Portal.JS</a>
      </h1>

      {/* Use table component */}
      <Table data={rows} columns={columns} />

    </div>
  )
}
```

Now, your page should look like the following:

[screenshot: page rendered with the Table component]

> **Note**: You can learn more about individual portal components, as well as their prop types in the [components reference](/docs/components).


## Next Steps

You can check out the following examples built with Portal.js.

* [A portal for a single Frictionless dataset](/learn/ckan)
* [A portal with a CKAN backend](/learn/single-frictionless-dataset)

> The [`examples` directory](https://github.com/datopian/portal.js/tree/main/examples) is regularly updated with different portal examples.

You can also look at the full list of the available components that are provided by Portal.JS in [Components](/docs/components).


## Reference Information

* [Full list of the available components that are provided by Portal.JS](/docs/components)
* [Reference](/docs/references)


## Getting Help

If you have questions about anything related to Portal.js, you're always welcome to ask our community on [GitHub Discussions](https://github.com/datopian/portal.js/discussions).
@@ -1,589 +0,0 @@
|
||||
# Components Reference
|
||||
|
||||
Portal.js supports many components that can help you build amazing data portals similar to [this](https://catalog-portal-js.vercel.app/) and [this](https://portal-js.vercel.app/).
|
||||
|
||||
In this section, we'll cover all supported components in depth, and help you understand their use as well as the expected properties.
|
||||
|
||||
Components are grouped under the following sections:
|
||||
* [UI](https://github.com/datopian/portal.js/tree/main/src/components/ui): Components like Nav bar, Footer, e.t.c
|
||||
* [Dataset](https://github.com/datopian/portal.js/tree/main/src/components/dataset): Components used for displaying a Frictionless dataset and resources
|
||||
* [Search](https://github.com/datopian/portal.js/tree/main/src/components/search): Components used for building a search interface for datasets
|
||||
* [Blog](https://github.com/datopian/portal.js/tree/main/src/components/blog): Components for building a simple blog for datasets
|
||||
* [Views](https://github.com/datopian/portal.js/tree/main/src/components/views): Components like charts, tables, maps for generating data views
|
||||
* [Misc](https://github.com/datopian/portal.js/tree/main/src/components/misc): Miscellaneos components like errors, custom links, etc used for extra design.
|
||||
|
||||
### UI Components
|
||||
|
||||
In the UI we group all components that can be used for building generic page sections. These are components for building sections like the Navigation bar, Footer, Side pane, Recent datasets, e.t.c.
|
||||
|
||||
#### [Nav Component](https://github.com/datopian/portal.js/blob/main/src/components/ui/Nav.js)
|
||||
|
||||
To build a navigation bar, you can use the `Nav` component as demonstrated below:
|
||||
|
||||
```javascript
|
||||
import { Nav } from 'portal'
|
||||
|
||||
export default function Home(){
|
||||
|
||||
const navMenu = [{ title: 'Blog', path: '/blog' },
|
||||
{ title: 'Search', path: '/search' }]
|
||||
|
||||
return (
|
||||
<>
|
||||
<Nav logo="/images/logo.png" navMenu={navMenu}/>
|
||||
...
|
||||
</>
|
||||
|
||||
)
|
||||
}
|
||||
```
|
||||
|
||||
#### Nav Component Prop Types
|
||||
|
||||
Nav component accepts two properties:
|
||||
* **logo**: A string to an image path. Can be relative or absolute.
|
||||
* **navMenu**: An array of objects with title and path. E.g : {"[{ title: 'Blog', path: '/blog' },{ title: 'Search', path: '/search' }]"}
|
||||
|
||||
|
||||
#### [Recent Component](https://github.com/datopian/portal.js/blob/main/src/components/ui/Recent.js)

The `Recent` component is used to display a list of recent [datasets](#Dataset) on the home page. This is useful if you want to show the datasets users have interacted with most recently.

To build a recent dataset section, use the `Recent` component as demonstrated below:

```javascript
import { Recent } from 'portal'

export default function Home() {
  const datasets = [
    {
      organization: {
        name: "Org1",
        title: "This is the first org",
        description: "A description of the organization 1"
      },
      title: "Data package title",
      name: "dataset1",
      description: "description of data package",
      resources: [],
    },
    {
      organization: {
        name: "Org2",
        title: "This is the second org",
        description: "A description of the organization 2"
      },
      title: "Data package title",
      name: "dataset2",
      description: "description of data package",
      resources: [],
    },
  ]

  return (
    <div>
      {/* Use the Recent component */}
      <Recent datasets={datasets} />
    </div>
  )
}
```

Note: each dataset rendered by the `Recent` component is hyperlinked using the organization name and the dataset name in the following format:

> `/@<org name>/<dataset name>`

For instance, with the example datasets above, the first entry links to:

> `/@org1/dataset1`

and the second to:

> `/@org2/dataset2`

This is useful to know when generating dynamic pages for each dataset.
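That link path can be derived from a dataset object with a small helper. The sketch below is illustrative (the `datasetPath` name is our own, not part of the portal package, which generates these links internally):

```javascript
// Build the portal link path for a dataset, e.g. /@org1/dataset1.
// Illustrative helper, not part of the portal package.
function datasetPath(dataset) {
  const org = dataset.organization.name.toLowerCase()
  return `/@${org}/${dataset.name}`
}
```

A dynamic route such as `pages/@[org]/[dataset].js` can then resolve these paths back to a dataset.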
#### Recent Component Prop Types

The `Recent` component accepts the following property:

* **datasets**: An array of [datasets](#Dataset)
### Dataset Components

The dataset components group together components that can be used for building a dataset UI. These include components for displaying info about a dataset, resources in a dataset, as well as the dataset ReadMe.

#### [KeyInfo Component](https://github.com/datopian/portal.js/blob/main/src/components/dataset/KeyInfo.js)

The `KeyInfo` component displays key properties of a dataset, such as the number of resources, size, format and licence, in tabular form. See the example in the `Key Info` section [here](https://portal-js.vercel.app/). To use it, import the `KeyInfo` component as demonstrated below:

```javascript
import { KeyInfo } from 'portal'

export default function Home() {
  const datapackage = {
    "name": "finance-vix",
    "title": "VIX - CBOE Volatility Index",
    "homepage": "http://www.cboe.com/micro/VIX/",
    "version": "0.1.0",
    "license": "PDDL-1.0",
    "sources": [
      {
        "title": "CBOE VIX Page",
        "name": "CBOE VIX Page",
        "web": "http://www.cboe.com/micro/vix/historical.aspx"
      }
    ],
    "resources": [
      {
        "name": "vix-daily",
        "path": "vix-daily.csv",
        "format": "csv",
        "size": 20982,
        "mediatype": "text/csv",
      }
    ]
  }

  return (
    <div>
      {/* Use the KeyInfo component */}
      <KeyInfo descriptor={datapackage} resources={datapackage.resources} />
    </div>
  )
}
```

#### KeyInfo Component Prop Types

The `KeyInfo` component accepts two properties:

* **descriptor**: A [Frictionless data package descriptor](https://specs.frictionlessdata.io/data-package/#descriptor)
* **resources**: An array of [Frictionless data package resources](https://specs.frictionlessdata.io/data-resource/#introduction)
#### [ResourceInfo Component](https://github.com/datopian/portal.js/blob/main/src/components/dataset/ResourceInfo.js)

The `ResourceInfo` component displays key properties of a resource object, such as the name, size, format and modification date, as well as a download link. See an example of a `ResourceInfo` component in the `Data Files` section [here](https://portal-js.vercel.app/).

You can import and use the `ResourceInfo` component as demonstrated below:

```javascript
import { ResourceInfo } from 'portal'

export default function Home() {
  const resources = [
    {
      "name": "vix-daily",
      "path": "vix-daily.csv",
      "format": "csv",
      "size": 20982,
      "mediatype": "text/csv",
    },
    {
      "name": "vix-daily 2",
      "path": "vix-daily2.csv",
      "format": "csv",
      "size": 2082,
      "mediatype": "text/csv",
    }
  ]

  return (
    <div>
      {/* Use the ResourceInfo component */}
      <ResourceInfo resources={resources} />
    </div>
  )
}
```

#### ResourceInfo Component Prop Types

The `ResourceInfo` component accepts a single property:

* **resources**: An array of [Frictionless data package resources](https://specs.frictionlessdata.io/data-resource/#introduction)
#### [ReadMe Component](https://github.com/datopian/portal.js/blob/main/src/components/dataset/Readme.js)

The `ReadMe` component is used for displaying a compiled dataset ReadMe in a readable format. See the example in the `README` section [here](https://portal-js.vercel.app/).

> Note: By compiled ReadMe, we mean a ReadMe that has been converted to a plain HTML string using a package like [remark](https://www.npmjs.com/package/remark).

You can import and use the `ReadMe` component as demonstrated below:

```javascript
import { ReadMe } from 'portal'
import remark from 'remark'
import html from 'remark-html'
import { useEffect, useState } from 'react'

const readMeMarkdown = `
CBOE Volatility Index (VIX) time-series dataset including daily open, close,
high and low. The CBOE Volatility Index (VIX) is a key measure of market
expectations of near-term volatility conveyed by S&P 500 stock index option
prices introduced in 1993.

## Data

From the [VIX FAQ][faq]:

> In 1993, the Chicago Board Options Exchange® (CBOE®) introduced the CBOE
> Volatility Index®, VIX®, and it quickly became the benchmark for stock market
> volatility. It is widely followed and has been cited in hundreds of news
> articles in the Wall Street Journal, Barron's and other leading financial
> publications. Since volatility often signifies financial turmoil, VIX is
> often referred to as the "investor fear gauge".

[faq]: http://www.cboe.com/micro/vix/faq.aspx

## License

No obvious statement on [historical data page][historical]. Given size and
factual nature of the data and its source from a US company would imagine this
was public domain and as such have licensed the Data Package under the Public
Domain Dedication and License (PDDL).

[historical]: http://www.cboe.com/micro/vix/historical.aspx
`

export default function Home() {
  const [readMe, setReadMe] = useState("")

  useEffect(() => {
    async function processReadMe() {
      const processed = await remark()
        .use(html)
        .process(readMeMarkdown)
      setReadMe(processed.toString())
    }
    processReadMe()
  }, [])

  return (
    <div>
      <ReadMe readme={readMe} />
    </div>
  )
}
```

#### ReadMe Component Prop Types

The `ReadMe` component accepts a single property:

* **readme**: A string of a compiled ReadMe in HTML format.
### [View Components](https://github.com/datopian/portal.js/tree/main/src/components/views)

View components are a set of components that can be used for displaying dataset views such as charts, tables and maps.

#### [Chart Component](https://github.com/datopian/portal.js/blob/main/src/components/views/Chart.js)

The `Chart` module exposes different chart components, such as Plotly and Vega charts, which can be used for showing graphs. See the example in the `Graph` section [here](https://portal-js.vercel.app/).

To use a chart component, you need to compile a view spec and pass it as a prop to the chart component.
Each chart type has its own spec, as explained in this [doc](https://specs.frictionlessdata.io/views/#graph-spec).

In the example below, we assume there's a compiled Plotly spec:

```javascript
import { PlotlyChart } from 'portal'

export default function Home({ plotlySpec }) {
  return (
    <div>
      <PlotlyChart spec={plotlySpec} />
    </div>
  )
}
```

> Note: You can compile views using the [datapackage-render](https://github.com/datopian/datapackage-views-js) library, as demonstrated in [this example](https://github.com/datopian/portal.js/blob/main/examples/dataset-frictionless/lib/utils.js).

#### Chart Component Prop Types

Each chart component accepts a single property:

* **spec**: A compiled view spec, depending on the chart type.
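As an illustration of what a compiled spec might look like, a Plotly spec is a plain object with `data` (an array of traces) and `layout` keys. The trace values below are made up for the example; real specs are produced by compiling a Frictionless view:

```javascript
// A hand-written, illustrative compiled Plotly spec.
// Real specs are usually generated by compiling a Frictionless view.
const plotlySpec = {
  data: [
    {
      x: ["2020-01-01", "2020-01-02", "2020-01-03"],
      y: [13.2, 14.1, 12.8],
      type: "scatter",
      name: "VIX Open",
    },
  ],
  layout: { title: "VIX - CBOE Volatility Index" },
}
```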
#### [Table Component](https://github.com/datopian/portal.js/blob/main/examples/dataset-frictionless/components/Table.js)

The `Table` component is used for displaying dataset resources as a tabular grid. See the example in the `Data Preview` section [here](https://portal-js.vercel.app/).
To use a Table component, pass it an array of data and an array of columns, as demonstrated below:

```javascript
import { Table } from 'portal' // import the Table component

export default function Home() {
  const columns = [
    { field: 'id', headerName: 'ID' },
    { field: 'firstName', headerName: 'First name' },
    { field: 'lastName', headerName: 'Last name' },
    { field: 'age', headerName: 'Age' }
  ];

  const data = [
    { id: 1, lastName: 'Snow', firstName: 'Jon', age: 35 },
    { id: 2, lastName: 'Lannister', firstName: 'Cersei', age: 42 },
    { id: 3, lastName: 'Lannister', firstName: 'Jaime', age: 45 },
    { id: 4, lastName: 'Stark', firstName: 'Arya', age: 16 },
    { id: 7, lastName: 'Clifford', firstName: 'Ferrara', age: 44 },
    { id: 8, lastName: 'Frances', firstName: 'Rossini', age: 36 },
    { id: 9, lastName: 'Roxie', firstName: 'Harvey', age: 65 },
  ];

  return (
    <Table data={data} columns={columns} />
  )
}
```

> Note: Under the hood, the Table component uses the [DataGrid Material UI table](https://material-ui.com/components/data-grid/), so everything DataGrid supports in `data` and `columns` is supported here.

#### Table Component Prop Types

The `Table` component accepts two properties:

* **data**: An array of data objects, e.g. `[{ col1: 1, col2: 2 }, { col1: 5, col2: 7 }]`
* **columns**: An array of column definitions with `field` and `headerName` properties, e.g. `[{ field: "col1", headerName: "col1" }, { field: "col2", headerName: "col2" }]`
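When a resource arrives as parsed rows (header row first), it has to be reshaped into those two props. A minimal sketch of such a conversion, assuming the `toTableProps` helper name (it is not part of the portal package):

```javascript
// Convert parsed tabular rows (arrays, first row = header) into the
// columns/data props expected by the Table component.
// Illustrative helper, not part of the portal package.
function toTableProps(rows) {
  const [header, ...body] = rows
  // One column definition per header cell.
  const columns = header.map((name) => ({ field: name, headerName: name }))
  // One record per body row; DataGrid-style rows need an id field.
  const data = body.map((row, i) => {
    const record = { id: i }
    header.forEach((name, j) => { record[name] = row[j] })
    return record
  })
  return { columns, data }
}
```

Usage: `const { columns, data } = toTableProps(rows)` and then `<Table data={data} columns={columns} />`.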
### [Search Components](https://github.com/datopian/portal.js/tree/main/src/components/search)

Search components group together components that can be used for creating a search interface. This includes a search form, a search item, and a search result total.

#### [Form Component](https://github.com/datopian/portal.js/blob/main/src/components/search/Form.js)

The search `Form` component is a simple search input with a submit button. See an example of a search form [here](https://catalog-portal-js.vercel.app/search).

The search `Form` requires a submit handler (`handleSubmit`). This handler function receives the search term and performs the actual search.

In the example below, we demonstrate how to use the `Form` component:

```javascript
import { Form } from 'portal'

export default function Home() {
  const handleSearchSubmit = (searchQuery) => {
    // Write your custom code to perform the search in the db
    console.log(searchQuery);
  }

  return (
    <Form handleSubmit={handleSearchSubmit} />
  )
}
```

#### Form Component Prop Types

The `Form` component accepts a single property:

* **handleSubmit**: A function that receives the search text and can be customized to perform the actual search.
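A `handleSubmit` implementation can be as simple as filtering a local dataset list by title. The sketch below is one possible handler under that assumption; a real portal would usually query a search backend instead:

```javascript
// Minimal client-side search handler: filter a local dataset list by title.
// Illustrative only; a real portal would typically query a search API here.
const datasets = [
  { name: "dataset1", title: "GDP growth" },
  { name: "dataset2", title: "VIX - CBOE Volatility Index" },
]

function handleSearchSubmit(searchQuery) {
  const q = searchQuery.toLowerCase()
  return datasets.filter((d) => d.title.toLowerCase().includes(q))
}
```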
#### [Item Component](https://github.com/datopian/portal.js/blob/main/src/components/search/Item.js)

The search `Item` component can be used to display a single search result.

In the example below, we demonstrate how to use the `Item` component:

```javascript
import { Item } from 'portal'

export default function Home() {
  const datapackage = {
    "name": "finance-vix",
    "title": "VIX - CBOE Volatility Index",
    "homepage": "http://www.cboe.com/micro/VIX/",
    "version": "0.1.0",
    "description": "This is a test organization description",
    "resources": [
      {
        "name": "vix-daily",
        "path": "vix-daily.csv",
        "format": "csv",
        "size": 20982,
        "mediatype": "text/csv",
      }
    ]
  }

  return (
    <Item dataset={datapackage} />
  )
}
```

#### Item Component Prop Types

The `Item` component accepts a single property:

* **dataset**: A [Frictionless data package descriptor](https://specs.frictionlessdata.io/data-package/#descriptor)
#### [ItemTotal Component](https://github.com/datopian/portal.js/blob/main/src/components/search/Item.js)

The search `ItemTotal` component is a simple component for displaying the total number of search results.

In the example below, we demonstrate how to use the `ItemTotal` component:

```javascript
import { ItemTotal } from 'portal'

export default function Home() {
  // Do some custom search to get results
  const search = (text) => {
    return [{ name: "data1" }, { name: "data2" }]
  }
  // Get the total result count
  const searchTotal = search("some text").length

  return (
    <ItemTotal count={searchTotal} />
  )
}
```

#### ItemTotal Component Prop Types

The `ItemTotal` component accepts a single property:

* **count**: An integer, the total number of results.
### [Blog Components](https://github.com/datopian/portal.js/tree/main/src/components/blog)

This is a group of components for building a portal blog. See an example of a portal blog [here](https://catalog-portal-js.vercel.app/blog).

#### [PostList Components](https://github.com/datopian/portal.js/tree/main/src/components/misc)

The `PostList` component is used to display a list of blog posts, each with its title and a short excerpt from the content.

In the example below, we demonstrate how to use the `PostList` component:

```javascript
import { PostList } from 'portal'

export default function Home() {
  const posts = [
    { title: "Blog post 1", excerpt: "This is the first blog excerpt in this list." },
    { title: "Blog post 2", excerpt: "This is the second blog excerpt in this list." },
    { title: "Blog post 3", excerpt: "This is the third blog excerpt in this list." },
  ]
  return (
    <PostList posts={posts} />
  )
}
```

#### PostList Component Prop Types

The `PostList` component accepts a single property:

* **posts**: An array of post objects with the following properties:
```javascript
[
  {
    title: "The title of the blog post",
    excerpt: "A short excerpt from the post content",
  },
]
```
#### [Post Components](https://github.com/datopian/portal.js/tree/main/src/components/misc)

The `Post` component is used to display a blog post. See an example of a blog post [here](https://catalog-portal-js.vercel.app/blog/nyt-pa-platformen-opdateringsfrekvens-og-andres-data).

In the example below, we demonstrate how to use the `Post` component:

```javascript
import { Post } from 'portal'
import * as dayjs from 'dayjs' // For converting UTC time to relative format
import relativeTime from 'dayjs/plugin/relativeTime'

dayjs.extend(relativeTime)

export default function Home() {
  const post = {
    title: "This is a sample blog post",
    content: `<h1>A simple header</h1>
    The PostList component is used to display a list of blog posts
    with the title and a short excerpt from the content.
    In the example below, we demonstrate how to use the PostList component.`,
    createdAt: dayjs().to(dayjs(1620649596902)),
    featuredImage: "https://pixabay.com/get/ge9a766d1f7b5fe0eccbf0f439501a2cf2b191997290e7ab15e6a402574acc2fdba48a82d278dca3547030e0202b7906d_640.jpg"
  }

  return (
    <Post post={post} />
  )
}
```

#### Post Component Prop Types

The `Post` component accepts a single property:

* **post**: An object with the following properties:
```javascript
{
  title: <The title of the blog post>,
  content: <The body of the blog post. Can be plain text or html>,
  createdAt: <The UTC date when the post was last modified>,
  featuredImage: <Url/relative url to the post cover image>
}
```
### [Misc Components](https://github.com/datopian/portal.js/tree/main/src/components/misc)

This is a group of miscellaneous/extra components for extending your portal, including components such as errors and custom links.

#### [Error Component](https://github.com/datopian/portal.js/blob/main/src/components/misc/Error.js)

The `Error` component is used to display a custom error message.

In the example below, we demonstrate how to use the `Error` component:

```javascript
import { Error } from 'portal'

export default function Home() {
  return (
    <Error message="An error occurred when loading the file!" />
  )
}
```

#### Error Component Prop Types

The `Error` component accepts a single property:

* **message**: A string with the error message to display.

#### [CustomLink Component](https://github.com/datopian/portal.js/blob/main/src/components/misc/Error.js)

The `CustomLink` component is used to create a link styled consistently with the other portal components.

In the example below, we demonstrate how to use the `CustomLink` component:

```javascript
import { CustomLink } from 'portal'

export default function Home() {
  return (
    <CustomLink url="/blog" title="Goto Blog" />
  )
}
```

#### CustomLink Component Prop Types

The `CustomLink` component accepts the following properties:

* **url**: A string. The relative or absolute URL of the link.
* **title**: A string. The title of the link.
---

**New file:** `site/content/docs/dms/authentication.md` (+249 lines)
# Authentication

## Introduction

The core function of authentication is to **identify** users of the portal (in a federated way) so we can base access on their identity.

There are 3 major conceptual components: Identity, Accounts and Sessions, which come together in the following stages:

* **Root Identity Determination:** Determine identity, often via delegation
* **Sessions:** Persistence of the identity in the web application in a secure way (without a new identity determination on each request! I don't want to have to log in via a third-party service every time)
* **Account (aka profile):** Storing related account/profile information in our application (not in the third-party identity), e.g. email, name and other preferences
  * This will usually get auto-created at first identification
  * In limited cases this can be seen as a cache of info from the identity system (e.g. your email)
  * However, it is often richer, app-specific, generated info (relevant for personalization)

### Root Identity Determination options :key:

Identity determination can be done in multiple ways. In this article we consider the following 3 options, which we believe are widely used:

- Password authentication - the traditional username and password pair
- Single Sign-on (SSO) via protocols such as OAuth, SAML, OpenID Connect
- One-time password (OTP) via email or SMS (aka passwordless connection)
#### Password authentication

The traditional way of authenticating users. When signing up, the user provides at least a username and password pair, which is then stored in a database for future authentication. Normally, additional information such as email address, full name etc. is also requested when registering.

Examples of password authentication in popular services:

- GitHub - https://github.com/join
- GitLab - https://gitlab.com/users/sign_up
- NPM - https://www.npmjs.com/signup

#### Single Sign-on (SSO)

Delegates the identity determination process to a third-party service. Normally, popular social network services are used, e.g. Google, Facebook, Twitter. SSO can be implemented using the OAuth or SAML protocols. In addition, there is the OpenID Connect protocol, which is an extension of OAuth 2.0.

- OAuth
  - JWT based
  - JSON based
  - 'webby'
- SAML
  - XML based
  - SOAP based
  - 'enterprisey'

List of OAuth providers:

https://en.wikipedia.org/wiki/List_of_OAuth_providers

Examples of SSO in popular projects:

- https://datahub.io/login
- https://vercel.com/signup

#### One-time password (OTP)

Also known as a dynamic password, OTP addresses the limitations of the traditional password authentication method. Usually, one-time passwords are received via email or SMS.
### Account (aka profile)

- Storage of user profile information (email, full name, gravatar etc.)
- Retrieving user profile information via API
- Updating the profile
- Deleting the profile

### Sessions

- Log out: de-persisting the session
- Invalidating all sessions: e.g. in case of a security issue
- Sessions outside of browsers

## Key Job Stories

When a user signs in, I want to know her/his identity so that I can limit access and editing based on who she/he is.

When a user visits the data portal for the first time, I want to provide him/her a way to register easily and quickly so that more people use the data portal.

When I visit the data portal for the first time, I want to sign up using my existing social network account so that I don't need to remember yet another set of credentials.

When I'm using the CLI app (or anything else outside the browser), I want to be able to log in so that I can work from the terminal (e.g., have write access: editing datasets etc.).

[More job stories](#more-job-stories).
## CKAN 2 (CKAN Classic)

### Basic CKAN authentication

In the classic system, we have basic CKAN authentication. Below is how the registration page looks:

![ckan classic](https://i.imgur.com/kHTbeMr.png)

Registration flow in CKAN Classic:

```mermaid
sequenceDiagram

  user->>ckan: fill in the form and submit
  ckan->>ckan: check access (if user can create user)
  ckan->>ckan: parse params
  ckan->>ckan: check recaptcha
  ckan->>ckan: call 'user_create' action
  ckan->>ckan.model: add a new user into db
  ckan->>ckan: create an activity
  ckan->>ckan: log the user in
  ckan->>user: redirect to dashboard
```

We can extend basic CKAN authentication with:

- LDAP
  - https://extensions.ckan.org/extension/ldap/
  - https://github.com/NaturalHistoryMuseum/ckanext-ldap
- OAuth - see below
- SAML - https://extensions.ckan.org/extension/saml2/
### CKAN Classic as OAuth client

CKAN Classic can also be used as an OAuth client:

- https://github.com/conwetlab/ckanext-oauth2 - the only one that's maintained.
- https://github.com/etalab/ckanext-oauth2 - outdated; the one above is based on this.
- https://github.com/okfn/ckanext-oauth - last commit 9 years ago.
- https://github.com/ckan/ckanext-oauth2waad - Windows Azure Active Directory specific and outdated.

How it works:

```mermaid
sequenceDiagram

  user->>ckan: request for login via OAuth provider
  ckan->>ckan.oauth: raise 401 and call `challenge` function
  ckan.oauth->>user: redirect the user to the 3rd party login page
  user->>3rdparty: perform login
  3rdparty->>ckan.oauth: redirect to /oauth2/callback with token
  ckan.oauth->>3rdparty: call `authenticate` with token
  3rdparty->>ckan.oauth: return user info
  ckan.oauth->>ckan: save that info in db if it doesn't exist, or update it
  ckan.oauth->>ckan.oauth: add cookies
  ckan.oauth->>user: redirect to dashboard
```
## CKAN 3 (Next Gen)

We have considered some popular and/or modern solutions for identity management that we could implement in CKAN 3:

https://docs.google.com/spreadsheets/d/1qXZyzAbA2NtpnoSZRJ2K_EbaWJnvxkrKVzQ_2rD5eQw/edit#gid=0

Shortlist based on scores from the spreadsheet above:

- Auth0
- AuthN
- Ory/Kratos

Recommendation:

Any project from the shortlist can be considered. It is worth trying each of them to find out what works best for your project's needs. Testing out Auth0 should be straightforward and take less than an hour. AuthN and Ory/Kratos would require building Docker images and running them locally, but overall it should not be time-consuming.

### Existing work

On datahub.io we have implemented SSO via Google/GitHub. Below is a sequence diagram showing the auth flow with datopian/auth + a frontend express app (similar to the CKAN 3 frontend):
```mermaid
sequenceDiagram

  frontend.login->>auth.authenticate: authenticate(jwt=None, next=/success/...)
  auth.authenticate->>frontend.login: failed + here are urls for logging in on 3rd party including success
  frontend.login->>user: login form with login urls to 3rd party including next url in state
  user->>3rdparty: login
  3rdparty->>auth.oauth_response: success
  auth.oauth_response->>frontend.success: redirect to next url
  frontend.success->>auth.authenticate: with valid jwt
  auth.authenticate->>frontend.success: valid + here is profile
  frontend.success->>frontend.success: decode jwt, check it, then see localstorage
  frontend.success->>frontend.dashboard: redirect to dashboard
```

## CKAN 2 to CKAN 3 (aka Next Gen)

How does this conceptual framework map to an evolution of CKAN 2 to CKAN 3?
```mermaid
graph TD

  subgraph "CKAN Classic"
    Signup["Classic signup, e.g., self-service or by sysadmin"]
    Login["Classic login if you're using the classic UI"]
    OAuth["OAuth2 (ORY/Hydra)"]
  end

  subgraph "Authentication service (ORY/Kratos)"
    SSO["Social Sign-On: Github, Google, Facebook"]
    CC["CKAN Classic"]
    Admins["Sysadmin users"]
    Curators["Data curators"]
    Users["Regular users"]
  end

  subgraph "Frontend v3"
    SignupFront["Signup via Kratos"]
    LoginFront["Login via Kratos"]
  end

  SignupFront --"Regular user"--> SSO
  LoginFront --"Regular user"--> SSO

  LoginFront --"Data curator"--> CC

  CC --> Admins
  CC --> Curators
  SSO --> Users

  CC --"Redirect"--> OAuth
  OAuth --> Login
```
Sequence diagram of the login process:

[](https://mermaid-js.github.io/mermaid-live-editor/#/edit/eyJjb2RlIjoic2VxdWVuY2VEaWFncmFtXG5cdEJyb3dzZXItPj5Gcm9udGVuZDogUmVxdWVzdCB0byBgL2F1dGgvbG9naW5gXG4gIEZyb250ZW5kLT4-S3JhdG9zOiBBdXRoIHJlcXVlc3RcbiAgS3JhdG9zLT4-QnJvd3NlcjogUmVkaXJlY3QgdG8gYC9hdXRoL2xvZ2luP3JlcXVlc3Q9e2lkfWAgcGFyYW1cbiAgQnJvd3Nlci0-PkZyb250ZW5kOiBHZXQgYC9hdXRoL2xvZ2luP3JlcXVlc3Q9e2lkfWBcbiAgRnJvbnRlbmQtPj5LcmF0b3M6IEZldGNoIGRhdGEgZm9yIHJlbmRlcmluZyB0aGUgZm9ybVxuICBLcmF0b3MtPj5Gcm9udGVuZDogTG9naW4gb3B0aW9uc1xuICBGcm9udGVuZC0-PkJyb3dzZXI6IFJlbmRlciB0aGUgbG9naW4gZm9ybSB3aXRoIGF2YWlsYWJsZSBvcHRpb25zXG4gIEJyb3dzZXItPj5Gcm9udGVuZDogU3VwcGx5IGZvcm0gZGF0YVxuICBGcm9udGVuZC0-PktyYXRvczogVmFsaWRhdGUgYW5kIGxvZ2luXG4gIEtyYXRvcy0-PkZyb250ZW5kOiBTZXQgc2Vzc2lvblxuICBGcm9udGVuZC0-PkJyb3dzZXI6IFJlZGlyZWN0IHRvIC9kYXNoYm9hcmRcblxuXG5cdFx0XHRcdFx0IiwibWVybWFpZCI6eyJ0aGVtZSI6ImRlZmF1bHQifSwidXBkYXRlRWRpdG9yIjpmYWxzZX0)

From ORY/Kratos:

[](https://mermaid-js.github.io/mermaid-live-editor/#/edit/eyJjb2RlIjoic2VxdWVuY2VEaWFncmFtXG4gIHBhcnRpY2lwYW50IEIgYXMgQnJvd3NlclxuICBwYXJ0aWNpcGFudCBLIGFzIE9SWSBLcmF0b3NcbiAgcGFydGljaXBhbnQgQSBhcyBZb3VyIEFwcGxpY2F0aW9uXG5cblxuICBCLT4-SzogSW5pdGlhdGUgTG9naW5cbiAgSy0-PkI6IFJlZGlyZWN0cyB0byB5b3VyIEFwcGxpY2F0aW9uJ3MgL2xvZ2luIGVuZHBvaW50XG4gIEItPj5BOiBDYWxscyAvbG9naW5cbiAgQS0tPj5LOiBGZXRjaGVzIGRhdGEgdG8gcmVuZGVyIGZvcm1zIGV0Y1xuICBCLS0-PkE6IEZpbGxzIG91dCBmb3JtcywgY2xpY2tzIGUuZy4gXCJTdWJtaXQgTG9naW5cIlxuICBCLT4-SzogUE9TVHMgZGF0YSB0b1xuICBLLS0-Pks6IFByb2Nlc3NlcyBMb2dpbiBJbmZvXG5cbiAgYWx0IExvZ2luIGRhdGEgdmFsaWRcbiAgICBLLS0-PkI6IFNldHMgc2Vzc2lvbiBjb29raWVcbiAgICBLLT4-QjogUmVkaXJlY3RzIHRvIGUuZy4gRGFzaGJvYXJkXG4gIGVsc2UgTG9naW4gZGF0YSBpbnZhbGlkXG4gICAgSy0tPj5COiBSZWRpcmVjdHMgdG8geW91ciBBcHBsaWNhaXRvbidzIC9sb2dpbiBlbmRwb2ludFxuICAgIEItPj5BOiBDYWxscyAvbG9naW5cbiAgICBBLS0-Pks6IEZldGNoZXMgZGF0YSB0byByZW5kZXIgZm9ybSBmaWVsZHMgYW5kIGVycm9yc1xuICAgIEItLT4-QTogRmlsbHMgb3V0IGZvcm1zIGFnYWluLCBjb3JyZWN0cyBlcnJvcnNcbiAgICBCLT4-SzogUE9TVHMgZGF0YSBhZ2FpbiAtIGFuZCBzbyBvbi4uLlxuICBlbmRcbiIsIm1lcm1haWQiOnsidGhlbWUiOiJuZXV0cmFsIiwic2VxdWVuY2VEaWFncmFtIjp7ImRpYWdyYW1NYXJnaW5YIjoxNSwiZGlhZ3JhbU1hcmdpblkiOjE1LCJib3hUZXh0TWFyZ2luIjowLCJub3RlTWFyZ2luIjoxNSwibWVzc2FnZU1hcmdpbiI6NDUsIm1pcnJvckFjdG9ycyI6dHJ1ZX19fQ)

Kratos to Hydra in CKAN Classic:

WIP

### Questions

* Does CKAN Classic allow us to store arbitrary account information (are there "extras")?
* How would we avoid having to support identity persistence, delegation etc. in both the NG frontend and the Classic Admin UI?
* Can we share cookies (e.g. via using subdomains)?
* How is login, identity determination etc. done, at least for the frontend, in DataHub.io?
* Should the account UI really be in the NG frontend vs the Classic Admin UI?
* How can we handle "invite a user" to my org setup... (it's basically post-processing after sign-up...)
## Appendix

### More job stories

When a user visits the data portal, I want to provide multiple options for him/her to sign up so that I have more users registered and using the data portal.

When a user needs to change his/her profile info, I want to make sure it is possible, so that I have up-to-date information about users.

When my personal info (email etc.) changes, I want to edit it in my profile so that I provide up-to-date information about me and I receive messages (e.g., notifications) properly.

When I decide to stop using the data portal, I want to be able to delete my account, so that my personal details aren't stored in a service I no longer need.
---

**New file:** `site/content/docs/dms/blob-storage.md` (+215 lines)
# Blob Storage
|
||||
|
||||
## Introduction
|
||||
|
||||
DMS and data portals often need to *store* data as well as metadata. As such, they require a system for doing this. This page focuses on Blob Storage aka Bulk or Raw storage (see [storage](/docs/dms/storage) page for an overview of all types of storage).
|
||||
|
||||
Blob storage is for storing "blobs" of data, that is a raw stream of bytes like files on a filesystem. For blob storage think local filesystem or cloud storage like S3, GCS, etc.
|
||||
|
||||
Blob Storage in a DMS can be provided via:
|
||||
|
||||
* Local file system: storing on disk or storage directly connected to the instance
|
||||
* Cloud storage like S3, Google Cloud Storage, Azure storage etc
|
||||
|
||||
Today, cloud storage would be the default in most cases.
|
||||
|
||||
### Features
|
||||
|
||||
* Storage: Persistent, cost-efficient storage
|
||||
* Download: Fast, reliable download (possibly even with support for edge distribution)
|
||||
* Upload: reliable and rapid upload
|
||||
* Direct upload to (cloud) storage by clients i.e. without going via the DMS. Why? Because cloud storage has many features that it would be costly replicate (e.g. multipart, resumable etc), excellent performance and reliability for upload. It also cuts out the middleman of the DMS backend thereby saving bandwidth, reducing load on the DMS backend and improving performance
|
||||
* Upload UI: having an excellent UI for doing upload. NB: this UI is considered part of the [publish feature](/docs/dms/publish)
|
||||
* Cloud: integrate with cloud storage
|
||||
* Permissions: restricting access to data stored in blob storage based on the permissions of the DMS. For example, if Joe does not have access to a dataset on the DMS he should not be able to access associated blob data in the storage system
|
||||
|
## Flows

### Direct to Cloud Upload

Want: direct upload to cloud storage. But you need to authorize that, so give clients a token from your app.

A sequence diagram illustrating the process for a direct to cloud upload:

```mermaid
sequenceDiagram

participant Browser as Client (Browser / Code)
participant Authz as Authz Server
participant BitStore as Storage Access Token Service
participant Storage as Cloud Storage

Browser->>Authz: Give me a BitStore access token
Authz->>Browser: Token
Browser->>BitStore: Get a signed upload URL (access token, file metadata)
BitStore->>Browser: Signed URL
Browser->>Storage: Upload file (signed URL)
Storage->>Browser: OK (storage metadata)
```

Here's a more elaborate version showing storage of metadata into the MetaStore afterwards (and skipping the Authz service):

```mermaid
sequenceDiagram

participant browser as Client (Browser / Code)
participant vfts as MetaStore
participant bitstore as Storage Access Token Service
participant storage as Cloud Storage

browser->>browser: Select files to upload
browser->>browser: Calculate file hashes (if doing content addressable storage)
browser->>bitstore: Get signed URLs (file1.csv URL, file2.csv URL, auth info)
bitstore->>browser: Signed URLs
browser->>storage: Upload file1.csv
storage->>browser: OK
browser->>storage: Upload file2.csv
storage->>browser: OK
browser->>browser: Compose datapackage.json
browser->>vfts: Create dataset (datapackage.json, file1.csv pointer, file2.csv pointer, JWT token, ...)
vfts->>browser: OK
```
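The first diagram's flow can be sketched in a few lines of Python. The endpoint paths, payload shapes and scope string below are assumptions for illustration only (a real deployment would use the Git LFS batch API that Giftless exposes); the HTTP transport is injected so the flow itself is easy to follow and test.

```python
def direct_upload(path, content, post, put):
    """Upload `content` to cloud storage without routing bytes via the DMS.

    `post(url, headers, json)` and `put(url, data)` are HTTP callables,
    e.g. thin wrappers around requests.post / requests.put.
    """
    # 1. Ask the authz service for a token scoped to this object
    token = post("/authz", headers={}, json={"scope": f"obj:{path}:write"})["token"]
    # 2. Exchange the token for a signed upload URL at the storage access service
    signed = post(
        "/signed-url",
        headers={"Authorization": f"Bearer {token}"},
        json={"path": path, "size": len(content)},
    )["url"]
    # 3. Upload directly to cloud storage; the DMS backend never sees the bytes
    put(signed, data=content)
    return signed
```

Because the token request, URL signing and upload are three separate round trips, each service stays small: the DMS only authorizes, the token service only signs, and the cloud provider does the heavy lifting.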
## CKAN 2 (Classic)

Blob Storage is known as the FileStore in CKAN v2 and below. The default is local disk storage.

There is support for cloud storage via a variety of extensions, the most prominent of which is `ckanext-cloudstorage`: https://github.com/TkTech/ckanext-cloudstorage

There are a variety of issues:

* Cloud storage is not a first class citizen in CKAN: CKAN defaults to local file storage, but cloud storage is the default in the wider world and has much better scalability and performance, as well as integratability with cloud deployment
* The FileStore interface definition has a poor separation of concerns (for example, blob storage file paths are set in the FileStore component, not in core CKAN), which makes it hard / hacky to extend and use for key use cases e.g. versioning.
* `ckanext-cloudstorage` (the default cloud storage extension) is ok but has many issues, e.g.
  * No direct to cloud upload: it uses the CKAN backend as a middleman, so all data must go via the CKAN backend
  * Implements its own (sometimes unreliable) version of multipart upload (which means additional code that isn't as reliable as the cloud storage providers' interfaces)
  * No access to advanced features such as resumability etc

Generally, we at Datopian have seen a lot of issues around multipart / large file upload stability with clients and are still seeing issues when a lot of large files are uploaded via scripts. Fixing and refactoring code related to storage is very costly, and tends to result in client-specific "hacks".
## CKAN v3

An approach to blob storage that leverages cloud blob storage directly (i.e. without having to upload and serve all files via the CKAN web server), unlocking the performance characteristics of the storage backend. It is designed with a microservice approach and supports direct to cloud uploads and downloads. The key components are listed in the next section. You can read more about the overall design approach in the [design section below](#Design).

It is backwards compatible with CKAN v2 and has been successfully deployed with CKAN v2.8 and v2.9.

**Status: Production.**

### Components

* [ckanext-blob-storage](https://github.com/datopian/ckanext-blob-storage) (formerly known as ckanext-external-storage)
  * Hooks CKAN up to Giftless, replacing resource storage
  * Depends on giftless-client and ckanext-authz-service
  * Doesn't implement IUploader: it completely overrides the upload / download routes for resources
* [Giftless](https://github.com/datopian/giftless) - a Git LFS compatible implementation for storage with some extras on top. It hands out access tokens to store data in cloud storage.
  * Docs at https://giftless.datopian.com
  * Backends for Azure, Google Cloud Storage and local storage
  * Multipart support (on top of the standard LFS protocol)
  * Accepts JWT tokens for authentication and authorization
* [ckanext-authz-service](https://github.com/datopian/ckanext-authz-service/) - uses CKAN's built-in authentication and authorization capabilities to: a) generate JWT tokens and provide them via CKAN's Web API to clients and b) validate JWT tokens.
  * Allows hooking CKAN's authentication and authorization capabilities to generate signed JWT tokens, to integrate with external systems
  * Not specific to Giftless, but this is what it was built for
* [ckanext-asset-storage](https://github.com/datopian/ckanext-asset-storage) - takes care of storing non-data assets, e.g. organization images
  * A CKAN IUploader for assets (not resources!)
  * Pluggable backends - currently local and Azure
  * Much cleaner than older implementations (ckanext-cloudstorage etc.)

Clients:

* [giftless-client-py](https://github.com/datopian/giftless-client) - Python client for Git LFS and Giftless-specific features
  * Used by ckanext-blob-storage and other tools
* [giftless-client-js](https://github.com/datopian/giftless-client-js) - JavaScript client for Git LFS and Giftless-specific features
  * Used by ckanext-blob-storage and other tools for creating uploaders in the UI
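The trust relationship between ckanext-authz-service and Giftless (one side signs a scoped token, the other verifies it before granting storage access) can be illustrated with a minimal, self-contained JWT-style sketch. This is not the real implementation: real deployments should use a proper JWT library (e.g. PyJWT), include headers and expiry claims, and typically use asymmetric keys; the shared secret and scope string here are assumptions.

```python
import base64
import hashlib
import hmac
import json

SECRET = b"shared-signing-key"  # assumption: key known to both services

def _b64(data: bytes) -> str:
    # URL-safe base64 without padding, as JWTs use
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode()

def sign(claims: dict) -> str:
    """CKAN side: issue a signed token carrying authorization scopes."""
    payload = _b64(json.dumps(claims, sort_keys=True).encode())
    sig = _b64(hmac.new(SECRET, payload.encode(), hashlib.sha256).digest())
    return f"{payload}.{sig}"

def verify(token: str) -> dict:
    """Giftless side: check the signature before granting storage access."""
    payload, sig = token.split(".")
    expected = _b64(hmac.new(SECRET, payload.encode(), hashlib.sha256).digest())
    if not hmac.compare_digest(sig, expected):
        raise ValueError("bad signature")
    padded = payload + "=" * (-len(payload) % 4)
    return json.loads(base64.urlsafe_b64decode(padded))

# CKAN grants write access to one dataset's storage prefix...
token = sign({"sub": "joe", "scopes": ["obj:datopian/dailyprices/*:write"]})
# ...and Giftless verifies before issuing signed upload URLs
print(verify(token)["scopes"])  # ['obj:datopian/dailyprices/*:write']
```

The key point is that Giftless never calls back into CKAN: possession of the signing key is the whole trust relationship, which is what makes the services independently deployable.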
## Design

### Purpose

The goal of this project is to create a more **_flexible_** system for storing **_data files_** (AKA "resources") for **_CKAN_ and _other implementations_** of a data portal, so that CKAN can support versioning, large file upload (with a great file upload UX), plug easily into cloud and local file storage backends and, in general, be easy to customize, both for the storage layer and for the CKAN client code of that layer.

### Features

* Do one thing and do it well: provide an API to store and retrieve files from storage, in a way that is pluggable into a micro-services based application and into existing CKAN (2.8 / 2.9)
* Does not force, and in fact is not aware of, a specific file naming logic (i.e. resource file names could be based on a user given name, a content hash, a revision ID or any mixture of these - it is up to the using system to decide)
* Does not force a specific storage backend; should support Amazon S3, Azure Storage and local file storage in some way initially, but in general the backend should be pluggable
* Does not force a specific authentication scheme; expects a signed JWT token, and does not care who signed it or how the user got authenticated
* Does not force a complex authorization scheme; leaves it to an external system to do complex authorization if needed:
  * By default, the system can work in an "admin party" mode where all authenticated users have full access to all files. This will be "good enough" for many DMS implementations including CKAN.
  * Potentially, allow plugging in more complex authorization logic that relies on JWT claims to perform granular authorization checks

### For Data Files (i.e. Blobs)

This system is about storing and providing access to blobs, or streams of bytes; it is not about providing access to the data stored within (i.e. it is not meant to replace CKAN's datastore).

### For CKAN - whilst not necessarily CKAN Specific

While the system's design should not be CKAN specific in any way, our current client needs require us to provide a CKAN extension that integrates with this system.

CKAN's current IUploader interface has been identified as too narrow to provide the functionality required by complex projects (resource versioning, direct cloud uploads and downloads, large file support and multipart support). While some of these needs could be, and have been, "hacked" through the IUploader interface, the implementations have been overly complex and hard to debug.

Our goal should be to provide a CKAN extension that provides the following functionality directly:

* Uploading and downloading resource files directly from the client, if supported by the storage backend
* Multipart upload support, if supported by the storage backend
* Handling of signed URLs for uploads and private downloads
* Client side code for handling multipart uploads
* TBD: If the storage backend does not support direct uploads / downloads, fall back to …

In addition, this extension should provide an API for other extensions to do things like:

* Set the file naming scheme (we need this for ckanext-versions)
* Lower level file access, e.g. move and delete files. We may need this in the future to optimize storage and deduplicate files as proposed for ckanext-versions

In addition, this extension must "play nice" with common CKAN features such as the datastore extension and the related datapusher / xloader extensions.

### Usable for other DMS implementations

There should be nothing in this system, except for the CKAN extension described above, that is specific to CKAN. That will allow us to re-use and re-integrate this system as a micro-service in other DMS implementations such as ckan-ng and others.

In fact, the core part of this system should be a generic, abstract storage service with a light authorization layer. This could make it useful in a host of situations where a storage micro-service is needed.
### High Level Principles

Common principles:

* Uploads and downloads go directly from the cloud provider to the browser
* Signed uploads / downloads - for private / authorized-only data access
* Support for AWS, Azure and potentially GCP storage
* Support for local (non cloud) storage, potentially through a system like [https://min.io/](https://min.io/)
* Multipart / large file upload support (a few GB in size should be supported for Gates)
* Not opinionated about file naming / paths; allow users to set file locations under some pre-defined paths / buckets
* Client side support - browser widgets / code for uploading and downloading files / multipart uploads directly to different backends
* Well-documented flow for using from the API (not the browser)
* Provided API for deleting and moving files
* Provided API for accessing storage-level metadata, e.g. file MD5 (do we need this? could be useful for processes that do things like deduplicating storage)
* Provided API for managing storage-level object settings (e.g. "Content-Disposition" / "Content-Type" headers, etc.)
* Authorization based on some kind of portable scheme (JWT)

CKAN integration specific (implemented as a CKAN extension):

* JWT generation based on current CKAN user permissions
* Client widget integration (or CKAN specific widgets) in the right places in CKAN templates
* Hook into resource upload / download / deletion controllers in CKAN
* API to allow other extensions to control storage level object metadata (headers, path)
* API to allow other extensions to hook into lifecycle events - upload completion, download request, deletion etc.
### Components

The decoupled storage solution should be split into several parts, with some parts being independent of others:

* [External] Cloud Storage service (or a similar API if using the local file system), e.g. S3, GCS, Azure Storage, Min.io (for local file systems)
* Cloud Storage Access Service
* [External] Permissions Service for granting general permission tokens that give access to the Cloud Storage Access Service
  * JWT tokens can be generated by any party that has the right signing key. Thus, we can initially do without this if JWT signing is implemented as part of the CKAN extension
* Browser based client for cloud storage (compatible with the storage service above and with different cloud vendors)
* CKAN extension that wraps the parts above to provide a storage solution for CKAN
### Questions

* What is the file structure in the cloud, i.e. what is the file path for uploaded files? Options:
  * Client chooses a name/path
  * Content addressable, i.e. the name is given by the content. How? Use a hash.
    * The beauty of that: a standard way to name things. The same thing has the same name (modulo collisions)
    * Goes well with versioning => same file = same name, different file = different name
  * And do you enforce that from your app?
    * The request for a token needs to include the destination file path
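The content-addressable option above can be sketched in a few lines: the storage path is derived from a hash of the file's bytes, so identical content maps to a single object and changed content gets a new path automatically, which composes naturally with versioning. The `objects/` prefix and two-character sharding are illustrative conventions, not part of any spec.

```python
import hashlib

def storage_path(content: bytes, prefix: str = "objects") -> str:
    """Derive a content-addressable storage path from the file bytes."""
    digest = hashlib.sha256(content).hexdigest()
    # Shard by the first two hex chars to keep any one "directory" small
    return f"{prefix}/{digest[:2]}/{digest}"

same_a = storage_path(b"a,b\n1,2\n")
same_b = storage_path(b"a,b\n1,2\n")
changed = storage_path(b"a,b\n9,9\n")
assert same_a == same_b       # identical content, identical name
assert same_a != changed      # different content, different name
```

Enforcing this from the app is then just a matter of the token service recomputing (or trusting a client-supplied) hash and refusing to sign upload URLs for paths that don't match it.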
503	site/content/docs/dms/ckan-client-guide/index.md	Normal file
@@ -0,0 +1,503 @@
# CKAN Client Guide

A guide to interacting with [CKAN](/docs/dms/ckan) for power users such as data scientists, data engineers and data wranglers.

This guide is about adding and managing data in CKAN programmatically, and it assumes:

* You are familiar with key concepts like metadata, data, etc.
* You are working programmatically with a programming language such as Python, JavaScript or R (_coming soon_).

## Frictionless Formats

Clients use [Frictionless formats](https://specs.frictionlessdata.io/) by default for describing dataset and resource objects passed to client methods. Internally, we then use a *CKAN <=> Frictionless Mapper* (both [in JavaScript](https://github.com/datopian/frictionless-ckan-mapper-js) and [in Python](https://github.com/frictionlessdata/frictionless-ckan-mapper)) to convert objects to CKAN formats before calling the API. **Thus, you can use _Frictionless Formats_ by default with the client.**

>[!tip]As CKAN moves to Frictionless formats by default, this will gradually become unnecessary.
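To make the mapping idea concrete, here is a minimal, self-contained sketch of what such a mapper does. The field subset is illustrative only (real mappings are larger and handle nesting); real projects should use frictionless-ckan-mapper rather than this toy.

```python
# One known divergence between the formats: Frictionless "description"
# corresponds to CKAN "notes". Other fields in this toy pass through unchanged.
F2C_DATASET = {"description": "notes"}

def frictionless_to_ckan(dataset: dict) -> dict:
    """Rename Frictionless dataset keys to their CKAN equivalents."""
    return {F2C_DATASET.get(key, key): value for key, value in dataset.items()}

def ckan_to_frictionless(dataset: dict) -> dict:
    """Inverse mapping: rename CKAN keys back to Frictionless ones."""
    c2f = {ckan: fd for fd, ckan in F2C_DATASET.items()}
    return {c2f.get(key, key): value for key, value in dataset.items()}

fd = {"name": "dailyprices", "description": "Daily price data"}
print(frictionless_to_ckan(fd))  # {'name': 'dailyprices', 'notes': 'Daily price data'}
```

Because the client applies this conversion internally, user code only ever sees the Frictionless shape, and the CKAN wire format stays an implementation detail.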
## Quick start

Most of this guide has the Python programming language in mind, including its [convention of using _snake case_ for instance and method names](https://www.python.org/dev/peps/pep-0008/#descriptive-naming-styles).

If needed, you can adapt the instructions to JavaScript and R (coming soon) by using _camel case_ instead; for example, if in the Python code we have `client.push_blob(…)`, in JavaScript it would be `client.pushBlob(…)`.
### Prerequisites

Install the client for your language of choice:

* Python: https://github.com/datopian/ckan-client-py#install
* JavaScript: https://github.com/datopian/ckan-client-js#install
* R: _coming soon_

### Create a client

#### Python

```python
from ckanclient import Client

api_key = '771a05ad-af90-4a70-beea-cbb050059e14'
api_url = 'http://localhost:5000'
organization = 'datopian'
dataset = 'dailyprices'
lfs_url = 'http://localhost:9419'

client = Client(api_url, api_key, organization, dataset, lfs_url)
```

#### JavaScript

```javascript
const { Client } = require('ckanClient')

const apiKey = '771a05ad-af90-4a70-beea-cbb050059e14'
const apiUrl = 'http://localhost:5000'
const organization = 'datopian'
const dataset = 'dailyprices'

const client = new Client(apiKey, organization, dataset, apiUrl)
```
### Upload a resource

That is to say, upload a file, implicitly creating a new dataset.

#### Python

```python
from frictionless import describe

resource = describe('my-data.csv')
client.push_blob(resource)
```

### Create a new empty Dataset with metadata

#### Python

```python
client.create('my-data')
client.push(resource)
```

### Adding a resource to an existing Dataset

>[!note]Not implemented yet.

```python
client.create('my-data')
client.push_resource(resource)
```

### Edit a Dataset's metadata

>[!note]Not implemented yet.

```python
dataset = client.retrieve('sample-dataset')
client.update_metadata(
    dataset,
    metadata={'maintainer_email': 'sample@datopian.com'}
)
```

For details of metadata, see the [metadata reference below](#metadata-reference).
## API - Porcelain

### `Client.create`

Expects a single argument: a _string_, a _dict_ (in Python), or an _object_ (in JavaScript). This argument is either a valid dataset name or a dictionary with metadata for the dataset in Frictionless format.

### `Client.push`

Expects a single argument: a _dict_ (in Python) or an _object_ (in JavaScript) with dataset metadata in Frictionless format.

### `Client.retrieve`

Expects a single argument: a string with a dataset name or unique ID. Returns a Frictionless resource as a _dict_ (in Python) or as a _`Promise<object>`_ (in JavaScript).

### `Client.push_blob`

Expects a single argument: a _dict_ (in Python) or an _object_ (in JavaScript) with a Frictionless resource.

## API - Plumbing

### `Client.action`

This method bridges access to the CKAN API _action endpoint_.

#### In Python

Arguments:

| Name | Type | Default | Description |
| -------------------- | ---------- | ---------- | ------------------------------------------------------------ |
| `name` | `str` | (required) | The action name, for example, `site_read`, `package_show`… |
| `payload` | `dict` | (required) | The payload being sent to CKAN. When a payload is provided to a GET request, it will be converted to URL parameters and each key will be converted to snake case. |
| `http_get` | `bool` | `False` | Optional; if `True`, makes a `GET` request, otherwise `POST`. |
| `transform_payload` | `function` | `None` | Function to mutate the `payload` before making the request (useful to convert to and from CKAN and Frictionless formats). |
| `transform_response` | `function` | `None` | Function to mutate the response data before returning it (useful to convert to and from CKAN and Frictionless formats). |

>[!note]The CKAN API uses the CKAN dataset and resource formats (rather than Frictionless formats).
In other words, to stick to Frictionless formats, you can pass `frictionless_ckan_mapper.frictionless_to_ckan` as `transform_payload`, and `frictionless_ckan_mapper.ckan_to_frictionless` as `transform_response`.
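To show how the two transform hooks compose around an action call, here is a minimal, self-contained sketch. The `action` function, the one-field mappers and the echo "server" are all trivial stand-ins (assumptions), not the real client or frictionless-ckan-mapper; the point is only the order of operations: payload transform, then the request, then the response transform.

```python
def action(name, payload, call, transform_payload=None, transform_response=None):
    """Sketch of the plumbing: hooks wrap the HTTP round trip (here, `call`)."""
    if transform_payload:
        payload = transform_payload(payload)   # Frictionless -> CKAN
    response = call(name, payload)             # stands in for the HTTP request
    if transform_response:
        response = transform_response(response)  # CKAN -> Frictionless
    return response

# Stand-ins for frictionless_to_ckan / ckan_to_frictionless (one field only)
f2c = lambda d: {("notes" if k == "description" else k): v for k, v in d.items()}
c2f = lambda d: {("description" if k == "notes" else k): v for k, v in d.items()}

fake_ckan = lambda name, payload: dict(payload)  # echo "server" for illustration

out = action("package_show", {"description": "Daily prices"}, fake_ckan,
             transform_payload=f2c, transform_response=c2f)
print(out)  # {'description': 'Daily prices'}
```

The caller works entirely in Frictionless terms, while everything between the two hooks speaks CKAN's format.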
#### In JavaScript

Arguments:

| Name | Type | Default | Description |
| ------------ | ------------------- | ------------------ | ------------------------------------------------------------ |
| `actionName` | <code>string</code> | (required) | The action name, for example, `site_read`, `package_show`… |
| `payload` | <code>object</code> | (required) | The payload being sent to CKAN. When a payload is provided to a GET request, it will be converted to URL parameters and each key will be converted to snake case. |
| `useHttpGet` | <code>boolean</code> | <code>false</code> | Optional; if `true`, makes a `GET` request, otherwise `POST`. |

>[!note]The JavaScript implementation uses the CKAN dataset and resource formats (rather than Frictionless formats).
In other words, to stick to Frictionless formats, you need to convert from Frictionless to CKAN before calling `action`, and from CKAN to Frictionless after calling `action`.
## Metadata reference

>[!info]Your site may have custom metadata that differs from the example set below.

### Profile

**(`string`)** Defaults to _data-resource_.

The profile of this descriptor.

Every Package and Resource descriptor has a profile. The default profile, if none is declared, is `data-package` for a Package and `data-resource` for a Resource.

#### Examples

- `{"profile":"tabular-data-package"}`
- `{"profile":"http://example.com/my-profiles-json-schema.json"}`

### Name

**(`string`)**

An identifier string. Lower case characters with `.`, `_`, `-` and `/` are allowed.

This is ideally a url-usable and human-readable name. The name `SHOULD` be invariant, meaning it `SHOULD NOT` change when its parent descriptor is updated.

#### Example

- `{"name":"my-nice-name"}`

### Path

A reference to the data for this resource: either a single path as a string, or an array of paths as strings, each a valid URI.

The dereferenced value of each referenced data source in `path` `MUST` be commensurate with a native, dereferenced representation of the data the resource describes. For example, in a *Tabular* Data Resource, this means that the dereferenced value of `path` `MUST` be an array.

#### Validation

##### It must satisfy one of these conditions

###### Path

**(`string`)**

A fully qualified URL, or a POSIX file path.

Implementations need to negotiate the type of path provided, and dereference the data accordingly.

**Examples**

- `{"path":"file.csv"}`
- `{"path":"http://example.com/file.csv"}`

**(`array`)**

**Examples**

- `["file.csv"]`
- `["http://example.com/file.csv"]`

#### Examples

- `{"path":["file.csv","file2.csv"]}`
- `{"path":["http://example.com/file.csv","http://example.com/file2.csv"]}`
- `{"path":"http://example.com/file.csv"}`

### Data

Inline data for this resource.
### Schema

**(`object`)**

A schema for this resource.

### Title

**(`string`)**

A human-readable title.

#### Example

- `{"title":"My Package Title"}`

### Description

**(`string`)**

A text description. Markdown is encouraged.

#### Example

- `{"description":"# My Package description\nAll about my package."}`

### Home Page

**(`string`)**

The home on the web that is related to this data package.

#### Example

- `{"homepage":"http://example.com/"}`

### Sources

**(`array`)**

The raw sources for this resource.

#### Example

- `{"sources":[{"title":"World Bank and OECD","path":"http://data.worldbank.org/indicator/NY.GDP.MKTP.CD"}]}`

### Licenses

**(`array`)**

The license(s) under which the resource is published.

This property is not legally binding and does not guarantee that the package is licensed under the terms defined herein.

#### Example

- `{"licenses":[{"name":"odc-pddl-1.0","path":"http://opendatacommons.org/licenses/pddl/","title":"Open Data Commons Public Domain Dedication and License v1.0"}]}`

### Format

**(`string`)**

The file format of this resource.

`csv`, `xls`, `json` are examples of common formats.

#### Example

- `{"format":"xls"}`

### Media Type

**(`string`)**

The media type of this resource. Can be any valid media type listed with [IANA](https://www.iana.org/assignments/media-types/media-types.xhtml).

#### Example

- `{"mediatype":"text/csv"}`

### Encoding

**(`string`)** Defaults to _utf-8_.

The file encoding of this resource.

#### Example

- `{"encoding":"utf-8"}`

### Bytes

**(`integer`)**

The size of this resource in bytes.

#### Example

- `{"bytes":2082}`

### Hash

**(`string`)**

The MD5 hash of this resource. Indicate other hashing algorithms with the `{algorithm}:{hash}` format.

#### Examples

- `{"hash":"d25c9c77f588f5dc32059d2da1136c02"}`
- `{"hash":"SHA256:5262f12512590031bbcc9a430452bfd75c2791ad6771320bb4b5728bfb78c4d0"}`
## Generating templates

You can use [`jsv`](https://github.com/datopian/jsv) to generate a template script in Python, JavaScript, and R.

To install it:

```
$ npm install -g git+https://github.com/datopian/jsv.git
```

### Python

```
$ jsv data-resource.json --output py
```

**Output**

```python
dataset_metadata = {
    "profile": "data-resource",  # The profile of this descriptor.
    # [example] "profile": "tabular-data-package"
    # [example] "profile": "http://example.com/my-profiles-json-schema.json"
    "name": "my-nice-name",  # An identifier string. Lower case characters with `.`, `_`, `-` and `/` are allowed.
    "path": ["file.csv", "file2.csv"],  # A reference to the data for this resource: a path as a string, or an array of paths as strings, each a valid URI.
    # [example] "path": ["http://example.com/file.csv","http://example.com/file2.csv"]
    # [example] "path": "http://example.com/file.csv"
    "data": None,  # Inline data for this resource.
    "schema": None,  # A schema for this resource.
    "title": "My Package Title",  # A human-readable title.
    "description": "# My Package description\nAll about my package.",  # A text description. Markdown is encouraged.
    "homepage": "http://example.com/",  # The home on the web that is related to this data package.
    "sources": [{"title": "World Bank and OECD", "path": "http://data.worldbank.org/indicator/NY.GDP.MKTP.CD"}],  # The raw sources for this resource.
    "licenses": [{"name": "odc-pddl-1.0", "path": "http://opendatacommons.org/licenses/pddl/", "title": "Open Data Commons Public Domain Dedication and License v1.0"}],  # The license(s) under which the resource is published.
    "format": "xls",  # The file format of this resource.
    "mediatype": "text/csv",  # The media type of this resource. Can be any valid media type listed with IANA.
    "encoding": "utf-8",  # The file encoding of this resource.
    # [example] "encoding": "utf-8"
    "bytes": 2082,  # The size of this resource in bytes.
    "hash": "d25c9c77f588f5dc32059d2da1136c02",  # The MD5 hash of this resource. Indicate other hashing algorithms with the {algorithm}:{hash} format.
    # [example] "hash": "SHA256:5262f12512590031bbcc9a430452bfd75c2791ad6771320bb4b5728bfb78c4d0"
}
```
### JavaScript
|
||||
|
||||
```
|
||||
$ jsv data-resource.json --output js
|
||||
```
|
||||
|
||||
**Output**
|
||||
```javascript
|
||||
const datasetMetadata = {
|
||||
// The profile of this descriptor.
|
||||
profile: "data-resource",
|
||||
// [example] profile: "tabular-data-package"
|
||||
// [example] profile: "http://example.com/my-profiles-json-schema.json"
|
||||
// An identifier string. Lower case characters with `.`, `_`, `-` and `/` are allowed.
name: "my-nice-name",
// A reference to the data for this resource: either a path as a string, or an array of paths as strings. Each path must be a valid URI.
path: ["file.csv", "file2.csv"],
// [example] path: ["http://example.com/file.csv", "http://example.com/file2.csv"]
// [example] path: "http://example.com/file.csv"
// Inline data for this resource.
data: null,
// A schema for this resource.
schema: null,
// A human-readable title.
title: "My Package Title",
// A text description. Markdown is encouraged.
description: "# My Package description\nAll about my package.",
// The home on the web that is related to this data package.
homepage: "http://example.com/",
// The raw sources for this resource.
sources: [
  {
    title: "World Bank and OECD",
    path: "http://data.worldbank.org/indicator/NY.GDP.MKTP.CD",
  },
],
// The license(s) under which the resource is published.
licenses: [
  {
    name: "odc-pddl-1.0",
    path: "http://opendatacommons.org/licenses/pddl/",
    title: "Open Data Commons Public Domain Dedication and License v1.0",
  },
],
// The file format of this resource.
format: "xls",
// The media type of this resource. Can be any valid media type listed with [IANA](https://www.iana.org/assignments/media-types/media-types.xhtml).
mediatype: "text/csv",
// The file encoding of this resource.
encoding: "utf-8",
// [example] encoding: "utf-8"
// The size of this resource in bytes.
bytes: 2082,
// The MD5 hash of this resource. Indicate other hashing algorithms with the {algorithm}:{hash} format.
hash: "d25c9c77f588f5dc32059d2da1136c02",
// [example] hash: "SHA256:5262f12512590031bbcc9a430452bfd75c2791ad6771320bb4b5728bfb78c4d0"
};
```

### R

```
$ jsv data-resource.json --output r
```

**Output**

```r
# The profile of this descriptor.
profile <- "data-resource"
# [example] profile <- "tabular-data-package"
# [example] profile <- "http://example.com/my-profiles-json-schema.json"
# An identifier string. Lower case characters with `.`, `_`, `-` and `/` are allowed.
name <- "my-nice-name"
# A reference to the data for this resource: either a path as a string, or an array of paths as strings. Each path must be a valid URI.
path <- c("file.csv", "file2.csv")
# [example] path <- c("http://example.com/file.csv", "http://example.com/file2.csv")
# [example] path <- "http://example.com/file.csv"
# Inline data for this resource.
data <- NA
# A schema for this resource.
schema <- NA
# A human-readable title.
title <- "My Package Title"
# A text description. Markdown is encouraged.
description <- "# My Package description\nAll about my package."
# The home on the web that is related to this data package.
homepage <- "http://example.com/"
# The raw sources for this resource.
sources <- list(list(title = "World Bank and OECD", path = "http://data.worldbank.org/indicator/NY.GDP.MKTP.CD"))
# The license(s) under which the resource is published.
licenses <- list(list(name = "odc-pddl-1.0", path = "http://opendatacommons.org/licenses/pddl/", title = "Open Data Commons Public Domain Dedication and License v1.0"))
# The file format of this resource.
format <- "xls"
# The media type of this resource. Can be any valid media type listed with [IANA](https://www.iana.org/assignments/media-types/media-types.xhtml).
mediatype <- "text/csv"
# The file encoding of this resource.
encoding <- "utf-8"
# [example] encoding <- "utf-8"
# The size of this resource in bytes.
bytes <- 2082L
# The MD5 hash of this resource. Indicate other hashing algorithms with the {algorithm}:{hash} format.
hash <- "d25c9c77f588f5dc32059d2da1136c02"
# [example] hash <- "SHA256:5262f12512590031bbcc9a430452bfd75c2791ad6771320bb4b5728bfb78c4d0"
```

## Design Principles

The client **should** use Frictionless formats by default for describing dataset and resource objects passed to client methods.

In addition, where more than metadata is needed (e.g. we need to access the data stream, or get the schema), we expect the _Dataset_ and _Resource_ objects to follow the [Frictionless Data Lib pattern](https://github.com/frictionlessdata/project/blob/master/rfcs/0004-frictionless-data-lib-pattern.md).
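A minimal sketch of what the Frictionless lib pattern implies: a `Resource` object that carries its descriptor and also exposes data access (here a row stream over inline CSV data). The class and method names are illustrative, not the actual library API.

```python
import csv
import io

class Resource:
    """Illustrative Frictionless-style resource: descriptor plus data access."""

    def __init__(self, descriptor, inline_data=None):
        self.descriptor = descriptor  # plain dict, Data Resource format
        self._inline_data = inline_data

    @property
    def name(self):
        return self.descriptor["name"]

    def rows(self):
        """Yield data rows as dicts (here only from inline CSV data)."""
        reader = csv.DictReader(io.StringIO(self._inline_data))
        yield from reader

resource = Resource(
    {"name": "my-nice-name", "format": "csv", "encoding": "utf-8"},
    inline_data="id,value\n1,a\n2,b\n",
)
rows = list(resource.rows())
```

The point of the pattern is that metadata (`descriptor`) and data (`rows()`) travel together on one object, so client methods can accept either plain descriptors or full resources.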

`site/content/docs/dms/ckan-enterprise/index.md` (new file, 108 lines)

@@ -0,0 +1,108 @@
# CKAN Enterprise

## Introduction

CKAN Enterprise is our name for what we plan to become our standard "base" distribution of CKAN going forward:

* It is the standard CKAN code base plus micro-services.
* It is an enterprise-grade data catalog and portal targeted at government (open data portals) and enterprise (data catalogs and beyond).
* It is also known as [Datopian DMS](https://www.datopian.com/datopian-dms/).

## Roadmap 2021 and beyond

| | Current | CKAN Enterprise |
|-------------------|----------------------------------------------------------------------------------|-----------------------------------------------------------------|
| Raw storage | FileStore | Giftless |
| Data Loader (db) | DataPusher extension | AirCan |
| Data Storage (db) | Postgres | Any database engine; Postgres by default |
| Data API (read) | Built-in DataStore extension's API, including the SQL endpoint | GraphQL-based standalone micro-service |
| Frontend (public) | Frontend built into the CKAN Classic Python app (some projects use a Node.js app) | PortalJS or a Node.js app |
| Data Explorer | ReclineJS (some projects that use a Node.js frontend have a React-based Data Explorer) | GraphQL-based Data Explorer |
| Auth | Traditional login/password, extendable with CKAN Classic extensions | SSO with Google, GitHub, Facebook and Microsoft options by default |
| Permissions | CKAN Classic permissions | Existing permissions exposed via a JWT-based authz API |
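The "JWT-based authz API" row can be made concrete with a minimal token flow: one service signs a payload of user permissions, a downstream service verifies it. This sketch builds a JWT-shaped, HMAC-signed token with only the standard library; a real deployment would use a proper JWT library and asymmetric keys, and all names here are illustrative.

```python
import base64
import hashlib
import hmac
import json

SECRET = b"shared-secret"  # illustrative; real services would use key pairs

def _b64(data: bytes) -> str:
    # URL-safe base64 without padding, as used in JWTs
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode()

def issue_token(claims: dict) -> str:
    """Create a JWT-like token: header.payload.signature."""
    header = _b64(json.dumps({"alg": "HS256", "typ": "JWT"}).encode())
    payload = _b64(json.dumps(claims).encode())
    sig = _b64(hmac.new(SECRET, f"{header}.{payload}".encode(), hashlib.sha256).digest())
    return f"{header}.{payload}.{sig}"

def verify_token(token: str) -> dict:
    """Check the signature and return the claims; raise on tampering."""
    header, payload, sig = token.split(".")
    expected = _b64(hmac.new(SECRET, f"{header}.{payload}".encode(), hashlib.sha256).digest())
    if not hmac.compare_digest(sig, expected):
        raise ValueError("invalid signature")
    padded = payload + "=" * (-len(payload) % 4)  # restore base64 padding
    return json.loads(base64.urlsafe_b64decode(padded))

token = issue_token({"user": "alice", "scopes": ["dataset:read"]})
claims = verify_token(token)
```

The frontend would attach such a token to each request, and each micro-service would verify it independently, which is what makes permissions work without a shared in-process core.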

## Timeline 2021

To develop a base distribution of CKAN Enterprise, we want to build a demo project with the features from the roadmap. This way we can:

* understand its advantages/limitations;
* compare it against other instances of CKAN;
* demonstrate it to potential clients.

High-level overview of the planned features with ETAs:

| Name | Description | Effort | ETA |
| ----------------------------- | ------------------------------------ | ------ | --- |
| [Init](#Init) | Select CKAN version and deploy to DX | xs | Q2 |
| [Blobstore](#Blobstore) | Integrate Giftless for raw storage | s | Q2 |
| [Versioning](#Versioning) | Develop/integrate new versioning sys | l | Q3 |
| [DataLoader](#DataLoader) | Develop/integrate AirCan | xl | Q3 |
| [Data API](#Data-API) | Integrate new Data API (read) | m | Q2 |
| [Frontend](#Frontend) | Build a theme using PortalJS | s | Q2 |
| [DataExplorer](#DataExplorer) | Integrate into PortalJS | s | Q2 |
| [Permissions](#Permissions) | Develop permissions in read frontend | m | Q4 |
| [Auth](#Auth) | Integrate | s | Q4 |

### Init

Initialize a new project for the development of CKAN Enterprise.

Tasks:

* Boot the project in the Datopian-DX cluster
* Use CKAN v2.8.x (latest patch) or 2.9.x
* Don't set up DataPusher
* Namespace: `ckan-enterprise`
* Domain: `enterprise.ckan.datopian.com`

### Blobstore

See [blob storage](/docs/dms/blob-storage#ckan-v3).

### Versioning

See [versioning](/docs/dms/versioning#ckan-v3).

### DataLoader

See [DataLoader](/docs/dms/load).

### Data API

* Install the new [Data API service](https://github.com/datopian/data-api) in the project
* Install the Hasura service in the project
* Set it up to work with the CKAN Enterprise database
* Read more about the Data API [here](/docs/dms/data-api#read-api-3)

Notes:

* We could experiment with various features of Hasura, e.g.:
  * Setting up row/column limits per user role (permissions)
  * Subscriptions to auto-load new data rows
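To make the GraphQL read API concrete, here is a sketch of the request a client would send: a POST with a JSON body containing `query` and `variables`. The table and field names are invented for illustration; the actual schema is generated by Hasura from the portal's database, and the host URL is a placeholder.

```python
import json

# Hypothetical table/field names; Hasura derives the real ones from the DB schema.
query = """
query RecentRows($limit: Int!) {
  resource_rows(limit: $limit, order_by: {id: desc}) {
    id
    value
  }
}
"""

payload = {"query": query, "variables": {"limit": 5}}
request = {
    "method": "POST",
    "url": "https://data-api.example.com/v1/graphql",  # placeholder URL
    "headers": {"Content-Type": "application/json"},
    "body": json.dumps(payload),  # the JSON body actually sent over the wire
}
```

Row/column limits per role would then be enforced server-side by Hasura, so the same query can return different slices of data for different users.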

### Frontend

PortalJS for the read frontend of CKAN Enterprise. [Read more](/docs/dms/frontend/#frontend).

### DataExplorer

A new Data Explorer based on the GraphQL API: https://github.com/datopian/data-explorer-graphql

### Permissions

See [permissions](/docs/dms/permissions#permissions-authorization).

### Auth

Next-generation, Kratos-based authentication (mostly SSO, with no traditional login by default), with the following options out of the box:

* GitHub
* Google
* Facebook
* Microsoft

Easy to add:

* Discord
* GitLab
* Slack
`site/content/docs/dms/ckan-v3/index.md` (new file, 365 lines)

@@ -0,0 +1,365 @@

# CKAN v3

## Introduction

This document describes the architectures of CKAN v2 ("CKAN Classic"), CKAN v3 (also known as "CKAN Next Gen", for Next Generation), and CKAN v3 hybrid. The latter is an intermediate step towards v3, where we still use CKAN v2 and common extensions, and only create microservices for new features.

You will also find out how to do common tasks, such as theming or testing, in each of the architectures.

*Note: this blog post has an overview of the more decoupled, microservices approach at the core of v3: https://www.datopian.com/2021/05/17/a-more-decoupled-ckan/*

## CKAN v2, CKAN v3 and Why v3

In yellow, you see one single Python process:

```mermaid
graph TB
  subgraph ckanclassic["CKAN Classic"]
    ckancore["Core"]
  end
```

When you want to extend the core functionality of CKAN v2 (Classic), you write a Python package that must be installed in CKAN. The extension then runs in the same process as the core functionality. This is known as a monolithic architecture.

```mermaid
graph TB
  subgraph ckanclassic["CKAN Classic"]
    ckancore["Core"] --> ckanext["CKAN Extension 1"]
  end
```

When you add multiple features through extensions, what you get is one single Python process running many unrelated functionalities.

```mermaid
graph TB
  subgraph ckanclassic["CKAN Classic"]
    ckancore["Core"] --> ckanext["CKAN Extension 1"]
    ckancore --> ckanext2["CKAN Extension 2"]
    ckancore --> ckanext3["CKAN Extension 3"]
    ckancore --> ckanext4["CKAN Extension 4"]
    ckancore --> ckanext5["CKAN Extension 5"]
  end
```

This monolithic approach has advantages in terms of simplicity of development and deployment, especially while the system is small. However, as it grows in scale and scope, an increasing number of issues appear.

In this approach, an optional extension can crash the whole CKAN instance. Every new feature must be written in the same language and framework (e.g. Python, leveraging Flask or Django). And, perhaps most fundamentally, the overall system is highly coupled, making it complex and hard to understand, debug, extend, and evolve.

### Microservices and CKAN v3

The main way to address these problems, while gaining extra benefits, is to move to a microservices-based architecture.

Thus, we recommend building the next version of CKAN – CKAN v3 – on a microservices approach.

>[!tip]CKAN v3 is sometimes also referred to as CKAN Next Gen(eration).

With microservices, each piece of functionality runs in its own service and process.

```mermaid
graph TB
  subgraph ckanapi3["CKAN API 3"]
    ckanapi31["API 3"]
  end

  subgraph ckanapi2["CKAN API 2"]
    ckanapi21["API 2"]
  end

  subgraph ckanapi1["CKAN API 1"]
    ckanapi11["API 1"]
  end

  subgraph ckanfrontend["CKAN frontend"]
    ckanfrontend1["Frontend"]
  end

  ckanfrontend1 --> ckanapi11
  ckanfrontend1 --> ckanapi21
  ckanfrontend1 --> ckanapi31
```
### Incremental Evolution – Hybrid v3

Another advantage of the microservices approach is that it can also be used to extend and evolve current CKAN v2 solutions incrementally. We term these kinds of solutions "Hybrid v3", as they are a mix of v2 and v3.

For example, a Hybrid v3 data portal could use a new microservice written in Node for the frontend, and combine that with CKAN v2 (with v2 extensions).

```mermaid
graph TB
  subgraph ckanapi3["CKAN API 3"]
    ckanapi31["API 3"]
  end

  subgraph ckanapi2["CKAN API 2"]
    ckanapi21["API 2"]
  end

  subgraph ckanapi1["CKAN API 1"]
    ckanapi11["API 1"]
  end

  subgraph ckanfrontend["CKAN frontend"]
    ckanfrontend1["Frontend"]
  end

  subgraph ckanclassic["CKAN Classic"]
    ckancore["Core"] --> ckanext["CKAN Extension 1"]
    ckancore --> ckanext2["CKAN Extension 2"]
  end

  ckanfrontend1 --> ckancore
  ckanfrontend1 --> ckanapi11
  ckanfrontend1 --> ckanapi21
  ckanfrontend1 --> ckanapi31
```
The hybrid approach means we can evolve CKAN v2 "Classic" to CKAN v3 "Next Gen" incrementally. In particular, it allows people to keep using their existing v2 extensions and upgrade them to new microservices gradually.

### Comparison of Approaches

| | CKAN v2 (Classic) | CKAN v3 (Next Gen) | CKAN v3 Hybrid |
| ------------ | ------------------| -------------------| ---------------|
| Architecture | Monolithic | Microservice | Microservice with v2 core |
| Language | Python | Services can be written in any language.<br/><br/>Frontend default: JS.<br/>Backend default: Python | Python, plus any language you like for microservices |
| Frontend (and theming) | Python, via a Python CKAN extension | Flexible; the default is a modern JS/Node.js-based frontend | Can use the old frontend, but defaults to the new JS-based frontend |
| Data Packages | Add-on, no integration | Default internal and external format | Data Packages, with a converter to the old CKAN format |
| Extension | Extensions are libraries added to the core runtime. They must therefore be built in Python and are loaded into the core process at build time. This follows a "template/inheritance" model where hooks live in core, and it is core that loads and calls plugins. If a hook does not exist in core, the extension is stymied. | Extensions are microservices and can be written in any language. They are loaded into the URL space via the Kubernetes routing manager. Extensions hook into "core" via APIs (rather than in code). This follows a "composition" model rather than an inheritance model. | Can use old-style extensions or microservices |
| Resource Scaling | You have a single application, so scaling means scaling the core application. | You can scale individual microservices as needed. | Mix of v2 and v3 |
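The "converter to the old CKAN format" cell can be sketched as a small mapping function from a Frictionless Data Package descriptor to a CKAN package dict. The field mapping below is illustrative and deliberately partial; it shows only the shape of the translation, not the full rules.

```python
def datapackage_to_ckan(datapackage: dict) -> dict:
    """Illustrative, partial conversion: Frictionless Data Package -> CKAN package dict."""
    return {
        "name": datapackage["name"],
        "title": datapackage.get("title", datapackage["name"]),
        "notes": datapackage.get("description", ""),  # CKAN calls the description "notes"
        "resources": [
            {
                "name": res.get("name", ""),
                "url": res.get("path", ""),       # CKAN resources point at a URL
                "format": res.get("format", "").upper(),
            }
            for res in datapackage.get("resources", [])
        ],
    }

pkg = datapackage_to_ckan({
    "name": "my-nice-name",
    "title": "My Package Title",
    "resources": [{"name": "file", "path": "http://example.com/file.csv", "format": "csv"}],
})
```

In the hybrid setup, such a converter sits at the boundary, so new services can speak Frictionless while CKAN Classic keeps its existing schema.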

## Why v3: Long Version

What are the problems with CKAN v2's monolithic architecture, compared with the microservices of v3?

* **Poor Developer Experience (DX), innovability, and scalability due to coupling**. Monolithic means "one big system" => coupling and complexity => hard to understand, change and extend. Changes in one area can unexpectedly affect other areas.
  * Developing even a small new API requires wiring into CKAN core via an extension. Extensions can interact in unexpected ways.
  * The core of people who fully understand CKAN has stayed small for a reason: there's a lot to understand.
  * https://github.com/ckan/ckan/issues/5333 is an example of a small bug that's hard to track down due to the various code paths involved.
  * It is harder to make incremental changes due to coupling (e.g. the Python 3 upgrade requires *everything* to be fixed at once – you can't do rolling releases).
* **Stability**. One bad extension crashes or slows down the whole system.
* **One language => less developer flexibility (poor DX)**. You have to write *everything* in Python, including the frontend. This is an issue especially for the frontend: almost all modern frontend development is heavily JavaScript-based, and the theme is the #1 thing people want to customize in CKAN. At the moment, that requires installing *all* of CKAN core (using Docker) plus some familiarity with Python and Jinja templating. This is a big ask.
* **Extension stability and testing**. Testing extensions is painful (at least without carefully factoring the logic into a separate mini-library), so they are often not tested; they don't have Continuous Integration (CI) or Continuous Deployment (CD). As an example, a highly experienced Python developer at Datopian was still struggling to get extension tests working 6 months into their CKAN work.
* **DX is poor, especially when getting started**. Getting CKAN up and running requires multiple external services (database, Solr, Redis, etc.), making Docker the only viable way to bootstrap a local development environment. This makes getting started with CKAN daunting and painful.
* **Vertical scalability is poor**. Scaling the system is costly, as you have to replicate the whole core process on every machine.
* **The system is highly coupled.** Because they run in-process, extensions tend to end up with significant coupling to core, which makes them brittle (this has improved with `plugins.toolkit`).
  * Upgrading core to Python 3 requires upgrading *all* extensions, because they run in the same process.
  * The search index is not a separate API but lives in core, so replacing Solr is hard.

The top 2 customizations of CKAN are slow and painful and require deep knowledge of CKAN:

* Theming a site.
* Customizing the metadata.

## Architectures

### CKAN v2 (Classic)

This diagram is based on the `docker-compose.yml` file of [github.com/okfn/docker-ckan](https://github.com/okfn/docker-ckan) (`docker-compose.dev.yml` has the same components, but a different configuration).

One difference between this diagram and the file is that we are not including DataPusher, as it is not a required dependency.

>[!tip]Databases may run as Docker containers, or rely on third-party services such as Amazon Relational Database Service (RDS).

```mermaid
graph LR

  CKAN[CKAN web app]

  CKAN --> DB[(Database)]
  CKAN --> Solr[(Solr)]
  CKAN --> Redis[(Redis)]

  subgraph Docker container
    CKAN
  end
```

The same setup, showing some of the key extensions explicitly:

```mermaid
graph LR
  core[CKAN Core] --> DB[(Database)]
  datastore --> DB2[(Database - DataStore)]
  core --> Solr[(Solr)]
  core --> Redis[(Redis)]

  subgraph Docker container
    core
    datastore
    datapusher
    imageview
    ...
  end
```

CKAN ships with several built-in core extensions. Here, together with the list of main components, we list a couple of them:

Name | Type | Repository | Description
-----|------|------------|------------
CKAN | Application (API + Worker) | [Link](https://github.com/ckan/ckan) | Data management system (DMS) for powering data hubs and data portals. It's a monolithic web application that includes several built-in extensions and dependencies, such as a job queue service. In theory, it's possible to run it without any extensions.
datapusher | CKAN Extension | [Link](https://github.com/ckan/ckan/tree/master/ckanext/datapusher) | It could also be called "datapusher-connect". It's glue code that connects to a separate microservice called DataPusher, which performs actions when new data arrives.
datastore | CKAN Extension | [Link](https://github.com/ckan/ckan/tree/master/ckanext/datastore) | The interface between CKAN and the structured database, the one receiving datasets and resources (CSVs). It includes an API for the database and an administrative UI.
imageview | CKAN Extension | [Link](https://github.com/ckan/ckan/tree/master/ckanext/imageview) | It provides an interface for creating HTML templates for image resources.
multilingual | CKAN Extension | [Link](https://github.com/ckan/ckan/tree/master/ckanext/multilingual) | It provides an interface for translation and localization.
Database | Database | | People tend to use a single PostgreSQL instance for this. Separated into multiple databases, it's the place where CKAN stores its own information (sometimes referred to as the "MetaStore" and "HubStore"), rows of resources (StructuredStore or DataStore), and raw datasets and resources ("BlobStore" or "FileStore"). The latter may store data in the local filesystem or with cloud providers, via extensions.
Solr | Database | | It provides indexing and full-text search for CKAN.
Redis | Database | | Lightweight key-value store, used for caching and job queues.
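As a concrete example of the DataStore API mentioned above, a client can query rows through the `datastore_search` action of CKAN's action API. This sketch only builds the request URL with the standard library; the endpoint path and parameters follow CKAN's action API, while the host and resource id are placeholders.

```python
from urllib.parse import urlencode

def datastore_search_url(base_url, resource_id, limit=5, q=None):
    """Build a CKAN datastore_search request URL (read-only row query)."""
    params = {"resource_id": resource_id, "limit": limit}
    if q:
        params["q"] = q  # optional full-text filter
    return f"{base_url}/api/3/action/datastore_search?{urlencode(params)}"

url = datastore_search_url(
    "https://demo.ckan.org",                       # placeholder host
    "00000000-0000-0000-0000-000000000000",        # placeholder resource id
    limit=5,
)
```

The response is a JSON envelope with a `result` object containing `records` and `fields`; a GET on this URL with any HTTP client is enough to read rows.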

### CKAN v3 (Next Gen)

CKAN Next Gen is still a DMS, like CKAN Classic; but rather than a monolithic architecture, it follows the microservices approach. CKAN Classic is no longer a dependency, as we have smaller services providing functionality that we may or may not choose to include. This description is based on [Datopian's Technical Documentation](/docs/dms/ckan-v3/next-gen/#roadmap).

```mermaid
graph LR
  subgraph api3["..."]
    api31["API"]
  end

  subgraph api2["Administration"]
    api21["API"]
  end

  subgraph api1["Authentication"]
    api11["API"]
  end

  subgraph frontend["Frontend"]
    frontendapi["API"]
  end

  subgraph storage["Raw Resources Storage"]
    storageapi["API"]
  end

  storageapi --> cloudstorage[(Cloud Storage)]

  frontendapi --> storageapi
  frontendapi --> api11
  frontendapi --> api21
  frontendapi --> api31
```

At the moment, many important features are only available through CKAN extensions, which brings us to the hybrid approach.
### CKAN Hybrid v3 (Next Gen)

We may sometimes make an explicit distinction between CKAN v3 "hybrid" and "pure". The reason is that we're not there yet – there are still many opportunities to extract features out of CKAN and CKAN extensions.

In this approach, we still rely on CKAN Classic and all its extensions. Many are already well tested and debugged, so we can deliver more by not being forced to rewrite everything from scratch.

```mermaid
graph TB
  subgraph ckanapi3["CKAN API 3"]
    ckanapi31["API 3"]
  end

  subgraph ckanapi2["CKAN API 2"]
    ckanapi21["API 2"]
  end

  subgraph ckanapi1["CKAN API 1"]
    ckanapi11["API 1"]
  end

  subgraph ckanfrontend["Frontend"]
    ckanfrontend1["Frontend v2"]
    theme["[Project-specific theme]"]
  end

  subgraph ckanclassic["CKAN Classic"]
    ckancore["Core"] --> ckanext["CKAN Extension 1"]
    ckancore --> ckanext2["[Project-specific extension]"]
  end

  ckanfrontend1 --> ckancore
  ckanfrontend1 --> ckanapi11
  ckanfrontend1 --> ckanapi21
  ckanfrontend1 --> ckanapi31
```

Name | Type | Repository | Description
-----|------|------------|------------
Frontend v2 | Application | [Link](https://github.com/datopian/frontend-v2) | Node application for data portals. It communicates with a CKAN Classic instance, through its API, to get data and render HTML. It is written to be extensible, e.g. for connecting to other applications and theming.
[Project-specific theme] | Frontend Theme | e.g. [Link](https://github.com/datopian/frontend-oddk) | Extension to Frontend v2 where you can personalize the interface, create different pages, and connect with other APIs.
[API 1] | Application | e.g. [Link](https://github.com/datopian/data-subscriptions) | Any application with an API that communicates with the user-facing Frontend v2, or runs tasks in the background. Given the current architecture, this API is usually designed to work with CKAN interfaces. Over time, we may choose to make it more generic, and even replace CKAN Core with other applications.

## Job Stories

In this spreadsheet, you will find a list of common job stories in CKAN projects, and how you can accomplish them in CKAN v2, v3, and Hybrid v3:

https://docs.google.com/spreadsheets/d/1cLK8xylprmVsoQIbdphqz9-ccSpdDABQExvKdvNJqaQ/edit#gid=757361856
## Glossary

### API

An HTTP API, usually following the REST style.

### Application

A Python package, an API, a worker... It may have other applications as dependencies.

### CKAN Extension

A Python package following the specification in the [CKAN Extending guide](https://docs.ckan.org/en/2.8/extensions/index.html).

### Database

An organized collection of data.

### Dataset

A group of resources meant to be distributed together.

### Frontend Theme

A Node project specializing behavior present in [Frontend v2](https://github.com/datopian/frontend-v2).

### Resource

A data blob. Common formats are CSV, JSON, and PDF.

### System

A group of applications and databases that work together to accomplish a set of tasks.

### Worker

An application that runs tasks in the background. It may run recurrently according to a given schedule, or as soon as it's requested by another application.
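A worker in this sense can be sketched as a loop that consumes tasks from a queue until told to stop. The queue and task shapes below are illustrative (in CKAN Classic the job queue is backed by Redis, not an in-process queue).

```python
import queue

def run_worker(tasks, results):
    """Consume tasks until a None sentinel arrives, recording each result."""
    while True:
        task = tasks.get()
        if task is None:  # sentinel: stop the worker
            break
        results.append(f"done:{task}")  # stand-in for real task processing

tasks = queue.Queue()
for name in ["harvest", "index", "notify"]:
    tasks.put(name)
tasks.put(None)  # tell the worker to shut down

results = []
run_worker(tasks, results)
```

In production the loop runs in its own process (or container), which is exactly what lets workers be scheduled, scaled, and restarted independently of the web application.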

## Appendix

### Architecture - CKAN v2 with DataPusher

```mermaid
graph TB
  subgraph DataPusher
    datapusherapi["DataPusher API"]
    datapusherworker["CKAN Service Provider"]
    SQLite[(SQLite)]
  end

  subgraph CKAN
    core
    datapusher[datapusher ext]
    datastore
    ...
  end

  core[CKAN Core] --> datastore
  datastore --> DB[(Database)]
  datapusherapi --> core
  datapusher --> datapusherapi
```

Name | Type | Repository | Description
-----|------|------------|------------
DataPusher | System | [Link](https://github.com/ckan/datapusher) | Microservice that parses data files and uploads them to the DataStore.
DataPusher API | API | [Link](https://github.com/ckan/datapusher) | HTTP API written in Flask. It is called from the built-in `datapusher` CKAN extension whenever a resource is created (and has the right type).
CKAN Service Provider | Worker | [Link](https://github.com/ckan/ckan-service-provider) | Library for making web services that expose functions as synchronous or asynchronous jobs.
SQLite | Database | | Unknown use. Possibly a worker dependency.

### Old Next Gen Page

Prior to this page, we had one called "Next Gen". It overlaps with this article, although it focuses more on the benefits of microservices. For the time being, the page still exists at [/ckan-v3/next-gen](/docs/dms/ckan-v3/next-gen), although it may get merged with this one in the future.
`site/content/docs/dms/ckan-v3/next-gen.md` (new file, 203 lines)

@@ -0,0 +1,203 @@
# Next Gen
|
||||
|
||||
“Next Gen” (NG) is our name for the evolution of CKAN from its current state as “CKAN Classic”.
|
||||
|
||||
Next Gen has a decoupled, microservice architecture in contrast to CKAN Classic's monolithic architecture. It is also built from the ground up on the Frictionless Data principles and specifications which provide a simple, well-defined and widely adopted set of core interfaces and tooling for managing data.
|
||||
|
||||
## Classic to Next Gen
|
||||
|
||||
CKAN classic: monolithic architecture -- everything is one big python application. Extension is done at code level and "compiled in" at compile/run-time (i.e. you end up with one big docker file).
|
||||
|
||||
CKAN Next Gen: decoupled, service-oriented -- services connected by network calls. Extension is done by adding new services,
|
||||
|
||||
```mermaid
|
||||
graph LR
|
||||
|
||||
subgraph "CKAN Classic"
|
||||
plugins
|
||||
end
|
||||
|
||||
subgraph "CKAN Next Gen"
|
||||
microservices
|
||||
end
|
||||
|
||||
plugins --> microservices
|
||||
```
|
||||
|
||||
You can read more about monolithic vs microservice architectures in the [Appendix below](#appendix-monolithic-vs-microservice-architecture).
|
||||
|
||||
|
||||
## Next Gen lays the foundation for the future and brings major immediate benefits
|
||||
|
||||
Next Gen's new approach is important in several major ways.
|
||||
|
||||
### Microservices are the Future
|
||||
|
||||
First, decoupled microservices have become *the* way to design and deploy (web) applications after first being pioneered by the likes of Amazon in the early 2000s. And in the last five to ten years have brought microservices "for the masses" with relevant tooling and technology standardized, open-sourced and widely deployed -- not only with containerization such as Docker, Kubernetes but also in programming languages like (server-side) Javascript and Golang.
|
||||
|
||||
By adopting a microservice approach CKAN can reap the the benefits of what is becoming a mature and standard way to design and build (web) applications. This includes the immediate advantages of being aligned with the technical paradigm such as tooling and developer familiarity.
|
||||
|
||||
### Microservices bring Scalability, Reliability, Extensibility and Flexibility
|
||||
|
||||
In addition, and even more importantly, the microservices approach brings major benefits in:
|
||||
|
||||
1. **Scalability**: dramatically easier and cheaper to scale up -- and down -- in size *and* complexity. Size-wise this is because you can replicate individual services rather than the whole application. Complexity-wise this is because monolithic architectures tend to become "big" where service-oriented encourages smaller lightweight components with cleaner interfaces. This means you can have a much smaller core making it easier to install, setup and extend. It also means you can use what you need making solutions easier to maintain and upgrade.
|
||||
2. **Reliability**: easier (and cheaper) to build highly reliable, high availability solutions because microservices make isolation and replication easier. For example, in a microservice architecture a problem in CKAN's harvester won't impact your main portal because they run in separate containers. Similarly, you can scale the harvester system separately from the web frontend.
|
||||
3. **Extensibility**: much easier to create and maintain extensions because they are a decoupled service and interfaces are leaner and cleaner.
|
||||
4. **Flexibility** aka "Bring your own tech": services can be written in any language so, for example, you can write your frontend in javascript and your backend in Python. In a monolithic architecture all parts must be written in the same language because everything is compiled together. This flexibility makes it easier to use the best tool for the job. It also makes it much easier for teams to collaborate and cooperate and fewer bottlenecks in development.
|
||||
|
||||
ASIDE: decoupled microservices reflect the "unix" way of building networked applications. As with the "unix way" in general, whilst this approach better -- and simpler -- in the long-run, in the short-run it often needs sustantial foundational work (those Unix authors were legends!). It may also be, at least initially, more resource intensive and more complex infrastructurally. Thus, whilst this approach is "better" it was not suprising that it was initially used for for complex and/or high end applications e.g. Amazon. This also explains why it took a while for this approach to get adoption -- it is only in the last few year that we have robust, lightweight, easy to use tooling and patterns for microservices -- "microservices for the masses" if you like.
|
||||
|
||||
In summary, the Next Gen approach provides an essential foundation for the continuing growth and evolution of CKAN as a platform for building world-class data portal and data management solutions.

## Evolution not Revolution: Next Gen Components Work with CKAN Classic

*Gradual evolution from CKAN Classic (keep what is working, keep your investments, incremental change)*

Next Gen components are specifically designed to work with CKAN "Classic" in its current form. This means existing CKAN users can immediately benefit from Next Gen components and features whilst retaining the value of their existing investment. New (or existing) CKAN-based solutions can adopt a "hybrid" approach using components from both Classic and Next Gen. It also means that the owner of a CKAN-based solution can incrementally evolve from "Classic" to "Next Gen" by replacing one component at a time, gaining new functionality without sacrificing existing work.

ASIDE: we're fortunate that CKAN Classic itself was ahead of its time in its level of "service-orientation". From the start, it had a very rich and robust API, and it has continued to develop this, with almost all functionality exposed via the API. It is this rich API and well-factored design that makes it relatively straightforward to evolve CKAN in its current "Classic" form towards Next Gen.
## New Features plus Existing Functionality Improved

In addition to its architecture, Next Gen provides a variety of improvements and extensions to CKAN Classic's functionality. For example:

* Theming and Frontend Customization: theming and customizing CKAN's frontend has become radically easier and quicker. See [Frontend section »][frontend]
* DMS + CMS unified: integrate the full power of a modern CMS into your data portal and have one unified interface for data and content. See [Frontend section »][frontend]
* Data Explorer: the existing CKAN data preview/explorer has been completely rewritten in modern React-based Javascript (ReclineJS is now 7y old!). See [Data Explorer section »][explorer]
* Dashboards: build rich data-driven dashboards and integrate them into your portal. See [Dashboards section »][dashboards]
* Harvesting: simpler, more powerful harvesting built on modern ETL. See [Harvesting section »][harvesting]

And each of these features is easily deployed into an existing CKAN solution!

[frontend]: /docs/dms/frontend
[explorer]: /docs/dms/data-explorer
[dashboards]: /docs/dms/dashboards
[harvesting]: /docs/dms/harvesting
## Roadmap

The journey to Next Gen from Classic can proceed step by step -- it does not need to be a big bang. Like refurbishing and extending a house, we can add a room here or renovate a room there whilst continuing to live happily in the building (and benefitting as our new bathroom comes online, or we get a new conservatory!).

Here's an overview of the journey to Next Gen and current implementation status. More granular information on particular features may sometimes be found on the individual feature page, for example for [Harvesting here](/docs/dms/harvesting#design).
```mermaid
graph LR

start[Start]
themefe[Read Frontend]
authfe[Authentication in FE]
authzfe[Authorization in FE]
previews[Previews]
explorer[Explorer]
permsserv[Permissions Service]
orgs[Organizations]

subgraph Start
start
end

subgraph Frontend
start --> themefe
themefe --> authfe
authfe --> authzfe
themefe --> revisioningfe[Revision UI]
end

subgraph Harvesting
start --> harvestetl[Harvesting ETL + Runner]
harvestetl --> harvestui[Harvest UI]
end

subgraph "Admin UI"
managedataset[Manage Dataset]
manageorg[Manage Organization]
manageuser[Manage Users]
manageconfig[Manage Config]

start --> managedataset
start --> manageorg
managedataset --> manageconfig
end

subgraph "Backend (API)"
start --> permsserv
start --> revision[Backend Revisioning]
end

datastore[DataStore]

subgraph DataStore
start --> datastore
datastore --> dataload[Data Load]
end

subgraph Explorer
themefe --> previews
previews --> explorer
end

subgraph Organizations
start --> orgs
end

subgraph Key
done[Done]
nearlydone[Nearly Done]
inprogress[In Progress]
next[Next Up]
end

classDef done fill:#21bf73,stroke:#333,stroke-width:3px;
classDef nearlydone fill:lightgreen,stroke:#333,stroke-width:3px;
classDef inprogress fill:orange,stroke:#333,stroke-width:2px;
classDef next fill:pink,stroke:#333,stroke-width:1px;

class done,themefe,previews,explorer,harvestetl done;
class nearlydone,authfe,harvestui nearlydone;
class inprogress,dataload inprogress;
class next,permsserv next;
```
|
||||
## Appendix: Monolithic vs Microservice architecture
|
||||
|
||||
Monolithic: Libraries or modules communicate via function calls (inside one big application)
|
||||
|
||||
Microservices: Services communicate over a network
|
||||
|
||||
The best introduction and definition of microservices comes from Martin Fowler https://martinfowler.com/microservices/
|
||||
|
||||
> Microservice architectures will use libraries, but their primary way of componentizing their own software is by breaking down into services. We define libraries as components that are linked into a program and called using in-memory function calls, while services are out-of-process components who communicate with a mechanism such as a web service request, or remote procedure call. https://martinfowler.com/articles/microservices.html
|
||||
|
||||
### Monolithic

```mermaid
graph TD

subgraph "Monolithic - all inside"
a
b
c
end

a --in-memory function call--> b
a --in-memory function call--> c
```
### Microservice

```mermaid
graph TD
subgraph "A Container"
a
end
subgraph "B Container"
b
end
subgraph "C Container"
c
end
a -.network call.-> b
a -.network call.-> c
```
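The contrast in the two diagrams can also be sketched in code. A minimal, illustrative Python sketch (not CKAN code): the monolith invokes a component with an in-memory function call, while the microservice style serializes the same request as it would for transport between containers (the transport itself is omitted here).

```python
import json

def component_b(x):
    """A component that doubles its input."""
    return x * 2

# Monolithic style: a direct in-memory function call.
monolith_result = component_b(21)

# Microservice style: the same request, serialized as it would be sent
# over the network between containers, then decoded and handled.
request = json.dumps({"action": "double", "x": 21})
payload = json.loads(request)
service_result = component_b(payload["x"]) if payload["action"] == "double" else None

print(monolith_result, service_result)
```

Both paths compute the same result; the difference is only in how the request crosses the component boundary.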
**File: `site/content/docs/dms/ckan.md`** (new file)
---
sidebar: auto
---

# CKAN Classic

CKAN (Classic) already has great documentation at: https://docs.ckan.org/

This material is a complement to those docs, as well as details of our particular setup. Here, among other things, you'll learn how to:

* [Get Started with CKAN for Development -- install and run CKAN on your local machine](/docs/dms/ckan/getting-started)
* [Play around with a CKAN instance including importing and visualising data](/docs/dms/ckan/play-around)
* [Install Extensions](/docs/dms/ckan/install-extension)
* [Create Your Own Extension](/docs/dms/ckan/create-extension)
* [Client Guide](/docs/dms/ckan-client-guide)
* [FAQ](/docs/dms/ckan/faq)

[start]: /docs/dms/ckan/getting-started
[play]: /docs/dms/ckan/play-around

[CKAN]: https://ckan.org/
[docs]: https://docs.ckan.org/
**File: `site/content/docs/dms/ckan/create-extension.md`** (new file)
---
sidebar: auto
---

# Introduction

A CKAN extension is a Python package that modifies or extends CKAN. Each extension contains one or more plugins that must be added to your CKAN config file to activate the extension's features.

## Creating and Installing extensions

1. Boot up your Docker Compose setup:

```
docker-compose -f docker-compose.dev.yml up
```

2. To create an extension template using this Docker composition, execute:

```
docker-compose -f docker-compose.dev.yml exec ckan-dev /bin/bash -c "paster --plugin=ckan create -t ckanext ckanext-example_extension -o /srv/app/src_extensions"
```

This command will create an extension template in your local `./src` folder, which is mounted inside the containers at the `/srv/app/src_extensions` directory. Any extension cloned into the `src` folder will be installed in the CKAN container when booting up Docker Compose (`docker-compose up`). This includes installing any requirements listed in a `requirements.txt` (or `pip-requirements.txt`) file and running `python setup.py develop`.

3. Add the plugin to the `CKAN__PLUGINS` setting in your `.env` file:

```
CKAN__PLUGINS=stats text_view recline_view example_extension
```

4. Restart your docker-compose:

```
# Shut down your instance with ctrl+c and then run it again with:
docker-compose -f docker-compose.dev.yml up
```

> [!tip]CKAN will be started on the paster development server with the '--reload' option to watch for changes in the extension files.

You should see the following output in the console:

```
...
ckan-dev_1 | Installed /srv/app/src_extensions/ckanext-example_extension
...
```

## Edit the extension

Let's edit a template to change the way CKAN is displayed to the user!

1. First you will need write permissions on the extension folder, since it was created by the user running Docker. Replace `your_username` and execute the following command:

> [!tip]You can find out your current username by typing 'echo $USER' in the terminal.

```
sudo chown -R <your_username>:<your_username> src/ckanext-example_extension
```

2. The `paster create` command from the previous section generated all the files and folder structure needed for our extension. Open `src/ckanext-example_extension/ckanext/example_extension/plugin.py` to see the main file of our extension, which we will edit to add custom functionality:

```python
import ckan.plugins as plugins
import ckan.plugins.toolkit as toolkit


class Example_ExtensionPlugin(plugins.SingletonPlugin):
    plugins.implements(plugins.IConfigurer)

    # IConfigurer

    def update_config(self, config_):
        toolkit.add_template_directory(config_, 'templates')
        toolkit.add_public_directory(config_, 'public')
        toolkit.add_resource('fanstatic', 'example_extension')
```

3. We will create a custom Flask Blueprint to extend our CKAN instance with more endpoints. In order to create a new blueprint and add an endpoint we need to:
- Import `Blueprint` and `render_template` from the `flask` module.
- Create the functions that will be used as endpoints.
- Implement the `IBlueprint` interface in our plugin and add the new endpoint.

4. Import `Blueprint` and `render_template` from `flask`:

```python
import ckan.plugins as plugins
import ckan.plugins.toolkit as toolkit

from flask import Blueprint, render_template


class Example_ExtensionPlugin(plugins.SingletonPlugin):
    plugins.implements(plugins.IConfigurer)

    # IConfigurer

    def update_config(self, config_):
        toolkit.add_template_directory(config_, 'templates')
        toolkit.add_public_directory(config_, 'public')
        toolkit.add_resource('fanstatic', 'example_extension')
```

5. Create a new function, `hello_plugin`:

```python
import ckan.plugins as plugins
import ckan.plugins.toolkit as toolkit

from flask import Blueprint, render_template


def hello_plugin():
    u'''A simple view function'''
    return u'Hello World, this is served from an extension'


class Example_ExtensionPlugin(plugins.SingletonPlugin):
    plugins.implements(plugins.IConfigurer)

    # IConfigurer

    def update_config(self, config_):
        toolkit.add_template_directory(config_, 'templates')
        toolkit.add_public_directory(config_, 'public')
        toolkit.add_resource('fanstatic', 'example_extension')
```

6. Implement the `IBlueprint` interface in our plugin and add the new endpoint:

```python
import ckan.plugins as plugins
import ckan.plugins.toolkit as toolkit

from flask import Blueprint, render_template


def hello_plugin():
    u'''A simple view function'''
    return u'Hello World, this is served from an extension'


class Example_ExtensionPlugin(plugins.SingletonPlugin):
    plugins.implements(plugins.IConfigurer)
    plugins.implements(plugins.IBlueprint)

    # IConfigurer

    def update_config(self, config_):
        toolkit.add_template_directory(config_, 'templates')
        toolkit.add_public_directory(config_, 'public')
        toolkit.add_resource('fanstatic', 'example_extension')

    # IBlueprint

    def get_blueprint(self):
        u'''Return a Flask Blueprint object to be registered by the app.'''
        # Create Blueprint for plugin
        blueprint = Blueprint(self.name, self.__module__)
        blueprint.template_folder = u'templates'
        # Add plugin url rules to Blueprint object
        blueprint.add_url_rule('/hello_plugin', '/hello_plugin', hello_plugin)
        return blueprint
```

7. Go back to the browser and navigate to http://ckan:5000/hello_plugin. You should see the value returned by our view!

![]()

Now that you have added a new view and endpoint to your plugin, you are ready for the next step of the tutorial! You can also check the complete code of this plugin in the [ckan repository](https://github.com/ckan/ckan/tree/master/ckanext/example_flask_iblueprint).
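Conceptually, `get_blueprint` hands the application a bundle of URL rules mapped to view functions, which the app then registers. A stdlib-only sketch of that registration and dispatch flow (the `Blueprint` and `App` classes here are illustrative stand-ins, not Flask's or CKAN's real implementations):

```python
class Blueprint:
    """Collects (url_rule, endpoint, view_function) tuples for later registration."""
    def __init__(self, name):
        self.name = name
        self.rules = []

    def add_url_rule(self, rule, endpoint, view_func):
        self.rules.append((rule, endpoint, view_func))


class App:
    """Registers blueprints and dispatches a path to the matching view."""
    def __init__(self):
        self.routes = {}

    def register_blueprint(self, bp):
        for rule, _endpoint, view_func in bp.rules:
            self.routes[rule] = view_func

    def dispatch(self, path):
        return self.routes[path]()


def hello_plugin():
    return 'Hello World, this is served from an extension'


bp = Blueprint('example_extension')
bp.add_url_rule('/hello_plugin', 'hello_plugin', hello_plugin)

app = App()
app.register_blueprint(bp)
print(app.dispatch('/hello_plugin'))
```

Flask does considerably more (request parsing, URL converters, per-blueprint template folders), but the core idea is the same: the plugin only describes its routes; the app owns registration and dispatch.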
**File: `site/content/docs/dms/ckan/faq.md`** (new file)
---
sidebar: auto
---

# FAQ

This page provides answers to some frequently asked questions.

## How to create an extension template in my local machine

You can use the `paster` command in the same way as on a source install. To create an extension, execute the following command:

```
docker-compose -f docker-compose.dev.yml exec ckan-dev /bin/bash -c "paster --plugin=ckan create -t ckanext ckanext-myext -o /srv/app/src_extensions"
```

This will create an extension template inside the container's folder `/srv/app/src_extensions`, which is mapped to your local `src/` folder.

Now you can navigate to your local folder `src/`, see the extension created by the previous command, and open the project in your favorite IDE.

## How to separate that extension in a new git repository so I can have the independence to install it in other instances

The crucial thing to understand is that extensions get their own repositories on GitHub (or elsewhere). You can first create a repository for the extension and later clone it into `src/`, or do the opposite, as follows:

* Create the extension, for example `ckanext-myext`:
```
docker-compose -f docker-compose.dev.yml exec ckan-dev /bin/bash -c "paster --plugin=ckan create -t ckanext ckanext-myext -o /srv/app/src_extensions"
```

* Init your new git repository in the extension folder `src/ckanext-myext`:
```
cd src/ckanext-myext
git init
```
* Configure remote/origin:
```
git remote add origin <remote_repository_url>
```
* Add your files and push the first commit:
```
git add .
git commit -m 'Initial Commit'
git push
```

**Note:** The `src/` folder is gitignored in the `okfn/docker-ckan` repository, so initializing new git repositories inside it is fine.

## How to quickly refresh the changes in my extension into the dockerized environment so I can have quick feedback of my changes

This docker-compose setup for the dev environment is already configured so that it sets `debug=True` in the configuration file and auto-reloads on Python and template-related changes. You do not have to reload when making changes to HTML, javascript or configuration files -- you just need to refresh the page in the browser.

See the CKAN images section of the [repository documentation](https://github.com/okfn/docker-ckan#ckan-images) for more detail.

## How to run tests for my extension in the dockerized environment so I can have a quick test-development cycle

We write and store unit tests inside the `ckanext/myext/tests` directory. To run unit tests you need to be running the `ckan-dev` service of this docker-compose setup.

* Once running, in another terminal window run the test command:
```
docker-compose -f docker-compose.dev.yml exec ckan-dev nosetests --ckan-dev --nologcapture --reset-db -s -v --with-pylons=/srv/app/src_extensions/ckanext-myext/test.ini /srv/app/src_extensions/ckanext-myext/
```

You can also pass nosetest arguments to debug:
```
--ipdb --ipdb-failure
```

**Note:** Right now all tests will be run; it is not possible to choose a specific file or test.

## How to debug my methods in the dockerized environment so I can have a better understanding of what's going on with my logic

To run a container and be able to add a breakpoint with `pdb`, run the `ckan-dev` container with the `--service-ports` option:

```
docker-compose -f docker-compose.dev.yml run --service-ports ckan-dev
```

This will start a new container, displaying the standard output in your terminal. If you add a breakpoint in a source file in the `src` folder (`import pdb; pdb.set_trace()`) you will be able to inspect it in this terminal the next time the code is executed.

## How to debug core CKAN code

Currently, this docker-compose setup doesn't allow us to debug core CKAN code, since it lives inside the container. However, we can do some hacks so the container uses a local clone of CKAN core hosted on our machine. To do it:

- Create a new folder called `ckan_src` in this `docker-ckan` folder, at the same level as `src/`
- Clone CKAN and check out the version you want to debug/edit:

```
git clone https://github.com/ckan/ckan/ ckan_src
cd ckan_src
git checkout ckan-2.8.3
```

- Edit `docker-compose.dev.yml` and add an entry to ckan-dev's and ckan-worker-dev's volumes. This will allow the Docker container to access the CKAN code hosted on our machine.

```
- ./ckan_src:/srv/app/ckan_src
```

- Create a script in `ckan/docker-entrypoint.d/z_install_ckan.sh` to install CKAN inside the container from the cloned repository (instead of the one installed in the Dockerfile):

```
#!/bin/bash
echo "*********************************************"
echo "overriding ckan installation with ckan_src"
pip install -e /srv/app/ckan_src
echo "*********************************************"
```

That's it. This will install CKAN inside the container in development mode, from the shared folder. Now you can open the `ckan_src/` folder in your favorite IDE and start working on CKAN.
**File: `site/content/docs/dms/ckan/getting-started.md`** (new file)
# CKAN: Getting Started for Development

## Prerequisites

CKAN has a rich tech stack, so we have opted to standardize our instructions on Docker Compose, which will help you spin up every service in a few commands.

If you already have docker-compose, you are ready to go!

If not, please follow the instructions on [how to install docker-compose](https://docs.docker.com/compose/install/).

On Ubuntu you can run:

```
sudo apt-get update
sudo apt-get install docker-compose
```

## Cloning the repo

```
git clone https://github.com/okfn/docker-ckan
# or git clone git@github.com:okfn/docker-ckan.git
cd docker-ckan
```

## Booting CKAN

Create a local environment file:

```
cp .env.example .env
```

Build and run the instances:

> [!tip]'docker-compose' must be run with 'sudo'. If you want to change this, you can follow the steps below. NOTE: The 'docker' group grants privileges equivalent to the 'root' user.

Create the `docker` group: `sudo groupadd docker`

Add your user to the `docker` group: `sudo usermod -aG docker $USER`

Change the storage directory ownership from `root` to `ckan` by adding the commands below to `ckan/Dockerfile.dev`:

```
RUN mkdir -p /var/lib/ckan/storage/uploads
RUN chown -R ckan:ckan /var/lib/ckan/storage
```

At this point, you can log out and log back in for these changes to apply. You can also use the command `newgrp docker` to temporarily enable the new group for the current terminal session.

```
docker-compose -f docker-compose.dev.yml up --build
```

When you see this log message:

![]()

You can navigate to `http://localhost:5000`

![]()

and log in with the credentials that the docker-compose setup created for you (user: `ckan_admin`, password: `test1234`).

>[!tip]To learn key concepts about CKAN, including what it is and how it works, you can read the [CKAN User Guide](https://docs.ckan.org/en/2.8/user-guide.html).

## Next Steps

[Play around with CKAN portal](/docs/dms/ckan/play-around).

## Troubleshooting

Login / Logout button breaks the experience:

- Change the URL from `http://ckan:5000` to `http://localhost:5000`. A complete fix is described in [Play around with CKAN portal](/docs/dms/ckan/play-around). (Your next step. ;))
**File: `site/content/docs/dms/ckan/install-extension.md`** (new file)
---
sidebar: auto
---

# Installing extensions

A CKAN extension is a Python package that modifies or extends CKAN. Each extension contains one or more plugins that must be added to your CKAN config file to activate the extension's features.

In this section we will teach you only how to install existing extensions. See [next steps](/docs/dms/ckan/create-extension) in case you need to create or modify extensions.

## Add new extension

Let's install [Hello World](https://github.com/rclark/ckanext-helloworld) on the portal. For that we need to do two things:

1. Install the extension when building the Docker image
2. Add the new extension to the CKAN plugins

### Install extension on docker build

For this we need to modify the Dockerfile for the ckan service. Let's edit it:

```
vi ckan/Dockerfile.dev

# Add the following
RUN pip install -e git+https://github.com/rclark/ckanext-helloworld.git#egg=ckanext-helloworld
```

*Note:* In this example we use the vi editor, but you can use any editor of your choice.

### Add new extension to plugins

We need to modify the `.env` file for that -- search for `CKAN__PLUGINS` and add the new extension to the existing list:

```
vi .env

CKAN__PLUGINS=helloworld envvars image_view text_view recline_view datastore datapusher
```

## Check extension is installed

After modifying configuration files you will need to restart the portal. If your CKAN portal is up and running, bring it down and re-start:

```
docker-compose -f docker-compose.dev.yml stop
docker-compose -f docker-compose.dev.yml up --build
```

### Check what extensions you already have:

http://ckan:5000/api/3/action/status_show

The response should include a list of all extensions, including `helloworld`:

```
"extensions": [
    "envvars",
    "helloworld",
    "image_view",
    "text_view",
    "recline_view",
    "datastore",
    "datapusher"
]
```

### Check the extension is actually working

This extension simply adds a new route `/hello/world/name` to base CKAN and says hello:

http://ckan:5000/hello/world/John-Doe

## Next steps

[Create your own extension](/docs/dms/ckan/create-extension)
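The `status_show` check above can also be done programmatically by parsing the response body. A small Python sketch, using a hard-coded sample in the response shape shown above rather than a live portal:

```python
import json

# Hypothetical sample of a status_show response body; real values will differ.
body = '{"success": true, "result": {"extensions": ["envvars", "helloworld", "image_view"]}}'

data = json.loads(body)
installed = data["result"]["extensions"]

# The plugin name from CKAN__PLUGINS should appear here after the restart.
print(data["success"] and "helloworld" in installed)
```

Against a running portal you would fetch the body from `http://ckan:5000/api/3/action/status_show` instead of hard-coding it.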
**File: `site/content/docs/dms/ckan/play-around.md`** (new file)
---
sidebar: auto
---

# How to play around with CKAN

In this section, we are going to show some basic functionality of CKAN, focused on the API.

## Prerequisites

- We assume you've already completed the [Getting Started Guide](/docs/dms/ckan/getting-started).
- You have a basic understanding of key data portal concepts:

CKAN is a tool for making data portals to manage and publish datasets. You can read about the key concepts, such as Datasets and Organizations, in the User Guide -- or you can just dive in and play around!

https://docs.ckan.org/en/2.9/user-guide.html

>[!tip]
Install a [JSON formatter plugin for Chrome](https://chrome.google.com/webstore/detail/json-formatter/bcjindcccaagfpapjjmafapmmgkkhgoa?hl=en) or the browser of your choice.

If you are familiar with the command line tool `curl`, you can use that.

In this tutorial, we will be using `curl`, but for most of the commands you can paste a link in your browser. For POST commands, you can use [Postman](https://www.getpostman.com/) or the [Google Chrome Plugin](https://chrome.google.com/webstore/detail/postman/fhbjgbiflinjbdggehcddcbncdddomop).

## First steps

>[!tip]
By default the portal is accessible on http://localhost:5000. Let's update your `/etc/hosts` to access it on http://ckan:5000:

```
vi /etc/hosts # You can use the editor of your choice
# add the following
127.0.0.1 ckan
```

At this point, you should be able to access the portal on http://ckan:5000.

![]()

Let's add some fixtures to it. For software, a fixture is something used consistently (in this case, data for you to play around with). Run the following from your terminal (do not stop the previous Docker process, as this one depends on the already launched containers -- run it in another terminal):

```sh
docker-compose -f docker-compose.dev.yml exec ckan-dev ckan seed basic
```

Optionally you can `exec` into a running container using

```sh
docker exec -it [name of container] sh
```

and run the `ckan` command there:

```sh
ckan seed basic
```

You should be able to see 2 new datasets on the home page:

![]()

To get more details on ckan commands, please visit the [CKAN Commands Reference](https://docs.ckan.org/en/2.9/maintaining/cli.html#ckan-commands-reference).

### Check CKAN API

This tutorial focuses on the CKAN API, as that is central to development work and requires more guidance. We also invite you to explore the user interface, which you can do directly yourself by visiting http://ckan:5000/.

#### Let's check the portal status

Go to http://ckan:5000/api/3/action/status_show.

You should see something like this:

```json
{
  "help": "https://ckan:5000/api/3/action/help_show?name=status_show",
  "success": true,
  "result": {
    "ckan_version": "2.9.x",
    "site_url": "https://ckan:5000",
    "site_description": "Testing",
    "site_title": "CKAN Demo",
    "error_emails_to": null,
    "locale_default": "en",
    "extensions": [
      "envvars",
      ...
      "demo"
    ]
  }
}
```

This means everything is OK: the CKAN portal is up and running and the API is working as expected. In case you see an internal server error, please check the logs in your terminal.
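All the Action API GET calls in this guide share the same URL shape: base host, `/api/3/action/`, the action name, and optional query parameters. A small Python helper sketch for building such URLs (the `ckan:5000` host is the local dev setup from this guide, not a fixed CKAN address):

```python
from urllib.parse import urlencode

# Assumed local dev host from this guide.
BASE = "http://ckan:5000"

def action_url(action, **params):
    """Build a GET URL for a CKAN Action API call such as status_show."""
    url = f"{BASE}/api/3/action/{action}"
    return f"{url}?{urlencode(params)}" if params else url

print(action_url("status_show"))
print(action_url("package_search", q="russian"))
```

`urlencode` also takes care of escaping, so queries like `format:plain text` become safe URL parameters.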
### A Few useful API endpoints to start with

CKAN's Action API is a powerful, RPC-style API that exposes all of CKAN's core features to API clients. All of a CKAN website's core functionality (everything you can do with the web interface and more) can be used by external code that calls the CKAN API.

#### Get a list of all datasets on the portal

http://ckan:5000/api/3/action/package_list

```json
{
  "help": "http://ckan:5000/api/3/action/help_show?name=package_list",
  "success": true,
  "result": ["annakarenina", "warandpeace"]
}
```

#### Search for a dataset

http://ckan:5000/api/3/action/package_search?q=russian

```json
{
  "help": "http://ckan:5000/api/3/action/help_show?name=package_search",
  "success": true,
  "result": {
    "count": 2,
    ...
  }
}
```

#### Get dataset details

http://ckan:5000/api/3/action/package_show?id=annakarenina

```json
{
  "help": "http://ckan:5000/api/3/action/help_show?name=package_show",
  "success": true,
  "result": {
    "license_title": "Other (Open)",
    ...
  }
}
```

#### Search for a resource

http://ckan:5000/api/3/action/resource_search?query=format:plain%20text

```json
{
  "help": "http://ckan:5000/api/3/action/help_show?name=resource_search",
  "success": true,
  "result": {
    "count": 1,
    "results": [
      {
        "mimetype": null,
        ...
      }
    ]
  }
}
```

#### Get resource details

http://ckan:5000/api/3/action/resource_show?id=288455e8-c09c-4360-b73a-8b55378c474a

```json
{
  "help": "http://ckan:5000/api/3/action/help_show?name=resource_show",
  "success": true,
  "result": {
    "mimetype": null,
    ...
  }
}
```

*Note:* These are only a few examples. You can find a full list of API actions in the [CKAN API guide](https://docs.ckan.org/en/2.9/api/#action-api-reference).

### Create Organizations, Datasets and Resources

There are 4 steps:

- Get an API key;
- Create an organization;
- Create a dataset inside an organization (you can't create a dataset without a parent organization);
- And add resources to the dataset.
|
||||
|
||||
#### Get a Sysadmin Key
|
||||
|
||||
To create your first dataset, you need an API key.
|
||||
|
||||
You can see sysadmin credentials in the file `.env`. By default, they should be
|
||||
|
||||
- Username: `ckan_admin`
|
||||
- Password: `test1234`
|
||||
|
||||
1. Navigate to http://ckan:5000/user/login and login.
|
||||
2. Click on your username (`ckan_admin`) in the upright corner.
|
||||
3. Scroll down until you see `API Key` on the left side of the screen and copy its value. It should look similar to `c7325sd4-7sj3-543a-90df-kfifsdk335`.
|
||||
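If you prefer scripting these calls instead of using the browser and `curl`, the same authorized requests can be prepared from Python using only the standard library. This is a minimal sketch; the host and API key are the placeholder values from above, so substitute your own:

```python
import json
import urllib.request

CKAN_URL = "http://ckan:5000"
API_KEY = "c7325sd4-7sj3-543a-90df-kfifsdk335"  # your own key from the profile page

def ckan_request(action, payload):
    """Prepare an authorized POST request to a CKAN action endpoint."""
    return urllib.request.Request(
        f"{CKAN_URL}/api/3/action/{action}",
        data=json.dumps(payload).encode(),
        headers={"Authorization": API_KEY, "Content-Type": "application/json"},
    )

req = ckan_request("organization_create", {"name": "demo-organization"})
print(req.full_url)  # -> http://ckan:5000/api/3/action/organization_create
```

Pass the prepared request to `urllib.request.urlopen(req)` to execute it against a running portal.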

#### Create Organization

You can easily create an organization from the browser, but let's use the [CKAN API](https://docs.ckan.org/en/2.9/api/#ckan.logic.action.create.organization_create) to do so.

```sh
curl -X POST http://ckan:5000/api/3/action/organization_create -H "Authorization: 9c04a69d-79f4-4b4b-b4e1-f2ac31ed961c" -d '{
  "name": "demo-organization",
  "title": "Demo Organization",
  "description": "This is my awesome organization"
}'
```

Response:

```json
{
  "help": "http://ckan:5000/api/3/action/help_show?name=organization_create",
  "success": true,
  "result": {
    "users": [
      {
        "email_hash":
        ...
      }
    ]
  }
}
```

#### Create Dataset

Now, we are ready to create our first dataset.
```sh
curl -X POST http://ckan:5000/api/3/action/package_create -H "Authorization: 9c04a69d-79f4-4b4b-b4e1-f2ac31ed961c" -d '{
  "name": "my-first-dataset",
  "title": "My First Dataset",
  "notes": "This is my first dataset!",
  "owner_org": "demo-organization"
}'
```

(Note that `package_create` takes the dataset description in the `notes` field.)

Response:

```json
{
  "help": "http://ckan:5000/api/3/action/help_show?name=package_create",
  "success": true,
  "result": {
    "license_title": null,
    ...
  }
}
```

This will create an empty (draft) dataset.

#### Add a resource to it

```sh
curl -X POST http://ckan:5000/api/3/action/resource_create -H "Authorization: 9c04a69d-79f4-4b4b-b4e1-f2ac31ed961c" -d '{
  "package_id": "my-first-dataset",
  "url": "https://raw.githubusercontent.com/frictionlessdata/test-data/master/files/csv/100kb.csv",
  "description": "This is the best resource ever!",
  "name": "brand-new-resource"
}'
```

Response:

```json
{
  "help": "http://ckan:5000/api/3/action/help_show?name=resource_create",
  "success": true,
  "result": {
    "cache_last_updated": null,
    ...
  }
}
```

That's it! Now you should be able to see your dataset on the portal at http://ckan:5000/dataset/my-first-dataset.

## Next steps

* [Install Extensions](/docs/dms/ckan/install-extension).
site/content/docs/dms/cms-for-data-portals.md (new file, 81 lines)
---
sidebar: auto
---

# Content Management System (CMS) for Data Portals

## Summary

When selecting a CMS solution for Data Portals, we always recommend using a headless CMS solution, as it provides full flexibility when building your system. Headless CMS means only content (no HTML, CSS, JS) is created in the CMS backend and delivered to the frontend via API.

> The traditional CMS approach to managing content put everything in one big bucket — content, images, HTML, CSS. This made it impossible to reuse the content because it was commingled with code. Read more - https://www.contentful.com/r/knowledgebase/what-is-headless-cms/.

## Features

Core features:

* Create and manage blog posts (or news), e.g., `/news/abcd`
* Create and manage static pages, e.g., `/about`, `/privacy`, etc.

Important features:

* User management, e.g., the ability to manage editors so that multiple users can edit content.
* User roles, e.g., the ability to assign different roles to users so that we can have admins, editors, reviewers.
* Draft content, e.g., useful when working on content development for a review/feedback loop. However, this is not essential if you have multiple environments.
* A syntax for writing content with text formatting, multi-level headings, links, images, videos, bullet points. For example, markdown.
* User-friendly interface (text editor) to write content.

```mermaid
graph LR

CMS -.-> Blog["Blog or news section"]
CMS -.-> IndBlog["Individual blog post"]
CMS -.-> About["About page content"]
CMS -.-> TC["Terms and conditions page content"]
CMS -.-> Privacy["Privacy policy"]
CMS -.-> Other["Other static pages"]
```

## Options

Headless CMS options:

* WordPress (headless option)
* Drupal (headless option)
* TinaCMS - https://tina.io/
* Git-based CMS - a custom solution based on a Git repository.
* Strapi - https://docs.strapi.io/developer-docs/latest/getting-started/introduction.html
* Ghost - https://ghost.org/docs/
* CKAN Pages (built-in CMS option) - https://github.com/ckan/ckanext-pages

*Note: there are loads of CMSs available, both open-source and proprietary. We are only considering a few of them in this article, and our requirement is that content can be fetched via API (headless CMS). Readers are welcome to add more options to the list.*

Comparison criteria:

* Self-hosting (note this isn't a criterion for most projects, and using managed hosting is sometimes the better option)
* Free and open source
* Multi-language posts (unnecessary if your portal is single language)

Comparison:

| Options | Hosting | Free | Multi-language |
| -------- | -------- | -------- | -------------- |
| Drupal | Tedious | Yes | Not straightforward |
| WordPress | Tedious | Yes | Terrible UX |
| TinaCMS | Medium | Yes | Limited |
| Git-based | Easy | Yes | Custom |
| Strapi | Medium | Yes | Simple |
| Ghost | Medium | Yes | Simple |
| CKAN Pages | Easy | Yes | ? |

## Conclusion and recommendation

The final decision should be made based on the following:

* How often will editors create content? E.g., daily, weekly, monthly, occasionally.
* How much content do you already have and need to migrate?
* How many content editors are you planning to have? What is their technical expertise?
* Are there any specific requirements, e.g., must you host in your own cloud?

By default, we would recommend considering options such as Strapi, TinaCMS and a Git-based CMS. You can even start with CKAN's simple built-in Pages and only move to a sophisticated CMS once it is required.
site/content/docs/dms/dashboards.md (new file, 163 lines)
# Dashboards

## What can you do?

* Describe visualizations in JSON and create interactive widgets
* Customize the dashboard layout using well-known HTML
* Style the dashboard design with TailwindCSS utility classes
* Rapidly create basic charts using the "simple" graphing specification
* Create advanced widgets by utilizing the "vega" visualization grammar

## How?

To create a dashboard you need some basic knowledge of:

* git
* JSON
* HTML

Before proceeding further, make sure you have forked the dashboards repository - https://github.com/datopian/dashboards.

### Create a directory for your dashboard

In the root of the project, create a directory for your dashboard. The name of this directory is the name of your dashboard, so make it short and meaningful. Here are some good examples:

* population
* environment
* housing

Your dashboard will then be available at https://domain.com/dashboards/your-dashboard-name.

Note that your dashboard directory will contain 2 files:

* `index.html` - [HTML template](#Set-up-your-layout)
* `config.json` - [configurations for widgets](#Configure-visualizations)

### Set up your layout

You need to prepare an HTML template for your dashboard. There is no need to create an entire HTML page, only the snippet needed to inject the widgets:

```html
<h1>My example dashboard</h1>
<div id="widget1"></div>
<div id="widget2"></div>
```

In the example above, we've created 2 div elements that we can reference by id when configuring visualizations.

Note that you can add any HTML tags and make your layout stand out. In the next section we'll explain how to do some styling.

### Style it

This step is optional, but if you have a dashboard with lots of widgets and metadata, you might want to style it so it appears nicely:

* Use TailwindCSS utility classes **(recommended)**
  * Official docs - https://tailwindcss.com/
  * Cheat sheet - https://nerdcave.com/tailwind-cheat-sheet
* Add inline CSS

Example of using TailwindCSS utility classes:

```html
<h1 class="text-gray-700 text-lg">My example dashboard</h1>
<div class="inline-block bg-gray-200 m-10" id="widget1"></div>
<div class="inline-block bg-gray-200 m-10" id="widget2"></div>
```

### Configure visualizations

In your config file `config.json` you can describe your dashboard in the following way:

```json
{
  "widgets": [],
  "datasets": []
}
```

* `widgets` - a list of objects where each object contains information about where a widget should be injected and what it should look like (see below for examples).
* `datasets` - a list of dataset URLs.

Example of a minimal widget object:

```json
{
  "elementId": "widget1",
  "view": {
    "resources": [
      {
        "datasetId": "",
        "name": ""
      }
    ],
    "specType": "",
    "spec": {}
  }
}
```

where:

* `elementId` - the "id" of the HTML tag you want to use as the container of your widget. See [how we defined it here](#Set-up-your-layout).
* `view` - descriptor of a visualization (widget).
  * `resources` - a list of resources needed for a widget and the required manipulations (transformations).
    * `datasetId` - the id (name) of the dataset from which the resource is extracted.
    * `name` - name of the resource.
    * `transform` - transformations required for a resource (optional). If you want to learn more about transforms:
      * Filtering data and applying formula: https://datahub.io/examples/transform-examples-on-co2-fossil-global#readme
      * Sampling: https://datahub.io/examples/example-sample-transform-on-currency-codes#readme
      * Aggregating data: https://datahub.io/examples/transform-example-gdp-uk#readme
  * `specType` - type of a widget, e.g., `simple`, `vega` or `figure`.
  * `spec` - specification for the selected widget type. See below for examples.
  * `title`, `legend`, `footer` - optional metadata for a widget. All must be strings.
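As a quick sanity check before opening a pull request, the required keys described above can be verified with a small script. This is purely an illustrative sketch of the structure documented here, not an official schema validator:

```python
import json

# Required keys per the widget structure described above (assumption: these
# are the minimal fields; optional ones like title/legend/footer are ignored).
REQUIRED_WIDGET_KEYS = {"elementId", "view"}
REQUIRED_VIEW_KEYS = {"resources", "specType", "spec"}

def validate_config(config):
    """Return a list of problems found in a dashboard config dict."""
    problems = []
    for i, widget in enumerate(config.get("widgets", [])):
        missing = REQUIRED_WIDGET_KEYS - widget.keys()
        if missing:
            problems.append(f"widget {i}: missing {sorted(missing)}")
            continue
        missing = REQUIRED_VIEW_KEYS - widget["view"].keys()
        if missing:
            problems.append(f"widget {i}: view missing {sorted(missing)}")
    return problems

config = json.loads(
    '{"widgets": [{"elementId": "widget1", "view": '
    '{"resources": [], "specType": "simple", "spec": {}}}], "datasets": []}'
)
print(validate_config(config))  # -> []
```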

#### Basic charts

The simple graph spec is the easiest and quickest way to specify a visualization. Using the simple graph spec you can generate line and bar charts:

https://frictionlessdata.io/specs/views/#simple-graph-spec
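For illustration, a complete widget using the simple spec might look like the following snippet, which prints it as config-ready JSON. The dataset id and field names (`Year`, `Total`) are hypothetical; `type`, `group` and `series` follow the simple graph spec linked above:

```python
import json

# A minimal widget using the "simple" spec type (sketch; dataset and field
# names are made up for the example).
widget = {
    "elementId": "widget1",
    "view": {
        "resources": [{"datasetId": "co2-fossil-global", "name": "global"}],
        "specType": "simple",
        "spec": {"type": "line", "group": "Year", "series": ["Total"]},
    },
}
print(json.dumps(widget, indent=2))
```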

#### Advanced visualizations

Please check these instructions to create advanced graphs via the Vega specification:

https://frictionlessdata.io/specs/views/#vega-spec

#### Figure widget

The figure widget is used to display a single value from a dataset. For example, you might want to show the latest unemployment rate in your dashboard so that it indicates the current status of your city's economy. See the left-hand side widgets here - https://london.datahub.io/.

A specification for the figure widget has the following structure:

```json
{
  "fieldName": "",
  "suffix": "",
  "prefix": ""
}
```

where the "fieldName" attribute is used to extract a specific value from a row. The "suffix" and "prefix" attributes are optional strings used to surround the figure, e.g., you can append a percent sign to show that the value is a percentage.

Note that the first row of the data is used, which means you need to transform the data to show the relevant value. See this example for details - https://github.com/datopian/dashboard-js/blob/master/example/script.js#L12-L22.
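Conceptually, rendering a figure widget amounts to picking the named field from the first row and wrapping it in the optional prefix/suffix. A hypothetical sketch (the real widget library may differ in details):

```python
# Sketch of applying a "figure" spec to tabular data: take the named field
# from the first row and surround it with the optional prefix/suffix.
def render_figure(rows, spec):
    value = rows[0][spec["fieldName"]]
    return f"{spec.get('prefix', '')}{value}{spec.get('suffix', '')}"

# Example data: the latest rate must be in the first row (see the note above).
rows = [{"unemployment_rate": 4.2}, {"unemployment_rate": 4.5}]
spec = {"fieldName": "unemployment_rate", "suffix": "%"}
print(render_figure(rows, spec))  # -> 4.2%
```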

#### Example

Check out the carbon emissions per capita dashboard as an example of creating advanced visualizations:

https://github.com/datopian/dashboards/tree/master/co2-emission-by-nation

## Share it with the world!

To make your dashboard live on the data portal, you need to:

1. Simply create a pull request.
2. Wait until your work gets reviewed.
3. Implement any requested changes.
4. Done! Once merged into the "master" branch, your dashboard is available at https://domain.com/dashboards/your-dashboard-name

## Research

* http://dashing.io/ - no longer maintained as of 2016
  * Replaced by https://smashing.github.io/
site/content/docs/dms/dashboards/hdx-dashboards-notes.md (new file, 358 lines)
# HDX Technical Architecture for Quick Dashboards

Notes from analysis and discussion in 2018.

# Concepts

* Bite (View): a description of an individual chart / map / fact and its data (source)
  * bite (for Simon): title, desc, data (compiled), uniqueid, map join info
  * view (Data Package views): title, desc, data sources (on parent data package), transforms, ...
  * compiled view: title, desc, data (compiled)
* Data source:
  * Single HXL file (currently, Simon's approach requires that all the data is in a single table, so there is always a single data source.)
  * Data Package(s)
* Creator / Editor: creating and editing the dashboard (given the source datasets)
* Renderer: given a dashboard config, render the dashboard

# Dashboard Creator

```mermaid
graph LR

datahxl[data+hxl]
layouts[Layout options]
dashboard["Dashboard (config)<br/><br/>(Layout, Data Sources, Selected Bites)"]
editor[Editor]
bites[Bites<br /><em>potential charts, maps etc</em>]

datahxl --suggester--> bites
bites --> editor
layouts --> editor
editor --save--> dashboard
```

## Bite generation

```mermaid
graph LR

data[data with hxl] --> inferbites(("Iterate Recipes<br/>and see what<br/>matches"))
inferbites --> possmatches[List of potential bites]
possmatches --no map info--> done[Bite finished]
possmatches --lat+lon--> done
possmatches --geo info--> maplink(("Check pcodes<br/> and link<br/>map server url"))
maplink -.-> fuzzy((Fuzzy Matcher))
fuzzy --> done
maplink --> done
maplink --error--> nobite[No Bite]
```

## Extending to non-HXL data

It is easy to extend this to non-HXL data by using base HXL types and inference, e.g.

```
date => #date
geo => #geo+lon
geo => #geo+lat
string/category => #indicator
```
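The inference table above can be sketched as a simple lookup from inferred column types to HXL hashtags. This is a hypothetical helper; real inference would inspect the data itself, not just declared types:

```python
# Map inferred base column types to HXL hashtags so non-HXL tables can feed
# the same bite suggester (illustrative sketch only).
TYPE_TO_HXL = {
    "date": "#date",
    "lon": "#geo+lon",
    "lat": "#geo+lat",
    "category": "#indicator",
}

def infer_hxl_row(column_types):
    """Return an HXL tag row for a list of inferred column types."""
    return [TYPE_TO_HXL.get(t, "") for t in column_types]

print(infer_hxl_row(["date", "lat", "lon", "category"]))
# -> ['#date', '#geo+lat', '#geo+lon', '#indicator']
```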

```mermaid
graph LR

data[data + syntax]
datahxl[data+hxl]
layouts[layout options]
dashboard["Dashboard (config)"]
editor[Editor]
bites[Bites<br /><em>potential charts, maps etc</em>]

data --infer--> datahxl
datahxl --suggester--> bites
bites --> editor
layouts --> editor
editor --save--> dashboard
```

# Dashboard Renderer

Rendering the dashboard involves:

```mermaid
graph LR

bites[Compiled Bites/Views]
renderer["Renderer<br/>(Layout + charting / mapping libs)"]
data[Data]

subgraph Dashboard Config
bitesconf[Bites/Views Config]
layoutconf[Layout Config]
end

bitecompiler[Bite/View Compiler]
bitecompiler --> bites

bitesconf --> bitecompiler
data --> bitecompiler

layoutconf --> renderer
bites --> renderer

renderer --> dashboard[HTML Dashboard]
```

## Compiled View generation

See https://docs.datahub.io/developers/views/

----

# Architecture Proposal

* data loader library
  * File: rows, fields (rows, columns)
  * type inference (?)
    * syntax: table schema infer
    * semantics (not now)
* data transform library (include hxl support)
* suggester library
* renderer library

Interfaces / Objects

* File
* (Dataset)
* Transform
* Algorithm / Recipe
* Bite / View
* Ordered Set of Bites
* Dashboard

## File (and Dataset)

http://okfnlabs.org/blog/2018/02/15/design-pattern-for-a-core-data-library.html

https://github.com/datahq/data.js

```
File
  rows
  descriptor
  schema
```

## Recipe

```json=
{
  'id': 'chart0001',
  'type': 'chart',
  'subType': 'row',
  'ingredients': [{'name': 'what', 'tags': ['#activity-code-id', '#sector']}],
  'criteria': ['what > 4', 'what < 11'],
  'variables': ['what', 'count()'],
  'chart': '',
  'title': 'Count of {1}',
  'priority': 8,
}
```

## Bite / Compiled View

```json=
{
  bite: array [...data for chart...],
  id: string "...chart bite ID...",
  priority: number,
  subtype: string "...bite subtype - row, pie...",
  title: string "...title of bite...",
  type: string "...bite type...",
  uniqueID: string "...unique ID combining bite and data structure",
}
```

=>

## Dashboard

```json=
{
  "title": "",
  "subtext": "",
  "filtersOn": true,
  "filters": [],
  "headlinefigures": 0,
  "headlinefigurecharts": [
  ],
  "grid": "grid5",
  "charts": [
    {
      "data": "https://proxy.hxlstandard.org/data.json?filter01=append&append-dataset01-01=https%3A%2F%2Fdocs.google.com%2Fspreadsheets%2Fd%2F1FLLwP6nxERjo1xLygV7dn7DVQwQf0_5tIdzrX31HjBA%2Fedit%23gid%3D0&filter02=select&select-query02-01=%23status%3DFunctional&url=https%3A%2F%2Fdocs.google.com%2Fspreadsheets%2Fd%2F1R9zfMTk7SQB8VoEp4XK0xAWtlsQcHgEvYiswZsj9YA4%2Fedit%23gid%3D0",
      "chartID": ""
    },
    {
      "data": "https://proxy.hxlstandard.org/data.json?filter01=append&append-dataset01-01=https%3A%2F%2Fdocs.google.com%2Fspreadsheets%2Fd%2F1FLLwP6nxERjo1xLygV7dn7DVQwQf0_5tIdzrX31HjBA%2Fedit%23gid%3D0&filter02=select&select-query02-01=%23status%3DFunctional&url=https%3A%2F%2Fdocs.google.com%2Fspreadsheets%2Fd%2F1R9zfMTk7SQB8VoEp4XK0xAWtlsQcHgEvYiswZsj9YA4%2Fedit%23gid%3D0",
      "chartID": ""
    }
  ]
}
```

```
var config = {
  layout: 2x2 // in city-indicators dashboard is handcrafted in layout
  widgets: [
    {
      elementId / data-id: ...
      view: {
        metadata: { title, sources: "World Bank" }
        resources: rule for creating compiled list of resources. [ { datasetId: ..., resourceId: ..., transform: ... } ]
        specType:
        viewspec:
      }
    },
    {
    },
  ]
  datasets: [
    list of data package urls ...
  ]
}
```

Simon's example

https://simonbjohnson.github.io/hdx-iom-dtm/

```javascript=
{
  // metadata for dashboard
  "title": "IOM DTM Example",
  "subtext": " ....",
  "headlinefigures": 3,
  "grid": "grid5", // user-chosen layout for dashboard. Choice of 10 grids
  "headlinefigurecharts": [ // widgets - headline widgets
    {
      "data": "https://beta.proxy.hxlstandard.org/data/1d0a79/download/africa-dtm-baseline-assessments-topline.csv",
      "chartID": "text0013/#country+name/1" // bite id
      // elementId: ... // implicit from order in grid ...
    },
    {
      "data": "https://beta.proxy.hxlstandard.org/data/1d0a79/download/africa-dtm-baseline-assessments-topline.csv",
      "chartID": "text0012/#affected+hh+idps/5"
    },
    {
      "data": "https://beta.proxy.hxlstandard.org/data/1d0a79/download/africa-dtm-baseline-assessments-topline.csv",
      "chartID": "text0012/#affected+idps+ind/6"
    }
  ],
  "charts": [ // chart widgets
    {
      "data": "https://beta.proxy.hxlstandard.org/data/1d0a79/download/africa-dtm-baseline-assessments-topline.csv",
      "chartID": "map0002/#adm1+code/4/#affected+idps+ind/6",
      "scale": "log" // chart config ...
    },
    {
      "data": "https://beta.proxy.hxlstandard.org/data/1d0a79/download/africa-dtm-baseline-assessments-topline.csv",
      "chartID": "chart0009/#country+name/1/#affected+idps+ind/6",
      "sort": "descending"
    }
  ]
}
```

Algorithm

1. Extract the data references to a common list of datasets and fetch them.
2. Generate compiled data via hxl.js plus your own code, transforming to the final data for charting etc.

```
function transformChart(rawSourceData (csv parsed), bite) => [ [...], [...] ] - data for chart

  hxl.js
  custom code

function transformMap

function transformText ...
```

https://github.com/SimonbJohnson/hxlbites.js

https://github.com/SimonbJohnson/hxlbites.js/blob/master/hxlBites.js#L957

```
hb.reverse(bite) => compiled bite (see above) (data, chartConfig)
```

3. Generate the dashboard HTML and compute element ids in the actual page (element ids computed from the grid setup).
4. Now we have a final dashboard config:

```
widgets: [
  {
    data: [ [...], [...] ]
    widgetType: text, chart, map ...
    elementId: // element to bind to ...
  }
]
```

5. Now use specific renderer libraries, e.g. leaflet, plotly/chartist etc, to render out into the page.

https://github.com/SimonbJohnson/hxldash/blob/master/js/site.js#L294

### Notes

"Source" version of dashboard with data uncompiled.

Compiled version of dashboard with final data inline ...

hxl.js takes an array of arrays ... and outputs an array of arrays ...

```
{
  schema: [...]
  data: [...]
}
```

# Renderer

* Renderer for the dashboard
* Renderer for each widget

```
function createChart(bite, elementId) => svg in bite
```

## Charts

* Data Package View => svg/png etc
  * plotly
  * vega (d3)
    * https://github.com/frictionlessdata/datapackage-render-js
  * chartist
  * react-charts

## Map

* Leaflet
  * react-leaflet

## Tables

...
site/content/docs/dms/data-api.md (new file, 270 lines)
# Data APIs (and the DataStore)
|
||||
|
||||
## Introduction
|
||||
|
||||
A Data API provides *API* access to data stored in a [DMS][]. APIs provide granular, per record access to datasets and their component data files. They offer rich querying functionality to select the records you want, and, potentially, other functionality such as aggregation. Data APIs can also provide write access, though this has traditionally been rarer.[^rarer]
|
||||
|
||||
Furthermore, much of the richer functionality of a DMS or Data Portal such as data visualization and exploration require API data access rather than bulk download.
|
||||
|
||||
[DMS]: /docs/dms/dms
|
||||
|
||||
[^rarer]: It is rarer because write access usually means a) the data for this dataset is a structured database rather than a data file (which is normally more expensive both in terms b) the Data Portal has now become the primary (or semi-primary) home of this dataset rather simply being the host of a dataset whose home and maintenance is elsewhere.
|
||||
|
||||
### API vs Bulk Access
|
||||
|
||||
Direct download of a whole data file is the default method of access for data in a DMS. API access complements this direct download in "bulk" approach. In some situations API access may be the primary access option (so-called "API first"). In other cases, structured storage and API read/write may be the *only* way the data is stored and there is no bulk storage -- for example, this would be a natural approach for time series data which is being rapidly updated e.g. every minute.
|
||||
|
||||
*Fig 1: Contrasting Download and API based access*
|
||||
|
||||
```bash
|
||||
# simple direct file access. You download
|
||||
https://my-data-portal.org/my-dataset/my-csv-file.csv
|
||||
|
||||
# API based access. Find the first 5 records with 'awesome'
|
||||
https://my-data-portal.org/data-api/my-dataset/my-csv-file-identifier?q=awesome&limit=5
|
||||
```
|
||||
|
||||
In addition, to differing volume of access, APIs often differ from bulk download in their data format: following web conventions data APIs usually return the data in a standard format such as JSON (and can also provide various other formats e.g. XML). By contrast, direct data access necessarily supplies the data in whatever data format it was created in.
|
||||
|
||||
### Limitations of APIs
|
||||
|
||||
Whilst Data APIs are in many ways more flexible than direct download they have disadvantages:
|
||||
|
||||
* APIs are much more costly and complex to create and maintain than direct download
|
||||
* API queries are slow and limited in size because they run in real-time in memory. Thus, for bulk access e.g. of the entire dataset direct download is much faster and more efficient (download a 1GB CSV directly is easy and takes seconds but attempting to do so via the API may crash the server and be very slow).
|
||||
|
||||
{/*
|
||||
TODO: do more to compare and contrast download vs API access (e.g. what each is good for, formats, etc)
|
||||
*/}
|
||||
|
||||
|
||||
### Why Data APIs?
|
||||
|
||||
Data APIs underpin the following valuable functionality on the "read" side:
|
||||
|
||||
* **Data (pre)viewing**: reliably and richly (e.g. with querying, mapping etc). This makes the data much more accessible to non-technical users.
|
||||
* **Visualization and analytics**: rich visualization and analytics may need a data API (because they need easily to query and aggregate parts of dataset).
|
||||
* **Rich Data Exploration**: when exploring the data you will want to explore through a dataset quickly only pulling parts of the data and drilling down further as needed.
|
||||
* **(Thin) Client applications**: with a data API third party users of the portal can build apps on top of the portal data easily and quickly (and without having to host the data themselves)
|
||||
|
||||
Corresponding job stories would be like:
|
||||
|
||||
* When building a visualization I want to select only some part of a dataset that I need for my visualization so that I can load the data quickly and efficiently.
|
||||
* When building a Data Explorer or Data Driven app I want to slice/dice/aggregate my data (without downloading it myself) so that I can display that in my explorer / app.
|
||||
|
||||
On the write side they provide support for:
|
||||
|
||||
* **Rapidly updating data e.g. timeseries**: if you are updating a dataset every minute or every second you want an append operation and don't want to store the whole file every update just to add a single record
|
||||
* **Datasets stored as structured data by default** and which can therefore be updated in part, a few records at a time, rather than all at once (as with blob storage)
|
||||
|
||||
|
||||
## Domain Model
|
||||
|
||||
The functionality associated to the Data APIs can be divided in 6 areas:
|
||||
|
||||
* **Descriptor**: metadata describing and specifying the API e.g. general metadata e.g. name, title, description, schema, and permissions
|
||||
* **Manager** for creating and editing APIs.
|
||||
* API: for creating and editing Data API's descriptors (which triggers creation of storage and service endpoint)
|
||||
* UI: for doing this manually
|
||||
* **Service** (read): web API for accessing structured data (i.e. per record) with querying etc. *When we simply say "Data API" this is usually what we are talking about*
|
||||
* Custom API & Complex functions: e.g. aggregations, join
|
||||
* Tracking & Analytics: rate-limiting etc
|
||||
* Write API: usually secondary because of its limited performance vs bulk loading
|
||||
* Bulk export of query results especially large ones (or even export of the whole dataset in the case where the data is stored directly in the DataStore rather than the FileStore). This is an increasingly important featurea lower priority but if required it is substantive feature to implement.
|
||||
* **Data Loader**: bulk loading data into the system that powers the data API. **This is covered in a [separate Data Load page](/docs/dms/load/).**
|
||||
* Bulk Load: bulk import of individual data files
|
||||
* Maybe includes some ETL => this takes us more into data factory
|
||||
* **Storage (Structured)**: the underlying structured store for the data (and its layout). For example, Postgres and its table structure.This could be considered a separate component that the Data API uses or as part of the Data API -- in some cases the store and API are completely wrapped together, e.g. ElasticSearch is both a store and a rich Web API.
|
||||
|
||||
>[!tip]Visualization is not part of the API but the demands of visualization are important in designing the system.

## Job Stories

### Read API

When I'm building a client application or extracting data I want to get data quickly and reliably via an API so that I can focus on building the app rather than managing the data.

* Performance: querying data is **quick**
* Filtering: I want to filter data easily so that I can get the slice of data that I need
* ❗ Unlimited query size for downloading, e.g. can download filtered data with millions of rows
* Can get results in 3 formats: CSV, JSON and Excel
* API formats
  * "RESTful" API (?)
  * SQL API (?)
  * ❗ GraphQL API (?)
* ❗ Custom views/cubes (including pivoting)
* Query UI

❗ = feature not currently present

#### Retrieve records via an API with filtering (per resource) (if tabular?)

When I am building a web app, a rich viz, displaying the data, etc. I want to have an API to the data (returning e.g. JSON, CSV) [in a resource] so that I can get precise chunks of data to use without having to download and store the whole dataset myself.

* I want examples
* I want a playground interface …

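As a sketch of what such a call looks like in CKAN, a client can hit the standard `datastore_search` action with a `filters` parameter. The base URL and resource id below are placeholders, not real values:

```javascript
// Build a CKAN datastore_search request URL with filters.
// `datastore_search`, `resource_id`, `filters` and `limit` are standard CKAN
// parameters; the base URL and resource id are placeholders.
function buildSearchUrl(baseUrl, resourceId, filters, limit = 100) {
  const params = new URLSearchParams({
    resource_id: resourceId,
    filters: JSON.stringify(filters), // e.g. only rows where country = "uk"
    limit: String(limit),
  })
  return `${baseUrl}/api/3/action/datastore_search?${params}`
}

const url = buildSearchUrl('https://demo.ckan.org', 'my-resource-id', { country: 'uk' }, 5)
console.log(url)
```

The response is a JSON envelope whose `result.records` holds the matching rows.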
#### Bulk Export

When I have a query with a large amount of results I want to be able to download all of those results so that I can analyse them with my own tools.

#### Multiple Formats

When querying data via the API I want to be able to get the results in different formats (e.g. JSON, CSV, XML (?), ...) so that I can get it in a format most suitable for my client application or tool.

#### Aggregate data (perform ops) via an API …

When querying data to use in a client application I want to be able to perform aggregations such as sum, group by etc so that I can get back summary data directly and efficiently (and don't have to compute it myself or wait for large amounts of data).

#### SQL API

When querying the API as a Power User I want to use SQL so that I can do complex queries and operations and reuse my existing SQL knowledge.

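In CKAN this maps onto the `datastore_search_sql` action, which accepts a SQL statement as a URL parameter; it also covers the aggregation story above via `GROUP BY`. A minimal sketch (the host and resource id are placeholders; in CKAN's DataStore the table name is the resource id):

```javascript
// Build a datastore_search_sql request with an aggregation query.
// `datastore_search_sql` is a real CKAN action; host and resource id are placeholders.
const baseUrl = 'https://demo.ckan.org'
const resourceId = 'my-resource-id'
const sql = `SELECT country, SUM(amount) AS total FROM "${resourceId}" GROUP BY country`
const sqlUrl = `${baseUrl}/api/3/action/datastore_search_sql?sql=${encodeURIComponent(sql)}`
console.log(sqlUrl)
```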
#### GeoData API

When querying a dataset with geo attributes such as location I want to be able to use geo-oriented functionality, e.g. find all items near X, so that I can find the records I want by location.

#### Free Text Query (Google Style / ElasticSearch Style)

When querying I want to do a google style search in data, e.g. query for "brown" and find all rows with brown in them, or do `brown road_name:*jfk*` and get all results with brown in them and whose field `road_name` has `jfk` in it, so that I can provide a flexible query interface to my users.

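CKAN's `datastore_search` already supports a simple form of this via its `q` parameter: a plain string searches across all fields, while a JSON object scopes the full-text query to named fields. A sketch with placeholder resource id and field name:

```javascript
// Field-scoped full-text query via datastore_search's `q` parameter
// (standard CKAN behaviour; resource id and field name are placeholders).
const params = new URLSearchParams({
  resource_id: 'my-resource-id',
  q: JSON.stringify({ road_name: 'jfk' }), // only match within the road_name field
})
console.log(`https://demo.ckan.org/api/3/action/datastore_search?${params}`)
```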
#### Custom Data API

As a Data Curator I want to create a custom API for one or more resources so that users can access my data in convenient ways …

* E.g. query by dataset or resource name rather than id ...

#### Search through all data (that is searchable) / Get Summary Info

As a Consumer I want to search across all the data in the Data Portal at once so that I can find the value I want quickly and easily … (??)

#### Search for variables used in datasets

As a Consumer (researcher/student …) I want to look for datasets with particular variables in them so that I can quickly locate the data I want for my work.

* Search across the column names so that ??

#### Track Usage of my Data API

As a Dataset Owner I want to know how much my Data API is being used so that I can report that to stakeholders / be proud of that.

#### Limit Usage of my Data API (and/or charge for it)

As a Sysadmin I want to limit usage of my Data API per user (and maybe charge above a certain level) so that I don't spend too much money.

#### Restrict Access to my Data API

As a Publisher I want to only allow specific people to access data via the Data API so that …

* I want this to mirror the same restrictions I have on the dataset / resources elsewhere (?)

### UI for Exploring Data

>[!warning] This probably is not a Data API epic -- rather it would come under the Data Explorer.

* I want an interface for "SQL-style" queries on the data
* I want a filter interface for the data
* I want to download filtered data
* ...

### Write API

When adding data I want to write new rows via the data API so that the new data is available via the API.

* ? Do we also want a way to do bulk additions?

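For reference, CKAN exposes this as the `datastore_upsert` action: POST a JSON payload of records to the action endpoint with an authorization token. A sketch of the payload (resource id is a placeholder; note that the `upsert` method requires the table to have a primary key defined):

```javascript
// Payload for CKAN's datastore_upsert action; POST it to
// /api/3/action/datastore_upsert with an Authorization header.
// The resource id is a placeholder.
const body = {
  resource_id: 'my-resource-id',
  records: [{ a: 1, b: 'one' }],
  method: 'upsert', // 'insert' and 'update' are also supported
}
console.log(JSON.stringify(body))
```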
### DataStore

When creating a Data API I want a structured data store (e.g. a relational database) so that I can power the Data API and have it be fast, efficient and reliable.

## CKAN v2

In CKAN 2 the bulk of this functionality is in the core extension `ckanext-datastore`:

* https://docs.ckan.org/en/2.8/maintaining/datastore.html
* https://github.com/ckan/ckan/tree/master/ckanext/datastore

In summary: the underlying storage is provided by a Postgres database. A dataset resource is mapped to a table in Postgres. There are no relations between tables (no foreign keys). A read and write API is provided by a thin Python wrapper around Postgres. Bulk data loading is provided in separate extensions.

### Implementing the 4 Components

Here's how CKAN 2 implements the four components described above:

* Read API: provided by an API wrapper around Postgres. This is written as a CKAN extension in Python and runs in-process in the CKAN instance.
  * Offers both classic Web API queries and SQL queries.
  * Full-text, cross-field search is provided via Postgres by creating an index concatenating across fields.
  * Also includes a write API and functions to create tables.
* DataStore: a dedicated Postgres database (separate from the main CKAN database) with one table per resource.
* Data Load: provided by either DataPusher (default) or XLoader. More details below.
  * Utilizes the CKAN jobs system to load data out of band.
  * Some reporting is integrated into the UI.
  * Supports tabular data (CSV or Excel): converts CSV or Excel into data that can be loaded into the Postgres DB.
* Bulk Export: you can bulk download via the extension using the dump functionality: https://docs.ckan.org/en/2.8/maintaining/datastore.html#downloading-resources
  * Note however this will have problems with large resources, either timing out or hanging the server.
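The dump functionality is exposed as a plain URL route, so building a download link is trivial. A sketch, assuming the standard `/datastore/dump/` route with its `format` parameter (host and resource id are placeholders):

```javascript
// Build a CKAN datastore dump URL; the route streams the whole resource.
// `format` may be csv, tsv, json or xml in recent CKAN versions.
const dumpUrl = (host, resourceId, format = 'csv') =>
  `${host}/datastore/dump/${resourceId}?format=${format}`

console.log(dumpUrl('https://demo.ckan.org', 'my-resource-id'))
```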
||||
### Read API

The CKAN DataStore extension provides an ad-hoc database for storage of structured data from CKAN resources.

See the DataStore extension: https://github.com/ckan/ckan/tree/master/ckanext/datastore

[Datastore API](https://docs.ckan.org/en/2.8/maintaining/datastore.html#the-datastore-api)

[Making Datastore API requests](https://docs.ckan.org/en/2.8/maintaining/datastore.html#making-a-datastore-api-request)

[Example: Create a DataStore table](https://docs.ckan.org/en/2.8/maintaining/datastore.html#ckanext.datastore.logic.action.datastore_create)

```sh
curl -X POST http://127.0.0.1:5000/api/3/action/datastore_create \
  -H "Authorization: {YOUR-API-KEY}" \
  -H "Content-Type: application/json" \
  -d '{ "resource": {"package_id": "{PACKAGE-ID}"}, "fields": [ {"id": "a"}, {"id": "b"} ] }'
```

### Data Load

See the [Load page](/docs/dms/load#ckan-v2).

### DataStore

Implemented as a separate Postgres database.

https://docs.ckan.org/en/2.8/maintaining/datastore.html#setting-up-the-datastore

### What Issues are there?

Sharp edges:

* The connection between the MetaStore (main CKAN objects DB) and the DataStore is not always well maintained. E.g. if I call the "purge_dataset" action, it will remove stuff from the MetaStore but it won't delete the table from the DataStore. This does not break UX, but your DataStore DB grows in size and you might have junk tables with lots of data.

DataStore (Data API):

* One table per resource and no way to join across resources
* Indexes are auto-created with no way to customize per resource. This can lead to issues when loading large datasets.
* No API gateway (i.e. no way to control DDoS'ing, to do rate limiting etc.)
* SQL queries do not work with private datasets

## CKAN v3

Following the general [next gen microservices approach][ng], the Data API is separated into distinct microservices.

[ng]: /docs/dms/ckan-v3/next-gen

### Read API

Approach: refactor the current DataStore API into a standalone microservice. The key point is to break out permissioning, either via a call out to a separate permissioning service or a simple JWT approach where the capability is baked in.

Status: In Progress (RFC) - see https://github.com/datopian/data-api

### Data Load

Implemented via AirCan. See the [Load page](/docs/dms/load).

### Storage

Backs onto Postgres by default, just like CKAN 2. We may also explore using other backends, especially from cloud providers, e.g. BigQuery or AWS Redshift.

* See the Data API service: https://github.com/datopian/data-api
* BigQuery: https://github.com/datopian/ckanext-datastore-bigquery
site/content/docs/dms/data-explorer.md (new file, 282 lines)

---
sidebar: auto
---

# Data Explorer

The Datopian Data Explorer is a React single page application and framework for creating and displaying rich data explorers (think Tableau-lite). Use it stand-alone or with CKAN. For CKAN it is a drop-in replacement for ReclineJS in CKAN Classic.


> [Data Explorer for the City of Montreal](http://montreal.ckan.io/ville-de-montreal/geobase-double#resource-G%C3%83%C2%A9obase%20double)

## Features / Highlights

The Data Explorer is an embeddable React/Redux application that allows users to:

* Explore tabular, map, PDF, and other types of data
* Create map views of tabular data using the [Map Builder](#map-builder)
* Create charts and graphs of tabular data using the [Chart Builder](#chart-builder)
* Easily build SQL queries for the DataStore API using the graphical interface of the [Datastore Query Builder](#datastore-query-builder)

## Components

The Data Explorer application acts as a coordinating layer and state management solution -- via [Redux](https://redux.js.org/) -- for several libraries, also maintained by Datopian.

### [Datapackage Views](https://github.com/datopian/datapackage-views-js)



Datapackage Views is the rendering engine for the main window of the Data Explorer.

The image above displays a table shown on the `Table` tab, but note that datapackage-views-js renders _all_ data visualizations: tables, charts, maps, and others.

### [Datastore Query Builder](https://github.com/datopian/datastore-query-builder)

<img alt="Datastore Query Builder" src="/static/img/docs/dms/data-explorer/query-builder.png" width="250px" />

The Datastore Query Builder interfaces with the DataStore API to allow users to search data resources using an SQL-like interface. See the [Datastore Query Builder docs](/docs/dms/data-explorer/datastore-query-builder/).

### [Map Builder](https://github.com/datopian/map-builder)

<img alt="Map Builder" src="/static/img/docs/dms/data-explorer/map-builder.png" width="250px" />

Map Builder allows users to build maps based on geo-data contained in tabular resources.

Supported geo formats:
* lon / lat (separate columns)

### [Chart Builder](https://github.com/datopian/chart-builder)

<img alt="Chart Builder" src="/static/img/docs/dms/data-explorer/chart-builder.png" width="250px" />

Chart Builder allows users to create charts and graphs from tabular data.

## Quick-start (Sandbox)

* Clone the data explorer
  ```bash
  $ git clone git@gitlab.com:datopian/data-explorer.git
  ```
* Use yarn to install the project dependencies
  ```bash
  $ cd data-explorer
  $ yarn
  ```
* To see the Data Explorer running in a sandbox environment run [Cosmos](https://github.com/react-cosmos/react-cosmos)
  ```bash
  $ yarn cosmos
  ```

## Configuration

The [`data-datapackage` attribute](#add-data-explorer-tags-to-the-page-markup) determines how the element will be displayed. It can be created from a [datapackage descriptor](https://frictionlessdata.io/specs/data-package/).

### Fixtures

Until we have better documentation on Data Explorer settings, use the [Cosmos fixtures](https://gitlab.com/datopian/data-explorer/blob/master/__fixtures__/with_widgets/geojson_simple.js) as an example of how to instantiate / configure the Data Explorer.

### Serialized state

`store->serializedState` is a representation of the application state _without fetched data_.
A Data Explorer can be "hydrated" from the serialized state: it will refetch the data and render in the same state it was exported in.

### Share links

Share links can be added in the `datapackage.resources[0].api` attribute.

There is a common limit of about 2,000 characters on URL strings. Our share links contain the entire application store tree, which is often larger than 2,000 characters, in which case the application state cannot be shared via URL.

## Translations

### Add a Translation To Data Explorer

To add a translation for a new language to the Data Explorer you need to:

1. Clone the repository you need to update
   ```bash
   git clone git@gitlab.com:datopian/data-explorer.git
   ```
2. Go to the `src/i18n/locales/` folder
3. Add a new sub-folder with the locale name and the new language json file (e.g. `src/i18n/locales/ru/translation.json`)
4. Add the new file to the resources settings in `i18n.js`:

   `src/i18n/i18n.js`:
   ```javascript
   import en from './locales/en/translation.json'
   import da from './locales/da/translation.json'
   import ru from './locales/ru/translation.json'
   ...
   ru: {
     translation: {
       ...require('./locales/ru/translation.json'),
       ...
     }
   },
   ...
   ```
5. Create a merge request with the changes

### Add a Translation To a Component

Some strings may come from a component; adding translations for them requires some extra steps, e.g. for datapackage-views-js:

1. Clone the repository
   ```bash
   git clone https://github.com/datopian/datapackage-views-js.git
   ```
2. Go to the `src/i18n/locales/` folder
3. Add a new sub-folder with the locale name and the new language json file (e.g. `src/i18n/locales/ru/translation.json`)
4. Add the new file to the resources settings in `i18n.js`:

   `src/i18n/i18n.js`:
   ```javascript
   ...
   import ru from './locales/ru/translation.json'
   ...
   resources: {
     ...
     ru: {translation: ru},
   },
   ...
   ```
5. Create a pull request for datapackage-views-js
6. Get the new datapackage-views-js version after merging (e.g. 1.3.0)
7. Clone data-explorer
8. Upgrade data-explorer's datapackage-views-js dependency to the new version:
   a. Update package.json
   b. Run `yarn install`
9. Add the component's translations path to the Data Explorer:
   ```javascript
   import en from './locales/en/translation.json'
   import da from './locales/da/translation.json'
   import ru from './locales/ru/translation.json'
   ...
   ru: {
     translation: {
       ...require('./locales/ru/translation.json'),
       ...require('datapackage-views-js/src/i18n/locales/ru/translation.json'),
     }
   },
   ...
   ```
10. Create a merge request for data-explorer

### Testing a Newly Added Language

To see your language changes in the Data Explorer you can run `yarn start` and change the language cookie of the page (`defaultLocale`):



### Language detection

Language detection rules are determined by the `detection` option in the `src/i18n/i18n.js` file. Please edit with care, as other projects may already depend on them.

## Embedding in CKAN NG Theme

### Copy bundle files to theme's `public` directory

```bash
$ cp data-explorer/build/static/js/*.js frontend-v2/themes/your_theme/public/js
$ cp data-explorer/build/static/js/*.map frontend-v2/themes/your_theme/public/js
$ cp data-explorer/build/static/css/* frontend-v2/themes/your_theme/public/css
```

#### Note on app bundles

The bundled resources have a hash in the filename, for example `2.a3e71132.chunk.js`.

During development it may be preferable to remove the hash from the file name to avoid having to update the script tag during iteration, for example:

```bash
$ mv 2.a3e71132.chunk.js 2.chunk.js
```

A couple of caveats:
* The `.map` file names should remain the same so that they are loaded properly
* The browser cache may need to be invalidated manually to ensure that the latest script is loaded

### Require Data Explorer resources in NG theme template

In `/themes/your-theme/views/your-template-with-explorer.html`:

```html
<!-- Everything before the content block goes here -->
{% block content %}

<!-- Data Explorer CSS -->
<link rel="stylesheet" type="text/css" href="/static/css/main.chunk.css">
<link rel="stylesheet" type="text/css" href="/static/css/2.chunk.css">
<!-- End Data Explorer CSS -->
```

### Configure datapackage

```html
<!-- where the datapackage is defined -->
<script>
  const datapackage = {
    resources: [{resource}], // single resource for this view
    views: [...], // can be 3 views aka widgets
    controls: {
      showChartBuilder: true,
      showMapBuilder: true
    }
  }
</script>
```

### Add data-explorer tags to the page markup

Each Data Explorer instance needs a corresponding `<div>` in the DOM. For example:

```html
{% for resource in dataset.resources %}
<div class="data-explorer" id="data-explorer-{{ loop.index - 1 }}" data-datapackage='{{ dataset.dataExplorers[loop.index - 1] | safe}}'></div>
{% endfor %}
```

Note that each container div needs the following attributes:
* `class="data-explorer"` (all explorer divs should have this class)
* `id="data-explorer-0"` (1, 2, etc...)
* `data-datapackage='{JSON CONFIG}'` (a valid JSON configuration)

### Add data explorer scripts to your template

```html
<script type="text/javascript" src="/static/js/runtime~main.js"></script>
<script type="text/javascript" src="/static/js/2.chunk.js"></script>
<script type="text/javascript" src="/static/js/main.chunk.js"></script>
```

**NOTE**: the scripts should be loaded _after_ the container divs are in the DOM, typically by placing the `<script>` tags at the bottom of the footer.

See [a real-world example here](https://gitlab.com/datopian/clients/ckan-montreal/blob/master/views/showcase.html).

## New builds

To build files for production, run `npm run build` or `yarn build`.

You need **Node version >= 12** to build the files; otherwise a 'heap out of memory' error is thrown.

### Component changes

If the changes involve component updates that live in separate repositories, make sure to upgrade them too before building:

1. Prepare the component's dist version (e.g. run `yarn build:package` in the component repo; see [this](/docs/dms/data-explorer/datastore-query-builder#release) for an example)
2. Run `yarn add <package>` to get the latest changes, e.g. `yarn add @datopian/datastore-query-builder` (do not use `yarn upgrade`; see why here: https://github.com/datopian/data-explorer/issues/28#issuecomment-700792966)
3. Verify the changes in `yarn.lock` - it should contain the latest component commit id
4. Run `yarn build` in data-explorer

### Testing not yet released component changes

If there are changes to be tested that are not ready to be released in a component, the best option is to use Cosmos directly in the component repository. If that is not enough, you can temporarily add the dependency from a branch:

```
yarn add https://github.com/datopian/datastore-query-builder.git#<branch name>
```

## Appendix: Design

See [Data Explorer Design page »](/docs/dms/data-explorer/design/)

@@ -0,0 +1,109 @@

---
sidebar: auto
---

# Datastore Query Builder

This project was bootstrapped with [Create React App](https://github.com/facebook/create-react-app).

The code repository is located on GitHub: https://github.com/datopian/datastore-query-builder.

## Usage

Install it:

```
yarn add @datopian/datastore-query-builder
```

Basic usage in a React app:

```javascript
import React from 'react'
import { QueryBuilder } from '@datopian/datastore-query-builder'

export const MyComponent = props => {
  // `resource` is a resource descriptor that must have 'name', 'id' and
  // 'schema' properties.
  // `action` should be a Redux action that expects back the resource
  // descriptor with an updated 'api' property. It is up to your app to fetch data.
  const { resource, action } = props
  return (
    <QueryBuilder resource={resource} filterBuilderAction={action} />
  )
}
```

Note that this component doesn't fetch any data - it only builds the API URI based on the user's selection.

It's easiest to learn from the examples provided in the `/__fixtures__/` directory.

## Features

* Date Picker: if the resource descriptor has a field of `date` type, it will be displayed as a date picker element:



## Available Scripts

In the project directory, you can run:

### `yarn cosmos` or `npm run cosmos`

Runs a dev server with the fixtures from the `__fixtures__` directory. Learn more about Cosmos: https://github.com/react-cosmos/react-cosmos

### `yarn start` or `npm start`

Runs the app in development mode.<br/>
Open [http://localhost:3000](http://localhost:3000) to view it in the browser.

The page will reload if you make edits.<br/>
You will also see any lint errors in the console.

### `yarn test` or `npm test`

Launches the test runner in interactive watch mode.<br/>
See the section about [running tests](https://facebook.github.io/create-react-app/docs/running-tests) for more information.

### `yarn build:package` or `npm run build:package`

Run this to compile the code so it is installable via yarn/npm.

### `yarn build` or `npm run build`

Builds the app for production to the `build` folder.<br/>
It correctly bundles React in production mode and optimizes the build for the best performance.

The build is minified and the filenames include the hashes.<br/>
Your app is ready to be deployed!

See the section about [deployment](https://facebook.github.io/create-react-app/docs/deployment) for more information.

## Release

When releasing a new version of this module, please make sure you've built the compiled version of the files:

```bash
yarn build:package
# Since this is a release, you need to bump the version number in the package.json file.
# E.g., for a patch release `0.3.6` becomes `0.3.7`.
# Now commit the changes
git add dist/ package.json
git commit -m "[v0.3.7]: your commit message."
```

Next, tag your commit and add a descriptive message about the release:

```bash
git tag -a v0.3.7 -m "Your release message."
```

Now you can push your commits and tags:

```bash
git push origin <branch> && git push origin <branch> --tags
```

The tag will initiate a GitHub Action that will publish the release to NPM.

site/content/docs/dms/data-explorer/design.md (new file, 145 lines)

# Data Explorer Design

>[!note]
Design sketches from Aug 2019. This remains a work in progress, though a good part was implemented in the new [Data Explorer](/docs/dms/data-explorer).

## Job Stories

[Preview] As a Data Consumer I want to have a sense of what data there is in a dataset's resources before I download it (or download an individual resource) so that I don't waste my time and get interested

[Preview] As a Data Consumer I want to view (the most important contents of) a resource without downloading it and opening it so I save time (and don't have to get specialist tools)

[Preview - with tweaks] As a Data Consumer I want to be able to display tabular data with geo info on a map so that I can see it in an easily comprehensible way

[Explorer] As a Viewer I want to explore (filter, facet?) a dataset so I can find the data I'm looking for ...

[Explorer - map] As a Viewer I want to filter down the data I display on the map so that I can see the data I want

[Map / Dash Creator] As a Publisher I want to create a custom map or dashboard so that I can display my data to viewers powerfully

[View the data] As a User, I want to see my city-related data (e.g. crime, road accidents) on the map so that:
* I can easily understand which area is safe for me.
* I can evaluate different neighbourhoods when planning a move.

As a User from the city council, I want to see my city-related data (e.g. traffic) on the map so that I can take better actions to improve the city (make it safe for citizens).

> Is this self-service created, a custom map made by the publisher, or an auto-generated map (e.g. a preview)?

[Data Explorer] As a Power User I want to do SQL queries on the DataStore so that I can display / download the results and get insight without having to download into my own tool and do the wrangling there

## Architecture

```mermaid
graph LR

subgraph "Filter UI"
simpleselectui[Filter by columns explicitly]
sqlfilterui[SQL UI]
richselectui[Filter and Group by etc in a UI]
end

subgraph Renderers
tableview[Table Renderer]
chartview[Chart Renderer]
mapview[Map Renderer]
end

subgraph Builders
datasetselector[Select datasets to use, potentially with combination]
chartbuilder[Chart Builder - UI to create a chart]
mapbuilder[Map Builder]
end

subgraph APIs
queryapi[Abstract Query API implemented by others]
datastoreapijs[DataStore API wrapper - returns a Data Package with cached data and query as url?]
datajs[Data Package - Data in Memory: Dataset and Table objects]
datajsquery[Query Wrapper Around Dataset with cached data in memory]
end

classDef todo fill:#f9f,stroke:#333,stroke-width:1px
classDef working fill:#00ff00,stroke:#333,stroke-width:1px

class chartbuilder todo;
class chartview,tableview,mapview,simpleselectui working;
```

The Filter UI updates the Redux Store using one-way data binding as the ONLY way to modify application state or component state (except internal state of components as needed):

```mermaid
graph TD

FilterUI_Update --> ReduxACTION:UpdateFilters
ReduxACTION:UpdateFilters --> RefetchData
ReduxACTION:UpdateFilters --> updateUIState

RefetchData --store.workingData--> UpdateStore
updateUIState --store.uiState--> UpdateStore

UpdateStore --> RerenderApp
```

## Interfaces to define

```
dataset => data package
query[Query - source data package + cached data + filter state]
workingdataset[Working Dataset in Memory]
chartconfig[]
mapconfig[]
```

### Redux store / top level state

```javascript
queryuistate: {
  // url / data package rarely changes during the lifetime of the explorer
  url: datastore url / or an original data package
  filters: ...
  sqlstatement: ...
}
// list of datasets / resources we are working with ...
datasets/resources: [

]
layout: [
  // this is the switcher layout where you only see one widget at a time
  layouttype: chooser; // chooser aka singleton, stacked, custom ...
  views: [list of views in their order]
]
views: [
  {
    type:
    resource:
    chart:
  }
]
```

## Research

### Our One



### Redash



### Metabase

https://github.com/metabase/metabase



### CKAN Classic





### Rufus' Data Explorer (2014)



site/content/docs/dms/data-lake.md (new file, 18 lines)

# Data Lake

A data lake is a repository -- typically a large one -- for storing data of many types. Data lakes are more flexible (less structured) than their predecessors, Data Warehouses. At their crudest they are little more than raw storage with an organizational structure plus, maybe, a catalog. At their most sophisticated they can become an entire data management infrastructure.

The flexibility of the data lake concept is both its advantage and a limitation: almost any data architecture that includes collecting organizational data together could be described as a data lake.

At a practical level, the flexibility can become a limitation in that **data lakes become data swamps**: the lack of structure often limits the usability of the lake: data cannot be found or is not of adequate quality. As ThoughtWorks notes: "Many enterprises failed to generate a return on their investment because they had quality issues with the data in their lakes or had invested significant sums in creating their lakes before identifying use cases."[^1]

[^1]: https://www.thoughtworks.com/decoder/data-lake

## Schematic overview of a Data Lake Architecture

<img src="https://docs.google.com/drawings/d/e/2PACX-1vThZmi5ok8VNaM03Vj5RQHJRQiZJIkrxaU08vpG_T_kcElFQDCO7bZVO1FJzcpR2X8wfKZVWdWXpLUz/pub?w=1159&h=484" />

## References

* https://www.thoughtworks.com/decoder/data-lake
* https://martinfowler.com/articles/data-monolith-to-mesh.html

287 site/content/docs/dms/data-portals.md (new file)
@@ -0,0 +1,287 @@
# Data Portals

> *Data Portals have become essential tools in unlocking the value of data for organizations and enterprises ranging from the US government to Fortune 500 pharma companies, from non-profits to startups. They provide a convenient point of truth for discovery and use of an organization's data assets. Read on to find out more.*

## Introduction: Data Portals are Gateways to Data

A Data Portal is a gateway to data. That gateway can be big or small, open or restricted. For example, data.gov is open to everyone, whilst an enterprise "intra" data portal is restricted to that enterprise (and perhaps even to certain people within it).

A Data Portal's core purpose is to enable the rapid discovery and use of data. However, as a flexible, central point of truth on an organization's data assets, a Data Portal can become essential data infrastructure and be extended or integrated to provide many additional features:

* Data storage and APIs
* Data visualization and exploration
* Data validation and schemas
* Orchestration and integration of data
* Data Lake coordination and organization

The rise of Data Portals reflects the rapid growth in the volume and variety of data that organizations hold and use. With so much data available internally (and externally) it is hard for users to discover and access the data they need. And with so many potential users and use-cases it is hard to anticipate what data will be needed, when.

Concretely: how does Jane in the new data science team know that Geoff in accounting has the spreadsheet she needs for her analysis for the COO? Moreover, it is not enough just to have a dataset's location: if users are to easily discover and access data, it has to be suitably organized and presented.

Data portals answer this need: by making it easy to find and access data, a data portal helps solve these problems. As a result, data portals have become essential tools for organizations to bring order to the "data swamp" and unlock the value of their data assets.[^1]

[^1]: The nature of the problem that Data Portals solve (i.e. bringing order to diverse, distributed data assets) explains why data portals first arose in government, as *open* data portals. Government had lots of useful data, much of it shareable, but poorly organized and strewn all over the place. In addition, much of the value of that data lay in unexpected or unforeseen uses. Thus, Data Portals in their modern form started in government in the mid-to-late 2000s. They then spread into large companies and then, with the spread of data, into all kinds of organizations big and small.
## Why Data Portals?

### Data Variety and Volume have Grown Enormously

The volume and variety of data available has grown enormously. Today, even small organizations have dozens of data assets ranging from spreadsheets in their cloud drive to web analytics. Meanwhile, large organizations can have an enormous -- and bewildering -- amount and variety of data ranging from Hadoop clusters and data warehouses to CRM systems plus, of course, plenty of internal spreadsheets, databases etc.

In addition to this diversity of *supply* there has been a huge growth in the potential and diversity of *demand* in the form of users and use cases. Self-service business intelligence, machine learning and even tools like Google Sheets have democratized and expanded the range of users. Meanwhile, data is no longer limited to a single purpose: much of the *new* value of data for enterprises comes from unexpected or unplanned uses and from combining data across divisions and systems.

### This Creates a Challenge: Getting Lost in Data

As organizations seek to reap the benefits of this data cornucopia they face a problem: with so much data around it's easy to get lost -- or even just not know that data exists. And, as supply and demand have expanded and diversified, it has got both harder and more important to match them up.

The addition of data integration and data engineering can actually make this problem even worse -- do we need to create this new dataset X from Y and Z, or do we already have that somewhere? And how can people find X once we have created it? Is X a finished dataset that people can rely on, or is it a one-off? Even if it is a one-off, do we want to record that we created this kind of dataset so we can create it again in the future if we need it?[^lakes]

### Data Portals are a Platform that Connects Supply and Demand for Data

By making it easy to find and access data, a data portal helps address all these problems. As a platform it connects creators and users of data in a single place. As a single source of base metadata it provides essential infrastructure for data integration. By acting as a central repository of data it enables new forms of publication and sharing. Data Portals therefore play a central role in unlocking the value of data for organizations.

[^lakes]: Ditto for data lakes: the growth of data lakes has made data portals (and metadata management) even more important, because without them your data lake quickly turns into a data swamp where data is hard to locate and, even if found, lacks the essential metadata and structure that would make it usable.

### Data Portals as the First Step in an Agile Data Strategy

Data portals also provide an initial, concrete step in data engineering / data strategy. Suppose you are a newly arrived CDO.

The first questions you will be asking are things like: what data do we have, where is it, what state is it in? (And secondarily: what data use cases do we have? Who has them? Do they match against the data we have?)

This immediately leads to a need to do a data inventory. And for a data inventory you need a tool to hold the results (and structure them) -- i.e. a data portal.
```mermaid
graph TD

cdo[Chief Data Officer]
what[What data do we have?]
inventory["We need a data inventory"]
portal[We need a data portal / catalog]

cdo --> what
what --> inventory
inventory --> portal
```

Even in more sophisticated situations, a data portal is a great place to start. Suppose you are a newly arrived CDO at an organization with an existing data lake and a rich set of data integration and data analytics workflows.

There is a good chance your data lake is rapidly becoming a data swamp and there is nothing to track dependencies and connections in those data and analytics pipelines. Again, a simple data portal is a great place to start in bringing some order to this: lightweight, vendor-independent and (if you choose CKAN) open source infrastructure that gives you a simple solution for collecting and tracking metadata across datasets and data workflows.
### Summary: Data Portals make Data Discoverable and Accessible and provide a Foundation for Integration

In summary, Data Portals deliver value in three distinct, interlocking and incremental ways by:

* **Making data discoverable**: ranging from Excel files to Hadoop clusters. The portal does this by providing the infrastructure *and* process for reliable, cross-system metadata management and access (via humans *and* machines)
* **Making data accessible**: whether it's an Excel file or a database cluster, the portal's combination of common metadata, data showcases and data APIs makes data easily and quickly accessible to technical and non-technical users. Data can now be previewed and even explored in one location prior to use.
* **Making data reliable and integrable**: as a central store of metadata and data access points, the data portal is a natural starting point for enriching data with data dictionaries (what does column `custid` mean?), data mappings (this column in this data file is a customer ID and the customer master data is in that other dataset), and data validation (does this column of dates contain valid dates, and are some of them out of range?)
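As a tiny illustration of the data-validation idea above, checking a column of dates needs only the standard library. A minimal sketch -- the column values, format and acceptable range here are made up for illustration:

```python
from datetime import datetime

def validate_dates(column, fmt="%Y-%m-%d", earliest="2000-01-01", latest="2030-12-31"):
    """Return a list of (index, value, problem) for cells that fail validation."""
    lo = datetime.strptime(earliest, fmt)
    hi = datetime.strptime(latest, fmt)
    problems = []
    for i, cell in enumerate(column):
        try:
            d = datetime.strptime(cell, fmt)
        except (TypeError, ValueError):
            problems.append((i, cell, "not a valid date"))
            continue
        if not (lo <= d <= hi):
            problems.append((i, cell, "out of range"))
    return problems

# The second cell has an invalid month; the third is out of range.
problems = validate_dates(["2021-05-01", "2021-13-01", "1875-01-01"])
```

A real portal would attach such checks to a dataset's schema so they run on every update rather than ad hoc.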
In addition, in terms of proper data infrastructure and data engineering, a Data Portal provides an initial starting point, simple scaffolding and a solid foundation. It is an organizing point and Rosetta Stone for data discovery and metadata.

* TODO: this really links into the story of how to start doing data engineering / building a data lake / doing agile data etc etc
* For example, suppose you want to do some non-trivial business intelligence. You'll need a list of the datasets you'll need -- maybe sales, plus analytics, plus some public economic data. Where are you going to track those datasets? Where are you going to track the resulting datasets you produce?
* For example, suppose your data engineering team are building out data pipelines. These pipelines pull in a variety of datasets, integrate and transform them and then save the results. How are they going to track what datasets they are using and what they have produced? They are going to need a catalog. Rather than inventing their own (the classic "JSON file in git" or "spreadsheet in Google Docs"), you want them to use a proper catalog (or integrate with your existing one).
* Using an open source, service-oriented data portal framework like CKAN you can rapidly integrate and scale out your data orchestration. It provides a "small pieces, loosely joined" approach to developing your data infrastructure starting from the basics: what datasets do you have, what datasets do you want to create?

## What does a Data Portal do?

### A Data Portal provides a Catalog

In its most basic essence, a Data Portal is a catalog of datasets. Even here there are degrees: at its simplest a catalog is just a list of dataset names and links, whilst more sophisticated catalogs will have elaborate metadata on each dataset.
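To make "a list of dataset names and links" concrete, here is a minimal catalog sketch. The records, field names and URLs are invented for illustration and follow no particular metadata standard:

```python
# A bare-bones dataset catalog: a list of metadata records plus keyword search.
CATALOG = [
    {"name": "sales-2020", "title": "Sales 2020",
     "url": "s3://data/sales-2020.csv", "tags": ["sales", "finance"]},
    {"name": "web-analytics", "title": "Web Analytics",
     "url": "https://example.org/analytics.json", "tags": ["web", "marketing"]},
]

def find_datasets(query):
    """Return datasets whose title or tags mention the query string."""
    q = query.lower()
    return [d for d in CATALOG
            if q in d["title"].lower() or any(q in t for t in d["tags"])]
```

Everything a real portal adds -- rich metadata, permissions, APIs, previews -- is elaboration on this core structure.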
### And Much More ...

Along with the essential basic catalog features, modern portals now incorporate an extensive range of functionality for organizing, structuring and presenting data, including:

* **Publication workflow and metadata management**: rich editing interfaces and workflows (for example, approval steps), bulk editing of metadata etc
* **Showcasing and presentation of datasets**, extending to interactive exploration. For example, if a dataset contains an Excel file, in addition to linking to that file the portal will also display the contents in a table, allow users to create visualizations, and even let them search and explore the data
* **Data storage and APIs**: as well as cataloging metadata and linking to data stored elsewhere, data portals can also store data. Building off this, data portals can provide "data APIs" to the underlying data to complement direct access and download. These APIs make it much quicker and easier for users to build their own rich applications and analytics workflows.
* **Permissions**: fine-grained permissions to control access to data and related materials.
* **Data ingest and transformation**: ETL-style functionality, e.g. for bulk harvesting metadata, or preparing or excerpting data for presentation (for example, loading data to power data APIs)

Moreover, as a flexible, central point of truth on an organization's data assets, a Data Portal can become the foundation for broader data infrastructure and data management, for example:

* Orchestration of data integration: as a central repository of metadata, data portals are perfectly placed to integrate with wider data engineering and ETL workflows
* Data quality and provenance tracking
* Data Lake coordination and organization
## What are the main features of a Data Portal?

Focus on "functional" features vs technical features.

E.g. each functional feature may require one or more of these technical components:

* Storage
* API
* Frontend
* Admin UI (WebUI, possibly CLI, Mobile etc)

### High Level Overview
```mermaid
graph LR

perdataset --> storage[Store Data]
perdataset --> metadata[Store Metadata]
perdataset --> versioning
perdataset --> events[Notifications of Life Cycle events]
perdataset --> basic[Basic Access Control]

permissions --> auth[Identity]
permissions --> authz[Authorization]
permissions --> permintr[Permissions Integration]

hub --> showcase["(Pre)Viewing the Dataset"]
hub --> discovery[Discovery]
hub --> orgs[Users, Teams and Ownership]
hub --> tags[Tags, Themes]
hub --> audit[Audit and Notifications]

integration[Data Integration] --> pipelines
integration --> harvesting
```

### Coggle Detailed Overview

https://coggle.it/diagram/Xiw2ZmYss-ddJVuK/t/data-portal-feature-breakdown

<iframe width='853' height='480' src='https://embed.coggle.it/diagram/Xiw2ZmYss-ddJVuK/b24d6f959c3718688fed2a5883f47d33f9bcff1478a0f3faf9e36961ac0b862f' frameborder='0' allowfullscreen></iframe>
### Detailed Feature Breakdown

```mermaid
graph LR

dms[Basics]
dmsplus["Plus"]
cms[CMS]

theming[Theming]
permissions[Permissions]
datastore[Data API]
monitoring[Monitoring]
usage[Usage Analytics]
harvesting[Harvesting]
etl[ETL]
blog[Blog]
contact[Contact Page]
help[Support]
newsletter[Newsletter]
metadata[Metadata]
showcase[Showcase]
activity[Activity Streams]
search[Data Search]
catalogsearch[Catalog Search]
multi[Multi-language metadata]
resource[Resource previews]
xloader[xLoader]
datapusher[Data Pusher]
revision[Revisioning]
explorer[Data Explorer]
datavalidation[Data Validation]
filestore[FileStore]
siteadmin[Site Admin]

dms --> metadata
dms --> activity
dms --> catalogsearch
dms --> showcase
dms --> resource
dms --> multi
dms --> filestore
dms --> theming
dms --> i18n
dms --> siteadmin

dmsplus --> permissions
dmsplus --> revision
dmsplus --> datastore
dmsplus --> monitoring
dmsplus --> usage
dmsplus --> search
dmsplus --> explorer

cms --> blog
cms --> contact
cms --> help
cms --> newsletter

etl --> datapusher
etl --> xloader
etl --> harvesting
etl --> datavalidation
```
* Theming - customizing the look and feel of the portal
* i18n
* CMS - e.g., news/ideas/about/docs. Learn about CMS options - [CMS](/docs/dms/data-portals/cms).
  * Blog
  * Contact page?
  * Help / Support / Chat
  * Newsletter
* DMS Basic - Catalog: manage/catalog multiple formats of data
  * Activity Streams
  * Data Showcase (aka Dataset view page)
  * Resource previews
  * Metadata creation and management
  * Multi-language metadata
  * Data import and storage
  * Storing data
  * Data Catalog searching
  * Data searching
  * Multiple Formats of data
  * Tagging and Grouping of Datasets
  * Organizations as publishers and teams
* DMS Plus
  * Permissions: identity, authentication, accounts and authorization (including "teams/groups/orgs")
  * Revisioning of data and metadata
  * DataStore and Data API
  * Monitoring: who is doing what, audit log etc
  * Usage Analytics: e.g. number of views, amount of downloads, recent activity
* ETL: automated metadata and data import and processing (e.g. to data store), data transformation ...
  * Harvesting: metadata and data harvesting
  * DataPusher
  * xLoader
* (Data) Explorer: Visualizations and Dashboards
* Data Validation

DevOps

* CKAN Cloud: multi-instance deployment and management
* Backups / Disaster recovery

Not sure they merit an item ...

* Cross Platform
* Data Sharing: A place to store data, with a permanent link to share to the public.
* Discussions
* RSS
* Multi-file download
## CKAN the Data Portal Software

CKAN is the leading data portal software.

It is usable out of the box and can also be utilized as a powerful framework for creating tailored solutions.

CKAN's combination of an open source codebase and enterprise support makes it uniquely attractive for organizations looking to build customized, enterprise-grade solutions.

## Appendix

TODO: From Data Portal to DataHub (or Data Management System).

### Is a Data Catalog the same as a Data Portal? (Yes)

Is a data catalog the same as a data portal? Yes. Data Portals are the evolution of data catalogs.

Data Portals were originally called a variety of names, including "Data Catalog". As catalogs grew in features they evolved into full portals.

### Open Data Portals and Internal Data Portals

Many initial data portals were "open" or public: that is, anyone could access them -- and the data they listed. This reflected the fact that these data portals were set up by governments seeking to maximize the value of their data by sharing it as widely as possible.

However, there is no reason a data portal need be "open". In fact, data portals internal to an enterprise are usually restricted to the organization or even specific teams within the enterprise.
158 site/content/docs/dms/dataframe.md (new file)
@@ -0,0 +1,158 @@
# DataFrame

Designing a dataframe.js - and understanding data libs and data models in general.

TODO: integrate https://github.com/datopian/dataframe.js - my initial review from ~2015 onwards.

## Introduction

Conceptually a data library consists of:

* A data model, i.e. a set of classes for holding / describing data, e.g. Series (Vector / 1d array), DataFrame (Table / 2d array) (and possibly higher-dimensional arrays)
* Tooling
  * Operations e.g. group by, query, pivot etc
  * Import / Export: load from csv, sql, stata etc

## Our need

We need to build tools for wrangling and presenting data that are:

* focused on smallish data
* able to run in the browser and/or lightweight and easy to install

Why? Because:

* We want to build easy-to-use, easy-to-install applications for non-developers (so they aren't going to use pandas or a Jupyter notebook, PLUS they want a UI, PLUS the data is probably not big (or if it is we can work with a sample))
* We're often using these tools in web applications (or e.g. in a desktop app like Electron)

Discussion:

* Could we not have the browser act as a thin client and push code to some backend? Yes, we could, but that means a whole other service ...

What we want: something like OpenRefine but running in the browser ...

### Why not just use R / Pandas?

Context: R and pandas are already awesome. In fact, super-awesome. And they have huge existing communities and ecosystems.

Furthermore, not only do they do data analysis (which is what all the data science folks are using them for) but they are also pretty good for data wrangling (especially pandas).

So, we'd heavily recommend these (especially pandas) if you are a developer (and doing work on your local machine).

However ...

* if you're not a developer they can be daunting (even wrapped up in a Jupyter notebook).
* if you are a developer and actually doing data engineering there are some issues:
  * pandas is a "kitchen-sink" of a library and depends on numpy. This makes it a heavyweight dependency and harder to put into data pipelines and flows
  * the monolithic nature makes it hard to componentize ...
## Pandas

### Series

https://pandas.pydata.org/pandas-docs/stable/getting_started/dsintro.html#series

> Series is a one-dimensional labeled array capable of holding any data type (integers, strings, floating point numbers, Python objects, etc.). The axis labels are collectively referred to as the index. The basic method to create a Series is to call:
>
> ``>>> s = pd.Series(data, index=index)``

* Series is a 1-d array with the convenience of labelling each cell in the array with the index (which defaults to 0...n if not specified).
* This allows you to treat a Series as an array *and* a dictionary.
* You can give it a name: "Series can also have a name attribute: `s = pd.Series(np.random.randn(5), name='something')`"
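To make the "array *and* dictionary" point concrete, here is a deliberately minimal, stdlib-only sketch of the labeled-array idea -- a design illustration for this research note, not pandas itself:

```python
class Series:
    """A minimal labeled 1-d array: positional access like a list,
    label access like a dict (a sketch of the pandas Series idea)."""

    def __init__(self, data, index=None, name=None):
        self.data = list(data)
        # Default index is 0..n-1, as in pandas
        self.index = list(index) if index is not None else list(range(len(self.data)))
        self.name = name

    def iloc(self, i):
        """Positional (array-style) access."""
        return self.data[i]

    def loc(self, label):
        """Label (dict-style) access."""
        return self.data[self.index.index(label)]

s = Series([10, 20, 30], index=["a", "b", "c"], name="scores")
# s.iloc(0) -> 10, s.loc("b") -> 20
```

A real implementation would add vectorized operations, alignment on the index, and missing-value handling on top of this core.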
### DataFrame

https://pandas.pydata.org/pandas-docs/stable/getting_started/dsintro.html

> DataFrame is a 2-dimensional labeled data structure with columns of potentially different types. You can think of it like a spreadsheet or SQL table, or a dict of Series objects. It is generally the most commonly used pandas object. Like Series, DataFrame accepts many different kinds of input.
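The "dict of Series objects" view can likewise be sketched in a few lines (again an illustration of the data model, not pandas itself):

```python
class DataFrame:
    """A minimal 2-d labeled structure: a dict of equal-length columns
    (a sketch of the 'dict of Series objects' view of a pandas DataFrame)."""

    def __init__(self, columns):
        lengths = {len(v) for v in columns.values()}
        assert len(lengths) <= 1, "all columns must have the same length"
        self.columns = {name: list(values) for name, values in columns.items()}

    def col(self, name):
        """Column access, analogous to df[name] in pandas."""
        return self.columns[name]

    def row(self, i):
        """Row access as a dict, analogous to df.iloc[i].to_dict() in pandas."""
        return {name: values[i] for name, values in self.columns.items()}

df = DataFrame({"name": ["alice", "bob"], "age": [34, 28]})
# df.col("age") -> [34, 28]; df.row(0) -> {"name": "alice", "age": 34}
```

The column-oriented layout is the key design choice: columns are homogeneous (cheap to scan and type), while rows are assembled on demand.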
### Higher dimensional arrays

Not supported. See xarray.

## XArray

Comment: mature and well thought out. Exists to generalize pandas to higher dimensions.

http://xarray.pydata.org/en/stable/ => multidimensional arrays in pandas

> xarray has two core data structures, which build upon and extend the core strengths of NumPy and pandas. Both data structures are fundamentally N-dimensional:
>
> DataArray is our implementation of a labeled, N-dimensional array. It is an N-D generalization of a pandas.Series. The name DataArray itself is borrowed from Fernando Perez's datarray project, which prototyped a similar data structure.
>
> Dataset is a multi-dimensional, in-memory array database. It is a dict-like container of DataArray objects aligned along any number of shared dimensions, and serves a similar purpose in xarray to the pandas.DataFrame.

(Personally not sure about the analogy: a Dataset is like a collection of Series *or* DataFrames.)

## NTS

* Pandas 2: https://dev.pandas.io/pandas2/ - https://github.com/pandas-dev/pandas2 (from 2017)
* Pandas 2 never happened: https://github.com/pandas-dev/pandas2 (stalled in 2017 ...). May happen in 2021 according to this milestone: https://github.com/pandas-dev/pandas/milestone/42
## Inbox

* Out-of-Core DataFrames for Python, ML, visualize and explore big tabular data at a billion rows per second. 3.2 ⭐
* https://ray.io/ - distributed computing in python (??)
* Seems to be an alternative / competitor to ray but more general (dask is very oriented to scaling pandas style stuff)
* https://modin.readthedocs.io/en/latest/ - a way to convert pandas to run in parallel: "Scale your pandas workflow by changing a single line of code"
* https://github.com/reubano/meza - meza is a Python library for reading and processing tabular data. It has a functional programming style API, excels at reading/writing large files, and can process 10+ file types.
  * Quite a few similarities to Frictionless Data type stuff
  * Mainly active 2015-2017 afaict and last commit in 2018
* https://github.com/atlanhq/camelot - PDF Table Extraction for Humans
* http://blaze.pydata.org/ - seems inactive since 2016 (according to the blog) and the github repos look quiet since ~2016
  * Datashape - https://datashape.readthedocs.io/en/latest/overview.html - Datashape is a data layout language for array programming. It is designed to describe in-situ structured data without requiring transformation into a canonical form.
* Dask: https://dask.org - "Dask natively scales Python. Dask provides advanced parallelism for analytics, enabling performance at scale for the tools you love." Was part of Blaze and now split out as a separate project. This is still *very* active (in fact the main maintainer formed a consulting company for this in 2020)
  * https://github.com/dask/dask "Parallel computing with task scheduling"
* odo - https://odo.readthedocs.io/en/latest/ - https://github.com/blaze/odo - Odo: Shapeshifting for your data

> odo takes two arguments, a source and a target for a data transfer.
>
> ```
> >>> from odo import odo
> >>> odo(source, target)  # load source into target
> ```
>
> It efficiently migrates data from the source to the target through a network of conversions.

### Blaze

The Blaze ecosystem is a set of libraries that help users store, describe, query and process data. It is composed of the following core projects:

* Blaze: An interface to query data on different storage systems
* Dask: Parallel computing through task scheduling and blocked algorithms
* Datashape: A data description language
* DyND: A C++ library for dynamic, multidimensional arrays
* Odo: Data migration between different storage systems
## Appendix: JS "DataFrame" Libraries

A list of existing libraries.

*Note: when we started research on this in 2015 there were none that we could find, so it is a good sign that they are developing.*

* https://github.com/opensource9ja/danfojs 274⭐️ - ACTIVE last update Aug 2020
* https://github.com/StratoDem/pandas-js 280⭐️ - INACTIVE last update Sep 2017
* https://github.com/fredrick/gauss 428⭐️ - INACTIVE last update 2015 - JavaScript statistics, analytics, and data library - Node.js and web browser ready
* https://github.com/Gmousse/dataframe-js 283⭐️ - INACTIVE? started in 2016 and largely inactive since 2018 (though a minor update in early 2019)
  * dataframe-js provides another way to work with data in javascript (browser or server side) by using DataFrame, a data structure already used in some languages (Spark, Python, R, ...).
  * Comment: supports browser and node etc. Pretty well structured. A long way from pandas still.
* https://github.com/osdat/jsdataframe 26⭐️ - INACTIVE started in 2016 and not much activity since 2017. Seems fairly R-oriented (e.g. melt)
  * Jsdataframe is a JavaScript data wrangling library inspired by data frame functionality in R and Python Pandas. Vector and data frame constructs are implemented to provide vectorized operations, summarization methods, subset selection/modification, sorting, grouped split-apply-combine operations, database-style joins, reshaping/pivoting, JSON serialization, and more. It is hoped that users of R and Python Pandas will find the concepts in jsdataframe quite familiar.
* https://github.com/maxto/ubique 91⭐️ - ABANDONED last update in 2015 and stated as discontinued. A mathematical and quantitative library for Javascript and Node.js
* https://github.com/misoproject/dataset 1.2k⭐️ - now abandonware as no development since 2014, site is down (and maintainers seem unresponsive) (was a nice project!)

Other ones (not very active or without much info):

* https://github.com/walnutgeek/wdf - 1⭐️ "web data frame" last commit in 2014 http://walnutgeek.github.io/wdf/DataFrame.html
* https://github.com/cjroth/dataframes
* https://github.com/jpoles1/dataframe.js
* https://github.com/danrobinson/dataframes

### References

* https://stackoverflow.com/questions/30610675/python-pandas-equivalent-in-javascript/43825646 (has a community wiki section)
* https://www.man.com/maninstitute/short-review-of-dataframes-in-javascript (2018) - pretty good review from June 2018. As it points out, there is no clear solution.
11 site/content/docs/dms/datahub.md (new file)
@@ -0,0 +1,11 @@
# DataHub Documentation

Welcome to the DataHub documentation.

DataHub is a platform for *people* to **store, share and publish** their data, **collect, inspect and process** it with **powerful tools**, and **discover and use** data shared by others.

Our focus is on data wranglers and data scientists: those who automate their work with data using code and command line tools rather than editing it by hand (as, for example, many analysts do in Excel). Think people who use Python vs people who use Excel for data work.

Our goal is to provide simplicity *and* power.

[Developer Docs »](/docs/dms/datahub/developers) <3 Python, JavaScript and data pipelines? Start here!
99 site/content/docs/dms/datahub/developers.md (new file)
@@ -0,0 +1,99 @@
# Developers

This section of the DataHub documentation is for developers. Here you can learn about the design of the platform, how to get DataHub running locally or on your own servers, and the process for contributing enhancements and bug fixes to the code.

[](https://gitter.im/frictionlessdata/chat)

## Internal docs

* [API](/docs/dms/datahub/developers/api)
* [Deploy](/docs/dms/datahub/developers/deploy)
* [Platform](/docs/dms/datahub/developers/platform)
* [Publish](/docs/dms/datahub/developers/publish)
* [User Stories](/docs/dms/datahub/developers/user-stories)
* [Views Research](/docs/dms/datahub/developers/views-research)
* [Views](/docs/dms/datahub/developers/views)

## Repositories

We use the following GitHub repositories for the DataHub platform:

* [DEPLOY][deploy] - Automated deployment
* [FRONTEND][frontend] - Frontend application in node.js
* [ASSEMBLER][assembler] - Data assembly line
* [AUTH][auth] - A generic OAuth2 authentication service and user permission manager
* [SPECSTORE][specstore] - API server for managing a Source Spec Registry
* [BITSTORE][bitstore] - A microservice for storing blobs i.e. files
* [RESOLVER][resolver] - A microservice for resolving datapackage URLs into more human-readable ones
* [DOCS][docs] - Documentation

[deploy]: https://github.com/datahq/deploy
[frontend]: https://github.com/datahq/frontend
[assembler]: https://github.com/datahq/assembler
[auth]: https://github.com/datahq/auth
[specstore]: https://github.com/datahq/specstore
[bitstore]: https://github.com/datahq/bitstore
[resolver]: https://github.com/datahq/resolver
[docs]: https://github.com/datahq/docs

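To illustrate the resolver's role of mapping between human-readable dataset names and underlying storage locations, here is a hypothetical sketch. The registry contents, `owner/dataset` scheme and URL are invented for illustration; see the RESOLVER repo for the actual service:

```python
# Hypothetical sketch of the resolver idea: map a human-readable
# "owner/dataset" identifier to the underlying storage URL.
# The registry contents and URL scheme are invented for illustration.
REGISTRY = {
    ("core", "finance-vix"): "https://pkgstore.example.org/core/finance-vix/datapackage.json",
}

def resolve(human_id):
    """Resolve 'owner/dataset' into the stored datapackage URL, or None."""
    owner, _, name = human_id.partition("/")
    if not name:
        return None
    return REGISTRY.get((owner, name))
```

Keeping this mapping in a dedicated microservice lets the human-facing naming scheme evolve independently of where the bits are actually stored.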
```mermaid
graph TD

subgraph Repos
frontend[Frontend]
assembler[Assembler]
auth[Auth]
specstore[Specstore]
bitstore[Bitstore]
resolver[Resolver]
docs[Docs]
end

subgraph Sites
dhio[datahub.io]
dhdocs[docs.datahub.io]
docs --> dhdocs
end

deploy((DEPLOY))
deploy --> dhio
frontend --> deploy
assembler --> deploy
auth --> deploy
specstore --> deploy
bitstore --> deploy
resolver --> deploy
```
|
||||
|
||||
## Install
|
||||
|
||||
We use several different services to run our platform, please follow the installation instructions here:
|
||||
|
||||
* [Install Assembler](https://github.com/datahq/assembler#assembler)
|
||||
|
||||
* [Install Auth](https://github.com/datahq/auth#datahq-auth-service)
|
||||
|
||||
* [Install Specstore](https://github.com/datahq/specstore#datahq-spec-store)
|
||||
|
||||
* [Install Bitstore](https://github.com/datahq/bitstore#quick-start)
|
||||
|
||||
* [Install DataHub-CLI](https://github.com/datahq/datahub-cli#usage)
|
||||
|
||||
* [Install Resolver](https://github.com/datahq/resolver#quick-start)
|
||||
|
||||
## Deploy
|
||||
|
||||
For deployment of the application in a production environment, please see [the deploy page][deploydocs].
|
||||
|
||||
[deploydocs]: /docs/dms/deploy
|
||||
|
||||
|
||||
## DataHub CLI
|
||||
|
||||
The DataHub CLI is a Node JS lib and command line interface to interact with an DataHub instance.
|
||||
|
||||
[CLI code](https://github.com/datahq/datahub-cli)
|
||||
---

*New file: `site/content/docs/dms/datahub/developers/api.md` (34 lines)*
# DataHub API

The DataHub API provides a range of endpoints to interact with the platform. All endpoints live under the URL `https://api.datahub.io`, and the API is divided into the following sections: **auth, rawstore, sources, metastore, resolver**.

## Auth

A generic OAuth2 authentication service and user permission manager.

https://github.com/datahq/auth#api

## Rawstore

The DataHub microservice for storing blobs, i.e. files. It is a lightweight auth wrapper for an S3-compatible object store that integrates with the rest of the DataHub stack, and especially with the auth service.

https://github.com/datahq/bitstore#api

## Sources

An API server for managing a Source Spec Registry.

https://github.com/datahq/specstore#api

## Metastore

A search service for DataHub.

https://github.com/datahq/metastore#api

## Resolver

The DataHub microservice for resolving Data Package URLs into more human-readable ones.

https://github.com/datahq/resolver#api
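As a rough illustration of how a client might address these sections, the following sketch builds URLs under the base above. This is illustrative only: the exact paths are defined by each service's own API docs (linked per section), and `endpoint` is a hypothetical helper, not part of any DataHub client.

```python
# Illustrative only: a minimal sketch of addressing the API sections
# listed above. The section names come from this page; the path shapes
# are an assumption - see each service's API docs for the real routes.
API_BASE = "https://api.datahub.io"

SECTIONS = ("auth", "rawstore", "sources", "metastore", "resolver")

def endpoint(section: str, path: str = "") -> str:
    """Build a URL under one of the DataHub API sections."""
    if section not in SECTIONS:
        raise ValueError(f"unknown API section: {section}")
    return f"{API_BASE}/{section}/{path}".rstrip("/")

# e.g. the auth check endpoint mentioned in the publish docs:
# endpoint("auth", "check") -> "https://api.datahub.io/auth/check"
```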
---

*New file: `site/content/docs/dms/datahub/developers/deploy.md` (91 lines)*
## DevOps - Production Deployment

We use various cloud services for the platform (for example, AWS S3 for storing data and metadata), and the application runs on Docker Cloud.

We have fully automated the deployment of the platform, including the setup of all necessary services, so that deploying takes a single command. Code and instructions are here:

https://github.com/datahq/deploy

Below we provide a conceptual outline.

### Outline - Conceptually

```mermaid
graph TD

user[fa:fa-user User] --> frontend[Frontend]
frontend --> apiproxy[API Proxy]
frontend --> bits[BitStore - S3]
```

### New Structure

This diagram shows the current deployment architecture.

```mermaid
graph LR

cloudflare --> haproxy

haproxy --> frontend

subgraph auth
postgres
authapp
end

subgraph rawstore
rawobjstore
rawapp
end

subgraph pkgstore
pkgobjstore
pkgapp
end

subgraph metastore
elasticsearch
metastore
end

haproxy --/auth--> authapp
haproxy --/rawstore--> rawapp

haproxy --> pkgapp
haproxy --/metastore--> metastore
```

### Old Structures

#### Heroku

```mermaid
graph TD

user[fa:fa-user User]
bits[BitStore]
cloudflare[Cloudflare]

user --> cloudflare
cloudflare --> heroku
cloudflare --> bits
heroku[Heroku - Flask] --> rds[RDS Database]
heroku --> bits
```

#### AWS Lambda - Flask via Zappa

We are no longer using AWS Lambda and Heroku in this way. However, we have kept these diagrams for historical purposes and in case we return to either of them.

```mermaid
graph TD

user[fa:fa-user User] --> cloudfront[Cloudfront]
cloudfront --> apigateway[API Gateway]
apigateway --> lambda[AWS Lambda - Flask via Zappa]
cloudfront --> s3assets[S3 Assets]
lambda --> rds[RDS Database]
lambda --> bits[BitStore]
cloudfront --> bits
```
---

*New file: `site/content/docs/dms/datahub/developers/platform.md` (209 lines)*
# Platform

The DataHub platform follows a service-oriented architecture. It is built from a set of loosely coupled components, each performing distinct functions related to the platform as a whole.

## Architecture

<p style={{textAlign: "center"}}>Fig 1: Data Flow through the system</p>

```mermaid
graph TD

cli((CLI fa:fa-user))
auth[Auth Service]
cli --login--> auth

cli --store--> raw[Raw Store API<br>+ Storage]

cli --package-info--> pipeline-store
raw --data resource--> pipeline-runner

pipeline-store -.generate.-> pipeline-runner

pipeline-runner --> package[Package Storage]
package --api--> frontend[Frontend]
frontend --> user[User fa:fa-user]

package -.publish.->metastore[MetaStore]
pipeline-store -.publish.-> metastore[MetaStore]
metastore[MetaStore] --api--> frontend
```

<p style={{textAlign: "center"}}>Fig 2: Components Perspective - from the Frontend</p>

```mermaid
graph TD

subgraph Web Frontend
frontend[Frontend Webapp]
browse[Browse & Search]
login[Login & Signup]
view[Views Renderer]
frontend --> browse
frontend --> login
end

subgraph Users and Permissions
user[User]
permissions[Permissions]
authapi[Auth API]
authzapi[Authorization API]
login --> authapi
authapi --> user
authzapi --> permissions
end

subgraph PkgStore
bitstore["PkgStore (S3)"]
bitstoreapi[PkgStore API<br/>put,get]
bitstoreapi --> bitstore
browse --> bitstoreapi
end

subgraph MetaStore
metastore["MetaStore (ElasticSearch)"]
metaapi[MetaStore API<br/>read,search,import]
metaapi --> metastore
browse --> metaapi
end

subgraph CLI
cli[CLI]
end
```

## Information Architecture

```
datahub.io           # frontend
api.datahub.io       # API - see the API page for structure
rawstore.datahub.io  # rawstore - raw bitstore
pkgstore.datahub.io  # pkgstore - package bitstore
```

## Components

### Frontend Web Application

A core part of the platform: Login & Sign-Up and Browse & Search Datasets.

https://github.com/datahq/frontend

#### Views and Renderer

A JS library responsible for visualization and views.

See the [views][] section for more about Views.

### Assembler

TODO

### Raw Storage

We first save all raw files before sending them to the pipeline-runner.

The **pipeline-runner** is a service that runs the Data Package pipelines. It is used to normalise and modify the data before it is displayed publicly.

- We use an AWS S3 instance for storing data

### Package Storage

We store files after they have passed through the pipeline-runner.

- We use an AWS S3 instance for storing data

### BitStore

We preserve the data byte by byte.

- We use an AWS S3 instance for storing data

### MetaStore

The MetaStore provides an integrated, searchable view over key metadata for end-user services and users. Initially this metadata will just be metadata on datasets in the Package Store. In future it may expand to include other related metadata, such as pipelines. It also includes summary metadata (or the ability to compute summary data), e.g. the total size of all your packages.

#### Service architecture

```mermaid
graph TD

subgraph MetaStore
metaapi[MetaStore API]
metadb[MetaStore DB fa:fa-database]
end

metadb --> metaapi

assembler[Assembler] --should this be via api or direct to DB??--> metadb

metaapi --> frontend[Frontend fa:fa-user]
metaapi --> cli[CLI fa:fa-user]

frontend -.no dp stuff only access.-> metaapi
```

### Command Line Interface

The command line interface.

https://github.com/datahq/datahub-cli

[views]: /docs/dms/views
[web-app]: http://datahub.io/

## Domain model

There are two main concepts to understand in the DataHub domain model: [Profile](#profile) and [Package](#data-package).

```mermaid
graph TD

pkg[Data Package]
resource[Resource]
file[File]
version[Version]
user[User]
publisher[Publisher]

subgraph Package
pkg --0..*--> resource
resource --1..*--> file
pkg --> version
end

subgraph Profile
publisher --1..*--> user
publisher --0..*--> pkg
end
```

### Profile

The set of authenticated and authorized entities, such as publishers and users. They are responsible for publishing, deleting and maintaining data on the platform.

**Important:** Users do not have Data Packages; Publishers do. Users are *members* of Publishers.

#### Publisher

A Publisher is an organization which "owns" Data Packages. A Publisher may have zero or more Data Packages. A Publisher may also have one or more users.

#### User

A User is an authenticated entity that is a member of a Publisher organization and can read, edit, create or delete Data Packages depending on their permissions.

#### Data Package

A Data Package is a simple way of "packaging" up and describing data so that it can be easily shared and used. You can imagine it as a collection of data and its metadata ([datapackage.json][datapackage.json]), usually covering some concrete topic, e.g. *"Gold Prices"* or *"Population Growth Rate in My Country"*.

Each Data Package may have zero or more resources and one or more versions.

**Resources** - think "tables". Each resource can map to one or more physical files (usually just one); think of a data table split across multiple CSV files on disk.

**Version of a Data Package** - similar to git commits and tags. People can mean different things by a "Version":

* Tag - same as a label or version - a nice, human-usable label, e.g. *"v0.3"*, *"master"*, *"2013"*
* Commit/Hash - corresponds to the hash of datapackage.json, with that datapackage.json including the hashes of all data files

We interpret Version as the *"Tag"* concept. *"Commit/Hash"* is not supported.

[datapackage.json]: http://frictionlessdata.io/guides/data-package/#datapackagejson
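The cardinalities in the domain model above can be sketched as a minimal Python model. This is illustrative only: the class and field names are assumptions for the sketch, not DataHub's actual schema.

```python
from dataclasses import dataclass, field
from typing import List

# Illustrative sketch of the domain model's cardinalities; all names
# here are assumptions, not DataHub's actual schema.

@dataclass
class Resource:
    name: str
    files: List[str]                     # 1..* physical files (usually one)

@dataclass
class DataPackage:
    name: str
    version: str                         # interpreted as a "Tag", e.g. "v0.3"
    resources: List[Resource] = field(default_factory=list)  # 0..*

@dataclass
class User:
    username: str

@dataclass
class Publisher:
    name: str
    members: List[User]                  # 1..* users are members of a publisher
    packages: List[DataPackage] = field(default_factory=list)  # 0..*

# Users do not own packages directly; the Publisher does.
core = Publisher(name="core", members=[User("rufus")])
core.packages.append(
    DataPackage(name="finance-vix", version="v0.3",
                resources=[Resource("vix-daily", ["vix-daily.csv"])]))
```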
---

*New file: `site/content/docs/dms/datahub/developers/publish.md` (107 lines)*
# Publish

This page explains the DataHub publishing flow from the client and back-end perspectives.

```mermaid
graph TD

cli((CLI fa:fa-user))
auth[Auth Service]
cli --login--> auth

cli --store--> raw[Raw Store API<br>+ Storage]

cli --package-info--> pipeline-store
raw --data resource--> pipeline-runner

pipeline-store -.generate.-> pipeline-runner

pipeline-runner --> package[Package Storage]
package --api--> frontend[Frontend]
frontend --> user[User fa:fa-user]

package -.publish.->metastore[MetaStore]
pipeline-store -.publish.-> metastore[MetaStore]
metastore[MetaStore] --api--> frontend
```

## Diagram for upload process

```mermaid
graph TD

CLI --jwt--> rawstore[RawStore API]
rawstore --signed urls--> CLI
CLI --upload using signed url--> s3[S3 bucket]
s3 --success message--> CLI
CLI --metadata--> pipe[Pipe Source]
```
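The upload handshake above can be sketched as a small driver function. This is a minimal sketch, not the real CLI: the function names and payload shapes are assumptions, and the transport objects are stubbed so only the order of calls is shown.

```python
# Illustrative sketch of the signed-URL upload handshake diagrammed above.
# Function names and payload shapes are assumptions, not the real API.

def upload_dataset(rawstore, s3, pipeline_store, jwt, file_bytes, metadata):
    """Drive the CLI side of the upload: JWT -> signed URL -> S3 -> metadata."""
    signed_url = rawstore.request_signed_url(jwt)   # CLI --jwt--> RawStore API
    s3.put(signed_url, file_bytes)                  # upload using signed url
    pipeline_store.register(metadata)               # CLI --metadata--> Pipe Source
    return signed_url

# A stubbed run, just to show the order of calls:
class Log:
    def __init__(self):
        self.calls = []
    def request_signed_url(self, jwt):
        self.calls.append("signed-url")
        return "https://s3.example/upload?sig=..."
    def put(self, url, data):
        self.calls.append("s3-put")
    def register(self, meta):
        self.calls.append("register")

log = Log()
upload_dataset(log, log, log, jwt="<token>",
               file_bytes=b"a,b\n1,2\n", metadata={"name": "demo"})
# log.calls is now ["signed-url", "s3-put", "register"]
```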
## Identity Pipeline

**Context: where this pipeline fits in the system**

```mermaid
graph LR

specstore --shared db--> assembler
assembler --identity pipeline--> pkgstore
pkgstore --> frontend
```

**Detailed steps**

```mermaid
graph LR

load[Load from RawStore] --> encoding[Encoding Check<br>Add encoding info]
encoding --> csvkind[CSV kind check]
csvkind --> validate[Validate data]
validate --> dump[Dump S3]
dump --> pkgstore[Pkg Store fa:fa-database]
load -.-> dump
validate --> checkoutput[Validation<br>Reports]
```

## Client Perspective

The publishing flow takes the following steps to communicate with the DataHub API:

```mermaid
sequenceDiagram
Upload Agent CLI->>Upload Agent CLI: Check Data Package valid
Upload Agent CLI-->>Auth(SSO): login
Auth(SSO)-->>Upload Agent CLI: JWT token
Upload Agent CLI->>RawStore API: upload using signed url
RawStore API->>Auth(SSO): Check key / token
Auth(SSO)->>RawStore API: OK / Not OK
RawStore API->>Upload Agent CLI: success message
Upload Agent CLI->>pipeline store: package info
pipeline store->>Upload Agent CLI: OK / Not OK
pipeline store->>pipeline runner: generate
RawStore API->>pipeline runner: data resource
pipeline runner->>Package Storage: generated
Package Storage->>Metadata Storage API: publish
pipeline store->>Metadata Storage API: publish
Metadata Storage API->>Upload Agent CLI: OK / Not OK
```

<br/>

* Upload API - see `POST /source/upload` in the *source* section of the [API][api]
* Authentication API - see `GET /auth/check` in the *auth* section of the [API][api]
* Authorization API - see `GET /auth/authorize` in the *auth* section of the [API][api]

See the example [code snippet in DataHub CLI][publish-code]

[api]: /docs/dms/datahub/developers/api
[publish-code]: https://github.com/datahq/datahub-cli/blob/b869d38073248903a944029cf93eddf3ef50001a/bin/data-push.js#L34
---

*New file: `site/content/docs/dms/datahub/developers/user-stories.md` (811 lines)*
# User Stories

DataHub is the place where *people* can **store, share and publish** their data, **collect, inspect and process** it with **powerful tools**, and **discover and use** data shared by others. [order matters]

People = data wranglers = those who use machines (e.g. code, command line tools) to work with their data rather than editing it by hand (as, for example, many analysts do in Excel). (Think people who use Python vs people who use Excel for data work.)

* Data is not chaotic and is in some sense neat
* Can present your data with various visualization tools (graphs, charts, tables etc.)
* Easy to publish
* Specific data (power) tools and integrations
* Can validate your data before publishing
* Data API
* Data Conversion / Bundling: zip the data, provide SQLite
* Generate a node package of your data
* (Versioning)

## Personas

* **[Geek] Publisher**. Knows how to use a command line or other automated tooling. Wants to publish their data package in order to satisfy their team's requirements to publish data.
* Non-Geek Publisher. Tbc …
* **Consumer**: A person or organization looking to use data packages (or data in general)
  * Data Analyst
  * Coder (of data driven applications)
  * …
* **Admin**: A person or organization who runs an instance of a DataHub

## Stories v2

### Publishing data

As a Publisher I want to publish a file/dataset and view/share it with just a few people (or even just myself)

* ~~"Private" link: {'/{username}/{uuid}'}~~
* I want JSON as well as CSV versions of my data
* I want a preview
* I want to be notified clearly if something went wrong and what I can do to fix it.

As a Publisher I want to publish a file/dataset and share it publicly with everyone

* Viewable on my profile
* Public link: nice URLs {'/{username}/{dataset-name}'}

For the pipeline =>

**Context: where this pipeline fits in the system**

```mermaid
graph LR

specstore --shared db--> assembler
assembler --identity pipeline--> pkgstore
pkgstore --> frontend
```

**Detailed steps**

```mermaid
graph LR

load[Load from RawStore] --> encoding[Encoding Check<br>Add encoding info]
encoding --> csvkind[CSV kind check]
csvkind --> validate[Validate data]
validate --> dump[Dump S3]
dump --> pkgstore[Pkg Store fa:fa-database]

load -.-> dump

validate --> checkoutput[Validation<br>Reports]
```

### Push Package

#### Diagram for upload process

```mermaid
graph TD

CLI --jwt--> rawstore[RawStore API]
rawstore --signed urls--> CLI
CLI --upload using signed url--> s3[S3 bucket]
s3 --success message--> CLI
CLI --metadata--> pipe[Pipe Source]
```

### Push File

Levels:

0. Already have a Data Package (?)
1. Good CSV
2. Good Excel
3. Bad data (i.e. has ...)
4. Something else

```
data push {file-or-directory}
```

How does data push work?

```
# you are pushing the raw file
# and the extraction to get one or more data tables ...
# in the background we are creating a data package + pipeline
data push {file}

Algorithm:

1. Detect type / format
2. Choose the data (e.g. sheet from excel)
3. Review the headers
4. Infer data-types and review
5. [Add constraints]
6. Data validation
7. Upload
8. Get back a link - view page (or the raw url) e.g. http://datapackaged.com/core/finance-vix
   * You can view, share, publish, [fork]

1. Detect file type
   => file extension
   1. Offer guess
   2. Probable guess (options?)
   3. Unknown - tell us

1B. Detect encoding (for CSV)

2. Choose the data
   1. 1 sheet => ok
   2. Multiple sheets - guess and offer
   3. Multiple sheets - ask them (which to include)

2B: bad data case - e.g. selecting within table

3. Review the headers
   * Here is what we found
   * More than one option for headers - try to reconcile

### Upload:

* raw file with name a function of the md5 hash
  * Pros: efficient on space (e.g. the same file is stored once, but this means you need to worry about garbage collection?)
* the pipeline description: description of data and everything else we did [into database]

Then a pipeline runs, e.g. load into a database or into a data package

* stores output somewhere ...

Viewable online ...

Note:
data push url   # does not store file
data push file  # store in rawstore

### BitStore

/rawstore/ - content addressed storage (md5 or sha hashed)
/packages/{owner}/{name}/{tag-or-pipeline}
```
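The `/rawstore/` content-addressing idea above (blob names as a function of the md5 hash, so identical files are stored once) can be sketched as follows. This is a minimal illustration; the real rawstore's key layout is an assumption here.

```python
import hashlib

# Minimal sketch of content-addressed storage as described for /rawstore/:
# the blob's name is a function of its md5 hash, so identical files are
# stored only once. Illustrative only; the real key layout may differ.

def rawstore_key(blob: bytes) -> str:
    return "rawstore/" + hashlib.md5(blob).hexdigest()

store = {}

def put(blob: bytes) -> str:
    key = rawstore_key(blob)
    store[key] = blob        # the same content always maps to the same key
    return key

k1 = put(b"a,b\n1,2\n")
k2 = put(b"a,b\n1,2\n")      # duplicate upload: no new object is created
# k1 == k2 and len(store) == 1
```

This is also why the notes above mention garbage collection: deleting one dataset's reference to a blob must not remove it while another dataset still points at the same key.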

Try this for a CSV file

```
data push mydata.csv

# review headers

# data types ...

Upload

* csv file gets stored as blob md5 ...
* output of the pipeline stored ...
* canonical CSV gets generated ...
```

Data push directory

```
data push {directory}

# could just do data push file for each file but ...
# that could be tedious
# once I've mapped one file you try reusing that mapping for the others ...
# .data directory that stores the pipeline and the datapackage.json
```

## Stories

### 1. Get Started

#### 1. Sign in / Sign up [DONE]

As a Geek Publisher I want to sign up for an account so that I can publish my data package to the registry, and to have a publisher account to publish my data package under.

*Generally we want this to be as minimal, easy and quick as possible*

* Sign in with a Google account
  * (?) what about other social accounts?
* Essential profile information (after sign in we prompt for this)
  * Email address
  * Name
  * (?) - future: credit card details for payment - can we integrate with a payment system (?)
* They need to choose a username: a URL-friendly, unique, human-readable name for our app. It can be used in sign in and in many other places.
  * WHY? Where would we need this? For URLs on the site & for the publisher
  * Same as publisher names (needed for URLs): [a-z-_.]
  * Explain: they cannot change this later, e.g. "Choose wisely! Once you set this it cannot be changed later!"
* Send the user an email confirming their account is set up and suggesting next steps

Automatically:

* Auto-create a publisher for them
  * Same name as their user name, but a publisher
  * That way they can start publishing straight away …

**TODO: (??) should we do *all* of this via the command line client (a la npmjs)?**
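The username constraint above (the `[a-z-_.]` character class shared with publisher names) could be enforced with a check like this. A sketch only: the length limits are an assumption, and the real sign-up flow may add further rules.

```python
import re

# Sketch of the username rule mentioned above: URL-friendly names built
# from the character class [a-z-_.] as written (note: no digits in that
# class). The 2-40 length limit is an assumption for this sketch.
USERNAME_RE = re.compile(r"^[a-z_.-]{2,40}$")

def valid_username(name: str) -> bool:
    return bool(USERNAME_RE.match(name))

# valid_username("rufus")      -> True
# valid_username("Rufus")      -> False (uppercase not allowed)
# valid_username("core.data")  -> True
```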

#### Sign up via github (and/or google) [DONE]

As a Visitor I want to sign up via GitHub or Google so that I don't have to enter lots of information and remember my password for yet another website

* How do we deal with username conflicts? What about publisher name conflicts?
  * This does not arise in a simple username system because we have only one pool of usernames

#### Next Step after Sign Up

As a Geek Publisher I want to know what to do next after signing up so that I can get going quickly.

Things to do:

* Edit your profile
* Download a client / configure your client (if you have one already)
  * Instructions on getting the relevant auth credentials
  * Note they will *need* to have set a username / password in their profile
* Join a Publisher (understand what a publisher is!)

#### Invite User to Join Platform

As an Admin (or existing Registered User?) I want to invite someone to join the platform so that they can start contributing or using data

* Get an invitation email with a sign up link
* *Some commonality with Publisher invite member below*

### 2. Publish Data Packages

#### Publish with a Client [DONE]

As a Geek Publisher I want to import (publish) my data package into the registry so my data has a permanent online home and so that I and others can have access

On the command line this looks like:

```
$ cd my/data/package
$ data publish

> … working …
>
> SUCCESS
```

Notes

* Permissions: must be a member of the Publisher
  * Internally: DataPackageCreate or DataPackageUpdate capability
* Handle conflicts: if the data package already exists, return 409. The client should report that it already exists and suggest `--force` or similar to overwrite
* API endpoint behind the scenes: POST {'{api}/package/'}
* TODO: private data packages
  * And payment!
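The conflict rule above might look like this on the server side. A sketch under stated assumptions: the function and status handling are illustrative, not the actual specstore API.

```python
# Sketch of the publish conflict rule above: publishing a package that
# already exists returns 409 unless force is set. Names are illustrative,
# not the actual DataHub API.
HTTP_CREATED, HTTP_OK, HTTP_CONFLICT = 201, 200, 409

registry = {}

def publish(publisher: str, name: str, descriptor: dict, force: bool = False) -> int:
    key = (publisher, name)
    if key in registry and not force:
        return HTTP_CONFLICT      # client should suggest --force to overwrite
    status = HTTP_OK if key in registry else HTTP_CREATED
    registry[key] = descriptor
    return status
```

A first publish creates the package (201); repeating it without `force` yields 409; with `force=True` it overwrites (200).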

##### Configure Client [DONE]

As a Geek Publisher I want to configure my client so I can start publishing data packages.

Locally in `$HOME` store something like:

```
.dpm/credentials # stores your API key and user name

.dpm/config # stores info like your default publisher
```

#### Update a Data Package [DONE]

As a Geek Publisher I want to use a publish command to update a data package that is already in the registry so the update appears there

* The old version will be lost (!)

#### Delete a Data Package

As a Geek Publisher I want to unpublish (delete) a data package so it is no longer visible to anyone

#### Purge a Data Package

As a Geek Publisher I want to permanently delete (purge) a data package so that it no longer takes up storage space

#### Validate Data in Data Package

##### Validate in CLI [DONE]

As a Publisher [owner/member] I want to validate the data I am about to publish to the registry so that I publish "good" data, know that I am doing so, and do not have to manually check that the published data looks OK (e.g. that charts render properly) and re-upload if something is wrong

```
data datavalidate [file-path]
```

* [file-path] - run this against a given file. Look in the resources to see if this file is there and if so use its schema. Otherwise just do a goodtables table check …
* If no file is provided, run validate against each resource in the datapackage in turn
* Output to stdout.
  * Default: human-readable - a nice version of the output from goodtables.
  * Option for JSON, e.g. --json, to emit machine-readable output
  * Check the goodtables command line tool and follow it if possible. Can probably reuse code
* Auto-run this before publish unless explicitly suppressed (e.g. --skip-datavalidate)
* Use goodtables (?)
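To give a flavour of the kind of tabular check `datavalidate` would perform, here is a minimal stdlib sketch: every row must have the same number of cells as the header, and header cells must be non-empty. The real command defers to goodtables and the resource's schema; this function is illustrative only.

```python
import csv
import io

# Illustrative sketch of a basic tabular check in the spirit of
# datavalidate. The real tool uses goodtables and the resource schema;
# this only shows the flavour of header/row-consistency checking.

def validate_csv(text: str) -> list:
    """Return a list of human-readable error strings (empty = valid)."""
    rows = list(csv.reader(io.StringIO(text)))
    errors = []
    if not rows:
        return ["empty file"]
    header = rows[0]
    if any(h.strip() == "" for h in header):
        errors.append("blank header cell")
    for i, row in enumerate(rows[1:], start=2):
        if len(row) != len(header):
            errors.append(f"row {i}: expected {len(header)} cells, got {len(row)}")
    return errors

# validate_csv("a,b\n1,2\n")   -> []
# validate_csv("a,b\n1,2,3\n") -> ["row 2: expected 2 cells, got 3"]
```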
|
||||
|
||||
##### Validate on Server
|
||||
|
||||
As a Publisher [owner] i want my data to be validated when I publish it so that I know immediately if I have accidentally “broken” my data or have bugs and can take action to correct
|
||||
|
||||
As a Consumer I want to know that the data I am downloading is “good” and can be relied on so that I don’t have to check it myself or run into annoying bugs later on
|
||||
|
||||
* Implies showing something in the UI e.g. “Data Valid” (like build passing)
|
||||
|
||||
**Implementation notes to self**
|
||||
|
||||
* Need a new table to store results of validation and a concept of a “run”
|
||||
* Store details of the run [e.g. time to complete, ]
|
||||
* How to automate doing validation (using goodtables we assume) - do we reuse a separate service (goodtables.io in some way) or run ourselves in a process like ECS ???
|
||||
* Display this in frontend
|
||||
|
||||
#### Cache Data Package Resource data (on the server)
|
||||
|
||||
As a Publisher I want to publish a data package where its resource data is stored on my servers but the registry caches a copy of that data so that if my data is lost or gets broken I still have a copy people can use
|
||||
|
||||
As a Consumer I want to be able to get the data for a data package even if the original data has been moved or removed so that I can still use is and my app or analysis keeps working
|
||||
|
||||
* TODO: what does this mean for the UI or command line tools. How does the CLI know about this, how does it use it?
|
||||
|
||||
#### Publish with Web Interface
|
||||
|
||||
As a Publisher I want to publish a data package in the UI so that it is available and published
|
||||
|
||||
* Publish => they already have datapackage.json and all the data. They just want to be able to upload and store this.
|
||||
|
||||
As a Publisher I want to create a data package in the UI so that it is available and published
|
||||
|
||||
* Create => no datapackage.json - just data files. Need to add key descriptors information, upload data files and have schemas created etc etc.
|
||||
|
||||
#### Undelete data package
|
||||
|
||||
[cli] As a Publisher I want to be able to restore the deleted data package via cli, so that it is back visible and available to view, download (and searchable)
|
||||
|
||||
```
|
||||
dpmpy undelete
|
||||
```
|
||||
|
||||
[webui] As a Publisher i want to undelete the deleted data packages, so that the deleted data packages is now visible again.
|
||||
|
||||
#### Render (views) in data package in CLI before upload
|
||||
|
||||
As a Publisher, I want to be able to preview the views (graphs and table (?)) of the current data package using cli prior to publishing so that I can refine the json declarations of datapackage view section to achieve a great looking result.
|
||||
|
||||
### 3. Find and View Data Packages
|
||||
|
||||
#### View a Data Package Online [DONE]
|
||||
|
||||
**EPIC: As a Consumer I want to view a data package online so I can get a sense of whether this is the dataset I want**
|
||||
|
||||
* *Obsess here about “whether this is the dataset I want”*
|
||||
* *Publishers want this too … *
|
||||
* *Also important for SEO if we have good info here*
|
||||
|
||||
Features
|
||||
|
||||
* Visualize data in charts - gives one an immediate sense of what this is
|
||||
* One graph section at top of page after README excerpt
|
||||
* One graph for each entry in the “views”
|
||||
* Interactive table - allows me to see what is in the table
|
||||
* One table for each resource
|
||||
|
||||
This user story can be viewed from two perspectives:
|
||||
|
||||
* From a publisher point of view
|
||||
* From a consumer point of view
|
||||
|
||||
As a **publisher** i want to show the world how my published data is so that it immediately catches consumer’s attention (and so I know it looks right - e.g. graph is ok)
|
||||
|
||||
As a **consumer** i want to view the data package so that i can get a sense of whether i want this dataset or not.

Acceptance criteria - what does done mean!

* A table for each resource
* Simple graph spec works => converts to plotly
  * Multiple time series
* Plotly spec graphs work
  * All core graphs work (not sure how to check every one, but representative ones)
* Recline graph specs (are handled - on a temporary basis)
* Loading spinners whilst data is loading so users know what is happening

Bonus:

* Complex examples e.g. time series with a log scale … (e.g. hard drive data …)
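The “simple graph spec works => converts to plotly” criterion can be sketched as below. This is a hypothetical illustration, not the actual DPR implementation: the view shape (`title`, `type`, `xAxis`, `series`) is an assumed simple-spec format, and the output is a plain Plotly-style figure dict.

```python
# Minimal sketch (assumed view format, not the official spec): turn a simple
# graph declaration plus tabular rows into a Plotly-style figure dict with one
# trace per series - which also covers the "multiple time series" criterion.

def simple_view_to_plotly(view, rows):
    """Convert a simple view spec and a list of row dicts into a figure dict."""
    x_field = view["xAxis"]
    traces = []
    for field in view["series"]:
        traces.append({
            "type": view.get("type", "scatter"),  # e.g. "scatter" for line charts
            "name": field,
            "x": [row[x_field] for row in rows],
            "y": [row[field] for row in rows],
        })
    return {"data": traces, "layout": {"title": view.get("title", "")}}

rows = [
    {"year": 2014, "price": 100, "volume": 7},
    {"year": 2015, "price": 110, "volume": 9},
]
view = {"title": "Prices", "type": "scatter", "xAxis": "year",
        "series": ["price", "volume"]}
figure = simple_view_to_plotly(view, rows)
```

The resulting dict is what you would hand to Plotly’s renderer on the package page; the table view would simply render `rows` directly.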

Features: *DP view status*

* Different options to view data as a graph:
  * Recline
  * Vega-lite
  * Vega
  * Plotly
* General functionality
  * Multiple views [wrongly done: we iterate over resources, not views]
  * Table as a view
* Interactive table so that the consumer can:
  * Filter
  * Join

#### (Pre)View a not-yet-published Data Package Online

As a (potential) Publisher I want to preview a datapackage I have prepared so that I can check it works and share the results (if there is something wrong) with others

* Be able to supply a URL to my datapackage (e.g. on GitHub) and have it previewed as it would look on DPR
* Be able to upload a datapackage and have it previewed

*Rufus: this was a very common use case for me (and others) when using data.okfn.org. Possibly less relevant if the command line tool can do previewing, but still relevant IMO (some people may not have the command line tool, and it is useful to be able to share a link, e.g. when doing core dataset curation and there is something wrong with a datapackage).*

*Rufus: there is also a need for an online validation tool*

#### See How Much a Data Package is Used (Downloaded) {2d}

As a Consumer I want to see how much the data has been downloaded so that I can choose the most popular (=> probably the most reliable and complete) when there are several alternatives for my use case (maybe from different publishers)

#### Browse Data Packages [DONE]

As a potential Publisher, unaware of datapackages, I want to see real examples of published packages (with the contents of datapackage.json), so that I can understand how useful and simple the datapackage format and the registry itself are.

As a Consumer I want to see some example data packages quickly so I get a sense of what is on this site and whether it is useful to look further

* Browse based on what properties? Most recent, most downloaded?
  * Most downloaded
* Start with: we could just go with core data packages

#### Search for Data Packages [DONE]

As a Consumer I want to search data packages so that I can find the ones I want

* Essential question: what is it you want?
  * Rufus: in my view generic search is actually *not* important to start with. People do not want to randomly search. More useful is to go via a publisher at the beginning.
* Search results should provide enough information to help a user decide whether to dig further, e.g. title, short description
  * For the future, when we have it: [number of downloads], stars etc
* Minimum viable search (based on implementation questions)
  * Filter by publisher
  * Free text search against title
    * Description could be added if we start doing actual scoring, as it is easy to add additional fields
  * Scoring would be nice but not essential
* Implementation questions:
  * Search:
    * Should search perform ranking (that requires scoring support)?
    * Free text queries should search against which fields (with what weighting)?
  * Filtering: on which individual properties of the data package should one be able to filter?
  * Themes and profiles:
    * Searching for a given profile: not possible atm.
    * Themes: should we tag data packages by themes like finance, education and let users find data packages by that?
      * Maybe, but not now - maybe in the future
  * If we follow “go via a publisher at the beginning”, then should we list the most popular publishers on the home page [logged-in / not logged in]?
    * If most popular publishers, then by what measure?
      * Sort by most published?
      * Sort by most followers?
      * Sort by most downloads?
      * Or show the top 5 in each facet?

Sub user stories:

> *[WONTFIX?] As a Consumer I want to find data packages by profile (ex: spending) so that I can find the kind of data I want quickly and easily and in one big list*
>
> *As a Consumer I want to search based on the description of a data package, so that I can find packages related to some key words*
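The “minimum viable search” described above (filter by publisher, free text against title, no scoring) is small enough to sketch directly; the package dict shape here is an assumption for illustration:

```python
# Sketch of the minimum viable search: publisher filter + case-insensitive
# substring match on the title. No ranking/scoring, matching the discussion.

def search(packages, query=None, publisher=None):
    results = packages
    if publisher:
        results = [p for p in results if p["publisher"] == publisher]
    if query:
        q = query.lower()
        results = [p for p in results if q in p["title"].lower()]
    return results

packages = [
    {"publisher": "core", "title": "Consumer Price Index"},
    {"publisher": "core", "title": "Country Codes"},
    {"publisher": "world-bank", "title": "GDP by Country"},
]
hits = search(packages, query="country", publisher="core")
```

Adding description matching or scoring later means extending only the `query` branch, which is why starting with title-only is cheap.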

#### Download Data Package Descriptor

As a Consumer I want to download the data package descriptor (datapackage.json) on its own so that …

*Rufus: I can’t understand why anyone would want to do this*

#### Download Data Package in One File (e.g. zip)

As a Consumer I want to download the data package in one file so that I don’t have to download the descriptor and each resource by hand

*Only useful if there is no CLI tool and no install command*

### 4. Get a Data Package (locally)

Let’s move the discussion to GitHub: https://github.com/frictionlessdata/dpm-py/issues/30

*TODO: add these details from the requirements doc*

* *Local “Data Package” cache storage (`.datapackages` or similar)*
  * *Stores copies of packages from the Registry*
  * *Stores new Data Packages the user has created*
* *This **Ruby lib** implements something similar*

#### Use DataPackage in Node (package auto-generated)

As a NodeJS developer I want to use a data package as a node lib in my project so that I can depend on it using my normal dependency framework

* See this *real-world example* of this request for country-list
* => auto-building a node package and publishing to npm (not that hard to do …)
  * Convert CSV data to JSON (that’s what you probably want from node?)
  * Generate package.json
  * Push to npm (register the dataset users)
* Rufus: my guess here is that to implement this we want something a bit like GitHub integrations - specific additional hooks which also get some configuration (or do it like Travis - GitHub integration plus a tiny config file - in our case rather than a .travis.yml we have a .node.yml or whatever)
* Is it configurable for the user whether to push to npm or not?
  * Yes. Since we need to push to a specific npm user (for each publisher) this will need to be configured (along with authorization - where does that go?)
  * Is this something done for *all* data packages or does the user need to turn something on? Probably want them to turn this on …

Questions:

* From where should we push the data package to the npm repo?
  * From dpmpy or from the server? Obviously from a server - this needs to be automated. But you can use dpmpy if you want (though I’m not sure we do want to …)
* What to do with multiple resources? Ans: include all resources
* Do we include datapackage.json in the node package? Yes, include it so they get all the original metadata.

*Generic version is:*

*As a Web Developer I want to download a DataPackage (like currency codes or country names) so that I can use it in the web service I am building [...]*
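The “generate package.json” step of the auto-publishing hook could look roughly like this. The field mapping and the `@datasets` npm scope are assumptions for illustration, not a defined convention:

```python
# Hypothetical sketch: derive a minimal package.json from a datapackage.json
# descriptor, shipping all resources plus the original descriptor (so the
# consumer keeps the full metadata, as decided above).

def datapackage_to_package_json(descriptor, npm_scope="@datasets"):
    return {
        "name": f"{npm_scope}/{descriptor['name']}",
        "version": descriptor.get("version", "0.0.0"),
        "description": descriptor.get("title", ""),
        "license": descriptor.get("license", ""),
        # Include every resource file and datapackage.json itself.
        "files": [r["path"] for r in descriptor.get("resources", [])]
                 + ["datapackage.json"],
    }

descriptor = {
    "name": "country-list",
    "version": "1.2.0",
    "title": "List of countries",
    "license": "ODC-PDDL",
    "resources": [{"path": "data/countries.csv"}],
}
pkg = datapackage_to_package_json(descriptor)
```

The CSV-to-JSON conversion and the actual `npm publish` (with per-publisher credentials) would run server-side around this step.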

#### Import DataPackage into R [DONE?]

As a Consumer [R user] I want to load a Data Package from R so that I can immediately start playing with it

* Should we try and publish to CRAN?
  * Probably not? Why? I think it can be quite painful getting permission to publish to CRAN, and it is very easy to load from the registry
  * On the CRAN website I can’t find a way to automate publishing. It seems possible by filling in a web form, but to know the status we have to wait and parse email.
* Using this library: https://github.com/ropenscilabs/datapkg
* Where can I learn about this?
  * On each data package view page …

*Generic version:*

*As a Data Analyst I want to download a data package, so that I can study it and wrangle with it to infer new data or generate new insights.*

*As a Data Analyst, I want to update a previously downloaded data package, so that I can work with the most recent data.*

#### Import DataPackage into Pandas [DONE?]

TODO - like R
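What “import into Pandas” amounts to can be sketched with the standard library alone: read the descriptor, find a resource, parse its data against the schema. In practice you would reach for the `datapackage`/`frictionless` libraries or `pandas.read_csv` rather than hand-rolling this; the descriptor below is an illustrative example.

```python
import csv
import io

# Stdlib-only sketch of loading a datapackage resource into row dicts.
# A pandas user would wrap the same CSV text with pandas.read_csv instead.

def load_resource(descriptor, csv_text, resource_name):
    resource = next(r for r in descriptor["resources"]
                    if r["name"] == resource_name)
    field_names = [f["name"] for f in resource["schema"]["fields"]]
    reader = csv.DictReader(io.StringIO(csv_text))
    assert reader.fieldnames == field_names  # header should match the schema
    return list(reader)

descriptor = {
    "name": "gdp",
    "resources": [{
        "name": "gdp",
        "path": "data/gdp.csv",
        "schema": {"fields": [{"name": "country"}, {"name": "gdp"}]},
    }],
}
rows = load_resource(descriptor, "country,gdp\nFrance,2.9\nJapan,4.9\n", "gdp")
```

The point for consumers is the one-liner feel: descriptor in, analysable table out, with the schema as the contract between them.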

#### SQL / SQLite database

As a Consumer I want to download a DataPackage’s data as one coherent SQLite database so that I can get it easily in one form

Question:

* Why do we need to store datapackage data in SQLite? Isn’t it better to store it in a file structure?

We can store the datapackage like this:

```
~/.datapackage/<publisher>/<package>/<version>/*
```

This is the way maven/gradle/ivy cache jars locally.
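The “one coherent SQLite database” export can be sketched as one table per resource. Typing everything as TEXT is a simplification (a real exporter would map Table Schema types to SQL types), and the input shape here is an assumed stand-in for parsed resources, not a dpm-py structure:

```python
import sqlite3

# Sketch: write each datapackage resource into its own SQLite table.
# resources: {resource_name: (field_names, rows)} - simplified input shape.

def package_to_sqlite(conn, resources):
    for name, (fields, rows) in resources.items():
        cols = ", ".join(f'"{f}" TEXT' for f in fields)  # TEXT for simplicity
        conn.execute(f'CREATE TABLE "{name}" ({cols})')
        marks = ", ".join("?" for _ in fields)
        conn.executemany(f'INSERT INTO "{name}" VALUES ({marks})', rows)
    conn.commit()

conn = sqlite3.connect(":memory:")
package_to_sqlite(conn, {
    "cpi": (["year", "cpi"], [("2014", "101.2"), ("2015", "102.3")]),
})
count = conn.execute('SELECT COUNT(*) FROM "cpi"').fetchone()[0]
```

This is complementary to the file-structure cache above: the files stay the source of truth, and the SQLite file is a derived, queryable convenience artifact.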

#### See changes between versions

As a Data Analyst I want to compare different versions of some datapackage locally, so that I can see schema changes clearly and adjust my analytics code to the desired schema version.

#### Low Priority

As a Web Developer on multiple projects, I want to be able to install multiple versions of the same datapackage separately so that all my projects can be developed independently and deployed locally. (virtualenv-like)

As a Developer I want to list all DataPackage requirements for my project in a file and pin the exact versions of any DataPackage that my project depends on, so that the project can be deterministically deployed locally and won’t break because of DataPackage schema changes. (requirements.txt-like)

### 5. Versioning and Changes in Data Packages

When we talk about versioning we can mean two things:

* Explicit versioning: this is like the versioning of releases, “v1.0” etc. This is conscious and explicit. Main purposes:
  * to support other systems depending on this one (they want the data at a known stable state)
  * easy access to major staging points in the evolution (e.g. I want to see how things were at v1)
* Implicit versioning or “revisioning”: this is like the commits in git or the autosave of a Word or Google doc. It happens frequently, either with minimum effort or even automatically. Main purposes:
  * Undelete and recovery (you save at every point and can recover if you accidentally overwrite or delete something)
  * Collaboration and merging of changes (in revision control)
  * Activity logging

#### Explicit Versioning - Publisher

As a Publisher I want to tag a version of my data on the command line so that … [see the “so that”s below]

```
dpmpy tag {tag-name}
```

=> tag current “latest” on the server as {tag-name}

* Do we restrict {tag-name} to semver? I don’t think so atm.
* As a {Publisher} I want to tag a datapackage to create a snapshot of data on the registry server, so that consumers can refer to it
* As a {Publisher} I want to be warned that a tag exists when I try to overwrite it, so that I don’t accidentally overwrite stable tagged data which is relied on by consumers.
* As a {Publisher} I want to be able to overwrite a previously tagged datapackage, so that I can fix it if I mess up.
* The versioning here happens server side
  * Is this confusing for users? I.e. they are doing something local.

Background “so that” user story epics:

* As a {Publisher} I want to version my Data Package and keep multiple versions around, including older versions, so that I do not break consumer systems when I change my Data Package (whether schema or data) [It is not just the publisher who wants this, it is a consumer - see below]
* As a {Publisher} I want to be able to get access to a previous version I tagged so that I can return to it and review it (and use it)
  * so that I can recover old data if I delete it myself, or compare how things changed over time
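If tags were restricted to semver (an open question above), the server-side checks for the warn-on-overwrite and force-overwrite stories could look like this. The regex is a simplification of the one published at semver.org (pre-release/build metadata omitted), and the `force` flag is a hypothetical CLI option:

```python
import re

# Sketch of server-side tag validation: semver shape, plus the
# warn-before-overwrite behaviour described in the stories above.
SEMVER_RE = re.compile(r"^v?(\d+)\.(\d+)\.(\d+)$")  # simplified semver

def validate_tag(tag, existing_tags, force=False):
    """Return an error message, or None if the tag may be created."""
    if not SEMVER_RE.match(tag):
        return f"tag {tag!r} is not a semver version"
    if tag in existing_tags and not force:
        return f"tag {tag!r} already exists (pass force to overwrite)"
    return None
```

Keeping the check server-side matches the note above that the versioning happens on the server, not in the publisher’s working copy.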

#### Explicit Versioning - Consumer

As a {Consumer} (of a Data Package) I want to know the full details of when and how the data package schema has changed so that I can adjust my scripts to handle it.

Important info to know for each schema change:

* time when published
* for any ***changed*** field - name, what was changed (type, format, …?),
  > +maybe everything else that was not changed (full field descriptor)
* for any ***deleted*** field - name,
  > +maybe everything else (full field descriptor)
* for any ***added*** field - all data (full field descriptor)

*A change in schema would correspond to a major version change in software (see http://semver.org/)*

***Concerns about explicit versioning**: we all have experience with consuming data from e.g. government publishers where the publishers change the data schema, breaking client code. I am constantly looking for a policy/mechanism to guide publishers to develop stable schema versioning for the data they produce, and to help consumers get some stability guarantees.*

***Automated versioning / automated tracking**: explicit versioning relies on the publisher, and humans can forget or not care enough about others. So to help consumers, my suggestion would be to always track schema changes of uploaded packages on the server, and allow users to review those changes on the website. (We might even want to implement auto-tagging, or not allow users to upload a package with the same version but a different schema without forcing)*
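The server-side schema tracking suggested above reduces to a diff over the `fields` lists of two schema versions, reporting added, deleted, and changed fields with their full descriptors (as the list above suggests). A minimal sketch:

```python
# Sketch of the schema changelog: compare two versions of a resource schema's
# "fields" list and report added / deleted / changed fields (full descriptors).

def diff_schema(old_fields, new_fields):
    old = {f["name"]: f for f in old_fields}
    new = {f["name"]: f for f in new_fields}
    return {
        "added":   [new[n] for n in new.keys() - old.keys()],
        "deleted": [old[n] for n in old.keys() - new.keys()],
        "changed": [new[n] for n in old.keys() & new.keys() if old[n] != new[n]],
    }

old = [{"name": "year", "type": "integer"}, {"name": "cpi", "type": "number"}]
new = [{"name": "year", "type": "string"}, {"name": "country", "type": "string"}]
changes = diff_schema(old, new)
```

A non-empty diff on upload is exactly the signal that could trigger the auto-tagging or forced-version-bump policy mentioned above.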

As a {Consumer} I want to get a sense of how outdated the datapackage I downloaded before is, so that I can decide whether I should update or not.

* I want to preview a DataPackage changelog (a list of all available versions/tags with brief info) online, sorted by creation time, so that I can get a sense of how the data or schema has changed since some time in the past. Important brief info:
  * Time when published
  * How many rows were added/deleted for each resource’s data
  * Which fields (column names) changed, were added or deleted for each resource.

As a {Consumer} I want to view a Datapackage at a particular version online, so that I can present/discuss the particular data timeslice of interest with other people.

As a {Consumer} I want to download a Data Package at a particular version so that I know it is compatible with my scripts and system

* Online: I want to pick the version I want from the list, and download it (as a zip, for example)
* CLI: I want to specify a tag or version when using the `install` command.

##### Know when a package has changed re caching

Excerpted from: https://github.com/okfn/data.okfn.org-new/issues/7

_From @trickvi on June 20, 2013 12:37_

I would like to be able to use data.okfn.org as an intermediary between my software and the data packages it uses, and be able to quickly check whether there's a new version of the data available (e.g. if I've cached the package on a local machine).

There are ways to do it with the current setup:

1. Download the datapackage.json descriptor file, parse it, get the version there and check it against my local version. Problems:
   - This solution relies on humans updating their version, and there might not be any consistency in it, since the data package standard describes the version attribute as: _"a version string conforming to the Semantic Versioning requirement"_
   - I have to fetch the whole datapackage.json (it's not big, I know, but why download all that extra data I might not even want)
2. Go around data.okfn.org and look directly at the GitHub repository. Problems:
   - I have to find out where the repo is, use git and do a lot of extra stuff (I don't care how the data packages are stored, I just want a simple interface to fetch them)
   - What would be the point of data.okfn.org/data? In my mind it collects data packages and provides a consistent interface to get them, irrespective of how they are stored.

I propose data.okfn.org provide an internal system to allow users to quickly check whether a new version might have been released. This does not have to be an API. We could leverage HTTP's caching mechanism using an ETag header that would contain some hash value. This hash value can e.g. be the SHA value of the heads ref objects served via the GitHub API:

```
https://api.github.com/repos/datasets/cpi/git/refs/heads/master
```

Software that works with data packages could then implement a caching strategy and just send an If-None-Match header along with a GET request for datapackage.json, to either get a new version of the descriptor (and look at the version in that file) or just serve the data from its cache.

_Copied from original issue: frictionlessdata/ideas#51_
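On the client side, the ETag strategy proposed in that issue amounts to a conditional GET. A sketch with the standard library (the URL is illustrative; no real registry endpoint is implied):

```python
import urllib.request

# Sketch of the ETag caching strategy: send the cached ETag as If-None-Match
# and treat HTTP 304 Not Modified as "serve from the local cache".

def build_request(url, cached_etag=None):
    headers = {"If-None-Match": cached_etag} if cached_etag else {}
    return urllib.request.Request(url, headers=headers)

def use_cache(status):
    # 304 => the cached datapackage.json is still current
    return status == 304

req = build_request("https://example.org/core/cpi/datapackage.json",
                    cached_etag='"abc123"')
```

On a 200 response the client stores the new `ETag` response header alongside the downloaded descriptor; on a 304 it skips the download entirely, which is the whole point of the proposal.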

#### Revisioning - Implicit Versioning

…

#### Change Notifications

As a Consumer I want to be notified of changes to a package I care about so that I can check out what has changed and take action (like downloading the updated data)

As a Consumer I want to see how active the site is to see if I should get involved

### 6. Publishers

#### Create a New Publisher

TODO

#### Find a Publisher (and users?)

As a Consumer I want to browse and find publishers so that I can find interesting publishers and their packages (so that I can use them)

#### View a Publisher Profile

*view data packages associated with a publisher or user*

Implementation details: *https://hackmd.io/MwNgrAZmCMAcBMBaYB2eAWR72woghmLNIrAEb4AME+08s6VQA===*

As a Consumer I want to see a publisher’s profile so that I can discover their packages and get a sense of how active and good they are

**As a Publisher I want to have a profile with a list of my data packages so that:**

* Others can find my data packages quickly and easily
* I can see how many data packages I have
* **I can find a data package I want to look at quickly [they can discover their own data]**
* **I can find the link for a data package to send to someone else**
  * *People want to share what they have done. This is probably the number one way the site gets prominence at the start (along with simple Google traffic)*
* I can check that members do not abuse their rights to publish and only publish topical data packages.

As a Consumer I want to view a publisher’s profile so that I can see who is behind a particular package, or see what other packages they produce [navigate up from a package page] [so that: I can trust their published data packages enough to reuse them.]

**Details**

* Profile =
  * Full name / title e.g. “World Bank”, identifier e.g. world-bank
  * *picture, short description text (if we have this - we don’t atm)*
    * *(esp. important to know whether this is the World Bank or not)*
  * *Total number of data packages*
* List of data packages
  * View by most recently created (updated?)
  * For each DataPackage we want to see: title, number of resources (?), first 200 characters of the description, license (see data.okfn.org/data/ for example)
  * Do we limit / paginate this list? No, not for the moment
* *[wontfix atm] Activity - this means data packages published, updated*
* *[wontfix atm] Quality … - we don’t have anything on this*
* *[wontfix atm] List of users*
* What are the permissions here?
  * Do we show private data packages? No
    * Do we show them when the “owner” or a sysadmin is viewing? Yes (but flagged as “private”)
  * What data packages to show? All the packages you own.
  * What about pinning? No support for this atm.

##### Search among a publisher’s packages

As a Consumer I want to search among all data packages owned by a publisher so that I can easily find one data package amongst all the data packages by this publisher.

##### Registered Users Profile and packages

*As a Consumer I want to see the profile and activity of a user so that …*

*As a Registered User I want to see the data packages I am associated with **so that** [like publisher]*

#### Publisher and User Leaderboard

As a ??? I want to see who the top publishers and users are so that I can emulate them or ???

#### Manage Publisher

##### Create and Edit Profile

As an {Owner ...} I want to edit my profile so that it is updated with new information

##### Add and Manage Members

As an {Owner of a Publisher in the Registry} I want to invite an existing user to become a member of my publisher

* Auto lookup by user name (show username and full name) - standard as per all sites
* The user gets a notification on their dashboard + an email with a link to accept the invite
* If the invite is accepted, notify the publisher (?) - actually, do not do this.

As an {Owner of a Publisher in the Registry} I want to invite someone by email to sign up and become a member of my Publisher so that they are authorized to publish data packages under my Publisher.

As a {Publisher Owner} I want to remove someone from membership in my publisher so they no longer have the ability to publish or modify my data packages

As a {Publisher Owner} I want to view all the people in my organization and what roles they have so that I can change these if I want

As a {Publisher Owner} I want to make a user an “owner” so they have full control

As a {Publisher Owner} I want to remove a user as an “owner” so they are just a member and no longer have full control

### 7. Web Hooks and Extensions

TODO: how do people build value-added services around the system (and push back over the API etc …) - OAuth etc

### 8. Administer Site

#### Configure Site

As the Admin I want to set key configuration parameters for my site deployment so that I can change key information like the site title

* A main config database is the one thing we might need

#### See usage metrics

As an Admin I want to see key metrics about usage, such as users, API usage, downloads etc, so that I know how things are going

* Total users signed up, and how many signed up in the last week / month etc
* Total publishers …
* Users-per-publisher distribution (?)
* API usage
* Downloads
* Billing: revenue in relevant periods
* Costs: how much are we spending on storage

#### Pricing and Billing

As an Admin I want to have a pricing plan and billing system so that I can charge users and make my platform sustainable

As a Publisher I want to know whether this site has a pricing plan and what the prices are, so that I can work out what this will cost me in the future and have a sense that these guys are sustainable (‘free forever’ does not work very well)

As a Publisher I want to sign up for a given pricing plan so that I am entitled to what it allows (e.g. private stuff …)

### Private Data Packages

cf npmjs.com

As a Publisher I want to have private data packages that I can share just with my team

### Sell My Data through your site

**EPIC: As a Publisher I want to sell my data through your site so that I make money and am able to sustain my publishing and my life …**