84 lines
2.6 KiB
Markdown
84 lines
2.6 KiB
Markdown
# Design Notes
|
||
|
||
## Roadmap
|
||
|
||
General comment: let's do "README" (docs) driven development here.
|
||
|
||
* [x] [show] Local functionality for Frictionless datasets with CSV #528
|
||
* [x] Move in new work (portal-experiment) into portal.js and refactor https://github.com/datopian/portal.js.bak/issues/59
|
||
* [ ] [show] Uber Epic covering all functionality **See below**
|
||
* [ ] [show] README only + data datasets (don’t have to be frictionless)
|
||
* (?) Graphs direct in README with say visdown …
|
||
* [ ] [show] SQL interface to the data (alasql or sql.js … https://github.com/agershun/alasql/wiki/Performance-Tests)
|
||
* [ ] file/resource subpages ... (for datasets with lots of resources)
|
||
* [ ] Docs **80% analysed** #
|
||
* [ ] Create portal components and library i.e. have a Table, Graph, Dataset component
|
||
* [ ] publish to @datopian/portal
|
||
* [ ] Examples
|
||
* [ ] Catalog functionality **20% analysed**
|
||
|
||
## [uber][epic] Show functionality for single datasets
|
||
|
||
### Features
|
||
|
||
* Elegant
|
||
* Description (README/Description)
|
||
* Data preview and exploration (for tablular)
|
||
* Basic: some sample data shown
|
||
* Data exploration v1: filterable
|
||
* Data Exploration v2: can do sql etc ...
|
||
* Graphs / visualization
|
||
* Validation: this row does not match schema in column X
|
||
* Summarization e.g. this columns has this range of values, this average value, this number of nulls
|
||
|
||
### Dataset structure support (in rough order of priority / like implementation)
|
||
|
||
* Frictionless
|
||
* Plain README (with frontmatter)
|
||
* README (no frontmatter) and LICENSE file (?)
|
||
|
||
Data has roughly two dimensions that are relevant
|
||
|
||
* Format
|
||
* CSV
|
||
* xlsx
|
||
* JSON
|
||
* ...
|
||
* Size
|
||
* Small: < 5mb (can just load inline ...)
|
||
* Medium < 100mb
|
||
* Large < 5Gb
|
||
* xlarge > 5Gb
|
||
|
||
* TODO: How does show/build work with remote files e.g. a resource ...
|
||
|
||
```
|
||
path: abc.csv
|
||
remote_storage_url: s3://.../.../.../
|
||
```
|
||
|
||
Options:
|
||
|
||
* We clone the data into path locally ...
|
||
* Possible problem if data is big ...
|
||
* Load data direct from remote_storage_url (as long as supports CORs)
|
||
|
||
|
||
|
||
|
||
## Architecture
|
||
|
||
Portal.js is a React and NextJS based framework for building dataset/resources pages and catalogs. It consists of:
|
||
|
||
* React components for data portal functionality e.g. data tables, graphs, dataset pages etc
|
||
* Tooling to load data (based on Frictionless)
|
||
* Template sites you can reuse using `create-next-app`
|
||
* Single dataset micro-site
|
||
* Github backed catalog
|
||
* CKAN backed catalog
|
||
* ...
|
||
* Local development environment
|
||
* Deployment integration with DataHub.io
|
||
|
||
In summary, technically PortalJS is: NextJS + data specific react components + data loading glue (mostly using frictionless-js).
|