datahub/DESIGN.md
Luccas Mateus de Medeiros Gomes b2438e655f [monorepo][m] - restructuring
- renamed apps to examples
- renamed libs to packages
- fixed data-literate build
2023-04-11 13:23:53 -03:00

# Design Notes
## Roadmap
General comment: let's do "README" (docs) driven development here.
* [x] [show] Local functionality for Frictionless datasets with CSV #528
* [x] Move new work (portal-experiment) into portal.js and refactor https://github.com/datopian/portal.js.bak/issues/59
* [ ] [show] Uber Epic covering all functionality **See below**
* [ ] [show] README only + data datasets (don't have to be Frictionless)
* (?) Graphs directly in README with, say, visdown …
* [ ] [show] SQL interface to the data (alasql or sql.js … https://github.com/agershun/alasql/wiki/Performance-Tests)
* [ ] file/resource subpages ... (for datasets with lots of resources)
* [ ] Docs **80% analysed** #
* [ ] Create portal components and library i.e. have a Table, Graph, Dataset component
* [ ] publish to @datopian/portal
* [ ] Examples
* [ ] Catalog functionality **20% analysed**
## [uber][epic] Show functionality for single datasets
### Features
* Elegant
* Description (README/Description)
* Data preview and exploration (for tabular data)
  * Basic: some sample data shown
  * Data exploration v1: filterable
  * Data exploration v2: can do SQL etc ...
* Graphs / visualization
* Validation: this row does not match schema in column X
* Summarization e.g. this column has this range of values, this average value, this number of nulls
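
The validation and summarization features listed above could look roughly like the following. This is a minimal sketch, not Portal.js code: the `Row`, `FieldSpec`, `ValidationError` and `NumericSummary` types and both function names are hypothetical, and only two field types are handled.

```typescript
// Hypothetical types for illustration; none of these names come from Portal.js.
type Cell = string | number | null;
type Row = Record<string, Cell>;

interface FieldSpec {
  name: string;
  type: "number" | "string";
}

interface ValidationError {
  rowIndex: number;
  column: string;
  message: string;
}

interface NumericSummary {
  min: number;
  max: number;
  mean: number;
  nulls: number;
}

// Validation: report every cell whose value does not match the declared type.
function validateRows(rows: Row[], fields: FieldSpec[]): ValidationError[] {
  const errors: ValidationError[] = [];
  rows.forEach((row, rowIndex) => {
    for (const field of fields) {
      const value = row[field.name];
      // Missing values are counted by the summary below, not flagged here.
      if (value === null || value === undefined) continue;
      if (typeof value !== field.type) {
        errors.push({
          rowIndex,
          column: field.name,
          message: `expected ${field.type}, got ${typeof value}`,
        });
      }
    }
  });
  return errors;
}

// Summarization: range, average and number of nulls for one numeric column.
// Non-numeric, non-null cells are skipped; returns null if no numbers are found.
function summarizeColumn(rows: Row[], column: string): NumericSummary | null {
  let nulls = 0;
  const values: number[] = [];
  for (const row of rows) {
    const value = row[column];
    if (value === null || value === undefined) nulls += 1;
    else if (typeof value === "number") values.push(value);
  }
  if (values.length === 0) return null;
  const sum = values.reduce((a, b) => a + b, 0);
  return {
    min: Math.min(...values),
    max: Math.max(...values),
    mean: sum / values.length,
    nulls,
  };
}
```

Each `ValidationError` carries the row index and column name, which is enough to render a message of the form "this row does not match schema in column X".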
### Dataset structure support (in rough order of priority / likely implementation order)
* Frictionless
* Plain README (with frontmatter)
* README (no frontmatter) and LICENSE file (?)

Data has roughly two dimensions that are relevant:
* Format
  * CSV
  * xlsx
  * JSON
  * ...
* Size
  * Small: < 5 MB (can just load inline ...)
  * Medium: < 100 MB
  * Large: < 5 GB
  * XLarge: > 5 GB
* TODO: How does show/build work with remote files e.g. a resource ...
```
path: abc.csv
remote_storage_url: s3://.../.../.../
```
Options:
* We clone the data into `path` locally ...
  * Possible problem if data is big ...
* Load data directly from `remote_storage_url` (as long as it supports CORS)
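
One way to combine the size tiers with the two options above is sketched below. Everything here is an assumption for illustration: the `Resource` shape, the `LoadStrategy` names, the 1 MB = 1024 * 1024 bytes convention, and in particular the rule that only small and medium remote files get cloned. The notes themselves leave the choice open.

```typescript
type SizeTier = "small" | "medium" | "large" | "xlarge";
type LoadStrategy = "local" | "clone-to-path" | "load-from-remote";

// Hypothetical resource shape mirroring the `path` / `remote_storage_url` example.
interface Resource {
  path: string;
  remote_storage_url?: string;
  bytes?: number; // size in bytes, if known
}

const MB = 1024 * 1024; // assumption: binary megabytes
const GB = 1024 * MB;

// Map a byte count onto the size tiers from the notes.
// A file of exactly 5 GB is not covered by the notes; it is treated as xlarge here.
function sizeTier(bytes: number): SizeTier {
  if (bytes < 5 * MB) return "small";
  if (bytes < 100 * MB) return "medium";
  if (bytes < 5 * GB) return "large";
  return "xlarge";
}

// Assumed policy: clone remote data only when it is known to be small or medium;
// otherwise read it straight from the remote URL (which then has to allow CORS).
function chooseLoadStrategy(resource: Resource): LoadStrategy {
  if (!resource.remote_storage_url) return "local";
  if (resource.bytes === undefined) return "load-from-remote";
  const tier = sizeTier(resource.bytes);
  return tier === "small" || tier === "medium" ? "clone-to-path" : "load-from-remote";
}
```

Treating an unknown size as "load from remote" avoids accidentally cloning a very large file, at the cost of requiring CORS support more often.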
## Architecture
Portal.js is a React- and Next.js-based framework for building dataset/resource pages and catalogs. It consists of:
* React components for data portal functionality e.g. data tables, graphs, dataset pages etc
* Tooling to load data (based on Frictionless)
* Template sites you can reuse using `create-next-app`
  * Single dataset micro-site
  * GitHub-backed catalog
  * CKAN-backed catalog
  * ...
* Local development environment
* Deployment integration with DataHub.io

In summary, technically Portal.js is: Next.js + data-specific React components + data loading glue (mostly using frictionless-js).
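
To make the "data loading glue" idea concrete, here is a minimal sketch of turning a Frictionless Data Package descriptor (`datapackage.json`) into props for a dataset page. The descriptor fields used (`name`, `title`, `description`, `resources`, `path`, `schema.fields`) are a subset of the Data Package spec; `DatasetPageProps` and `toDatasetPageProps` are hypothetical names, and the sketch deliberately does not call frictionless-js.

```typescript
// Subset of the Frictionless Data Package descriptor.
interface Field {
  name: string;
  type?: string;
}

interface DataResource {
  name?: string;
  path: string;
  format?: string;
  schema?: { fields: Field[] };
}

interface DataPackage {
  name?: string;
  title?: string;
  description?: string;
  resources: DataResource[];
}

// Hypothetical props that a Dataset page component might receive.
interface DatasetPageProps {
  title: string;
  description: string;
  tables: { label: string; path: string; columns: string[] }[];
}

// Glue step: reduce the descriptor to what the page needs to render.
function toDatasetPageProps(pkg: DataPackage): DatasetPageProps {
  return {
    title: pkg.title ?? pkg.name ?? "Untitled dataset",
    description: pkg.description ?? "",
    tables: pkg.resources.map((resource) => ({
      label: resource.name ?? resource.path,
      path: resource.path,
      columns: resource.schema?.fields.map((field) => field.name) ?? [],
    })),
  };
}
```

In a Next.js site a function like this would plausibly run at build time (for example inside `getStaticProps`), with the resulting props passed to Table or Graph components.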