Publish

This page explains the DataHub publishing flow from both the client and the back-end perspectives.


graph TD

  cli((CLI fa:fa-user))
  auth[Auth Service]
  cli --login--> auth

  cli --store--> raw[Raw Store API<br>+ Storage]
  cli --package-info--> pipeline-store
  raw --data resource--> pipeline-runner
  pipeline-store -.generate.-> pipeline-runner

  pipeline-runner --> package[Package Storage]
  package --api--> frontend[Frontend]
  frontend --> user[User fa:fa-user]

  package -.publish.-> metastore[MetaStore]
  pipeline-store -.publish.-> metastore
  metastore --api--> frontend
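Read as a dependency graph, the back-end flow above fixes an order of operations. Under this reading, the Frontend can only serve a dataset once both Package Storage and the MetaStore have been populated. The following is a minimal Python (3.9+) sketch that encodes the diagram's edges and derives one valid processing order with the standard-library `graphlib`. Treating the dotted `generate`/`publish` edges like solid ones is an assumption of the sketch, not something the diagram states.

```python
from graphlib import TopologicalSorter

# Edges copied from the diagram above as (source, target) pairs.
# Assumption: dotted "generate"/"publish" edges are treated like solid ones.
EDGES = [
    ("cli", "auth"),
    ("cli", "raw"),
    ("cli", "pipeline-store"),
    ("raw", "pipeline-runner"),
    ("pipeline-store", "pipeline-runner"),
    ("pipeline-runner", "package"),
    ("package", "frontend"),
    ("frontend", "user"),
    ("package", "metastore"),
    ("pipeline-store", "metastore"),
    ("metastore", "frontend"),
]


def processing_order(edges):
    """Return one order in which every component comes after all of its inputs."""
    graph = {}
    for src, dst in edges:
        graph.setdefault(dst, set()).add(src)  # dst depends on src
        graph.setdefault(src, set())           # make sure sources appear as nodes too
    return list(TopologicalSorter(graph).static_order())


if __name__ == "__main__":
    print(processing_order(EDGES))
```

Only `cli` has no inputs, so it always comes first; `frontend` waits for both `package` and `metastore`.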
  

Diagram of the upload process

graph TD

  CLI --jwt--> rawstore[RawStore API]
  rawstore --signed urls--> CLI
  CLI --upload using signed url--> s3[S3 bucket]
  s3 --success message--> CLI
  CLI --metadata--> pipe[Pipe Source]
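The upload diagram can be sketched from the client side as follows. This is a minimal sketch under stated assumptions: the names `describe`, `upload_files`, `request_signed_urls` and `put_object` are hypothetical, the two callables stand in for the real HTTP calls, and the assumption that the CLI sends each file's size and MD5 together with the JWT is illustrative rather than taken from the RawStore API. The base64-encoded MD5 digest is the format S3 uses for the `Content-MD5` header.

```python
import base64
import hashlib
from typing import Callable, Dict


def describe(data: bytes) -> dict:
    """Size plus base64-encoded MD5 digest (the format S3 uses for Content-MD5)."""
    return {
        "length": len(data),
        "md5": base64.b64encode(hashlib.md5(data).digest()).decode("ascii"),
    }


def upload_files(
    files: Dict[str, bytes],
    jwt: str,
    # Hypothetical stand-in for "CLI --jwt--> RawStore API --signed urls--> CLI".
    request_signed_urls: Callable[[str, Dict[str, dict]], Dict[str, str]],
    # Hypothetical stand-in for an HTTP PUT of the bytes to a signed URL.
    put_object: Callable[[str, bytes, str], bool],
) -> Dict[str, dict]:
    """Client side of the upload flow; returns the per-file metadata that was sent."""
    manifest = {name: describe(data) for name, data in files.items()}
    # Send the JWT (and, by assumption, the file metadata); get one signed URL per file.
    signed_urls = request_signed_urls(jwt, manifest)
    for name, data in files.items():
        # Upload each file directly to the S3 bucket using its signed URL.
        if not put_object(signed_urls[name], data, manifest[name]["md5"]):
            raise RuntimeError(f"upload of {name!r} failed")
    # Next step (not shown): send the package metadata to the Pipe Source.
    return manifest
```

Injecting the transport keeps the flow testable: in practice `put_object` would perform an HTTP PUT to the signed URL, and a failed upload stops the process before any metadata is sent.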

Identity Pipeline

Context: where this pipeline fits in the system

graph LR

  specstore --shared db--> assembler
  assembler --identity pipeline--> pkgstore
  pkgstore --> frontend

Detailed steps

graph LR

  load[Load from RawStore] --> encoding[Encoding Check<br>Add encoding info]
  encoding --> csvkind[CSV kind check]
  csvkind --> validate[Validate data]
  validate --> dump[Dump S3]
  dump --> pkgstore[Pkg Store fa:fa-database]
  load -.-> dump
  validate --> checkoutput[Validation<br>Reports]
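The detailed steps can be approximated with Python's standard library. The checks below are stand-ins chosen for illustration, not the pipeline's actual implementation: a UTF-8 decode with a Latin-1 fallback stands in for the encoding check, `csv.Sniffer` for the CSV kind check, and a header and row-width check for validation. All function names are hypothetical, and the dump to S3 and the Pkg Store is left out.

```python
import csv
import io
from typing import List, Tuple


def check_encoding(raw: bytes) -> Tuple[str, str]:
    """Encoding check: try UTF-8 first, fall back to Latin-1 (which decodes any bytes)."""
    try:
        return raw.decode("utf-8"), "utf-8"
    except UnicodeDecodeError:
        return raw.decode("latin-1"), "latin-1"


def csv_kind(text: str) -> str:
    """CSV kind check: guess the delimiter, falling back to a comma if sniffing fails."""
    try:
        return csv.Sniffer().sniff(text[:4096], delimiters=",;\t|").delimiter
    except csv.Error:
        return ","


def validate(text: str, delimiter: str, fields: List[str]) -> List[str]:
    """Validate data: the header must match and every row must have one cell per field."""
    rows = list(csv.reader(io.StringIO(text), delimiter=delimiter))
    errors = []
    if not rows or rows[0] != fields:
        errors.append(f"header {rows[0] if rows else []} != expected {fields}")
    for line_no, row in enumerate(rows[1:], start=2):
        if len(row) != len(fields):
            errors.append(f"row {line_no}: expected {len(fields)} cells, got {len(row)}")
    return errors


def identity_pipeline(raw: bytes, fields: List[str]) -> dict:
    """Run the checks and return a validation report; the data itself is not changed."""
    text, encoding = check_encoding(raw)
    delimiter = csv_kind(text)
    errors = validate(text, delimiter, fields)
    return {"encoding": encoding, "delimiter": delimiter, "valid": not errors, "errors": errors}
```

The report dictionary plays the role of the "Validation Reports" node. Because the pipeline is an identity pipeline, the loaded bytes would be dumped unchanged whatever the report says.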

Client Perspective

The publishing flow takes the following steps to communicate with the DataHub API:

sequenceDiagram
Upload Agent CLI->>Upload Agent CLI: Check Data Package valid
Upload Agent CLI-->>Auth(SSO): login
Auth(SSO)-->>Upload Agent CLI: JWT token
Upload Agent CLI->>RawStore API: upload using signed url
RawStore API->>Auth(SSO): Check key / token
Auth(SSO)->>RawStore API: OK / Not OK
RawStore API->>Upload Agent CLI: success message
Upload Agent CLI->>pipeline store: package info
pipeline store->>Upload Agent CLI: OK / Not OK
pipeline store->>pipeline runner: generate
RawStore API->>pipeline runner: data resource
pipeline runner->>Package Storage: generated
Package Storage->>Metadata Storage API: publish
pipeline store->>Metadata Storage API: publish
Metadata Storage API->>Upload Agent CLI: OK / Not OK
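The client's part of this sequence can be sketched as a single function that stops at the first "Not OK". This is a minimal sketch, not the DataHub CLI's implementation. `check_datapackage`, `publish` and the four callables (`login`, `upload`, `send_package_info`, `wait_for_publish`) are hypothetical stand-ins for the real service calls. The validity check is a small illustrative subset loosely modeled on the Frictionless Data Package spec: a non-empty `resources` list, each resource with a `name` and a `path` or `data`.

```python
from typing import Callable, List


def check_datapackage(descriptor: dict) -> List[str]:
    """'Check Data Package valid': a small, illustrative subset of Data Package checks."""
    resources = descriptor.get("resources")
    if not isinstance(resources, list) or not resources:
        return ["'resources' must be a non-empty list"]
    errors = []
    for i, res in enumerate(resources):
        if not isinstance(res, dict):
            errors.append(f"resources[{i}] must be an object")
            continue
        if "name" not in res:
            errors.append(f"resources[{i}] has no 'name'")
        if "path" not in res and "data" not in res:
            errors.append(f"resources[{i}] needs a 'path' or 'data'")
    return errors


def publish(
    descriptor: dict,
    login: Callable[[], str],
    upload: Callable[[str, dict], bool],
    send_package_info: Callable[[str, dict], bool],
    wait_for_publish: Callable[[str, dict], bool],
) -> bool:
    """Client side of the sequence diagram; every callable is a hypothetical service call."""
    errors = check_datapackage(descriptor)
    if errors:
        raise ValueError("invalid Data Package: " + "; ".join(errors))
    jwt = login()                               # Auth (SSO) returns a JWT token
    if not upload(jwt, descriptor):             # RawStore API checks the token with Auth
        return False
    if not send_package_info(jwt, descriptor):  # pipeline store replies OK / Not OK
        return False
    # Pipeline generation, processing and publishing happen server-side;
    # the client only waits for the final OK / Not OK from the Metadata Storage API.
    return wait_for_publish(jwt, descriptor)
```

Invalid packages are rejected before any network call, which matches the diagram's first step happening locally in the CLI.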

  • Upload API: see POST /source/upload in the source section of the API.
  • Authentication API: see GET /auth/check in the auth section of the API.
  • Authorization API: see GET /auth/authorize in the auth section of the API.
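Token validation belongs to the Authentication API. A client may still want to skip an obvious failed round trip by reading the standard `exp` (expiration time) claim of its JWT locally, as defined in RFC 7519. The following is a minimal sketch. `jwt_claims`, `jwt_expired` and the 30-second `leeway` are hypothetical choices. The payload is decoded **without** verifying the signature, so this is never a substitute for the server-side check.

```python
import base64
import json
import time
from typing import Optional


def jwt_claims(token: str) -> dict:
    """Decode the payload (middle segment) of a JWT WITHOUT verifying its signature."""
    parts = token.split(".")
    if len(parts) != 3:
        raise ValueError("not a JWT: expected header.payload.signature")
    payload = parts[1]
    padded = payload + "=" * (-len(payload) % 4)  # JWT segments are unpadded base64url
    return json.loads(base64.urlsafe_b64decode(padded))


def jwt_expired(token: str, now: Optional[float] = None, leeway: int = 30) -> bool:
    """True if the token's 'exp' claim (seconds since the epoch) is within `leeway` of now."""
    exp = jwt_claims(token).get("exp")
    if exp is None:
        return False  # no expiry claim: leave the decision to the server
    now = time.time() if now is None else now
    return now >= exp - leeway
```

If `jwt_expired` returns true, the client would log in again before calling the upload API rather than waiting for the server to reject the token.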

See the example code snippet in the DataHub CLI.