Skip to content

Latest commit

 

History

History
85 lines (48 loc) · 3.98 KB

sqlflow_architecture.md

File metadata and controls

85 lines (48 loc) · 3.98 KB

SQLFlow

SQLFlow is a tool that automates data lineage discovery by analyzing the SQL script. It generates a nice clean diagram to show the dataflow among the table/view and columns in the data warehouse.

Support more than 20 major databases, including bigquery, couchbase, dax, db2, greenplum, hana, hive, impala, informix, mdx, mysql, netezza, openedge, oracle, postgresql, redshift, snowflake, sqlserver, sybase, teradata, vertica,

Just paste the SQL script and click a button, you will get the data lineage diagram instantly, highlight the dataflow in the graph with a simple mouse click.

You can also call the RESTful API provided by this tool in your program and get the data lineage and diagram model information in a JSON snippet to make further usage.

SQLFlow architecture

SQLFlow components

SQLFlow components

SQLFlow

SQLFlow frontend

  1. Send the SQL script received from the browser to the backend.

  2. After receiving the result, which includes the data lineage and diagram model generated by the backend, visualize the diagram model in the browser.

  3. Highlight the dataflow in the diagram when the user clicks on a specific table/column.

SQLFlow backend

  1. SQLFlow-Analyzer: receiving the SQL script from the frontend and parse the SQL script into parse tree nodes by utilizing the GSP library, calculate the data lineage by analyzing AST.

    The SQLFlow-Analyzer component can be executed as a standalone tool. Take SQL text as input and generate data lineage in JSON format. Check SQLFlow.java for more.

  2. FlowLayout: Calculating the layout of database objects(table/column) in the dlineage and generate the diagram model with all necessary position data, including nodes and edges. FlowLayout depends on doLayout library to layout the database objects.

  3. Return a JSON snippet including the data lineage and diagram model to the frontend.

Use SQLFlow in your flavor ways

  1. Visit SQLFlow Cloud using the browser

    You may paste your SQL script into the SQLFlow web page or upload the SQL file to the site. Select the correct database and then click the visualize button.

  2. Use RESTFul APIs

    SQLFlow provides RESTful API, so your program can communicate with the SQLFlow backend directly. Sending the SQL to SQLFlow backend and receive a JSON snippet including the data lineage and diagram model for further processing in your program.

    Please note that you need to set up the SQLFlow on-premise version on your server to use the API.

    Or connect to the SQLFlow Cloud to use the RESTFul API

  3. Install both frontend and backend on your own application/server (SQLFlow on-premise version)

    Setup both the frontend and backend of SQLFlow on our server. Please check the setup manual.

    To setup SQLFlow on your server, please contact us to obtain a commercial license to get all those distribution files of the SQLFlow on-premise version.

  4. Integrate the frontend and backend to your data platform

    The front end is written in Typescript to integrate it into your frontend if you use the same tech.

    The backend part can be provided as Java library so you can be embedded into your program.

Relations generated by SQLFlow

This article describes the relationship generated by the SQLFlow between column and column, column and table/view. One relation includes one target column and a relationship type, and one or more source columns.