Berke/u to u docs (#5104)
* feat: update docs in llm examples, add script

* fix: add example doc gen

* fix: undo -r

* fix: re add -r

* fix: isort

* feat: add post body

* feat: add architecture image

* fix: rollback changes from other branch

* fix: fix thumbnail

* Update public/website3/content/2.developers/7.showcases/5.unstructured-to-structured.md

Co-authored-by: Jan Chorowski <[email protected]>

* Update public/website3/content/2.developers/7.showcases/5.unstructured-to-structured.md

Co-authored-by: Jan Chorowski <[email protected]>

* Update public/website3/content/2.developers/7.showcases/5.unstructured-to-structured.md

Co-authored-by: Jan Chorowski <[email protected]>

* Update public/website3/content/2.developers/7.showcases/5.unstructured-to-structured.md

Co-authored-by: Jan Chorowski <[email protected]>

* fix: explainations in readme

* fix: replace example gif url

* fix: gif url in readme

* Update public/llm-app/examples/pipelines/unstructured_to_sql_on_the_fly/app.py

Co-authored-by: Olivier Ruas <[email protected]>

* Update public/website3/content/2.developers/7.showcases/5.unstructured-to-structured.md

Co-authored-by: Olivier Ruas <[email protected]>

* Update public/website3/content/2.developers/7.showcases/5.unstructured-to-structured.md

Co-authored-by: Olivier Ruas <[email protected]>

* Update public/website3/content/2.developers/7.showcases/5.unstructured-to-structured.md

Co-authored-by: Olivier Ruas <[email protected]>

* feat: add gif in format

* Update public/website3/content/2.developers/7.showcases/5.unstructured-to-structured.md

Co-authored-by: Olivier Ruas <[email protected]>

* Update public/website3/content/2.developers/7.showcases/5.unstructured-to-structured.md

Co-authored-by: Olivier Ruas <[email protected]>

* Update public/website3/content/2.developers/7.showcases/5.unstructured-to-structured.md

Co-authored-by: Olivier Ruas <[email protected]>

* Update public/website3/content/2.developers/7.showcases/5.unstructured-to-structured.md

Co-authored-by: Olivier Ruas <[email protected]>

* Update public/website3/content/2.developers/7.showcases/5.unstructured-to-structured.md

Co-authored-by: Olivier Ruas <[email protected]>

* Update public/website3/content/2.developers/7.showcases/5.unstructured-to-structured.md

Co-authored-by: Olivier Ruas <[email protected]>

* Update public/website3/content/2.developers/7.showcases/5.unstructured-to-structured.md

Co-authored-by: Olivier Ruas <[email protected]>

* fix: add author

* Update public/website3/content/2.developers/7.showcases/5.unstructured-to-structured.md

Co-authored-by: Olivier Ruas <[email protected]>

* Apply suggestions from code review

Olivier's suggestions

Co-authored-by: Olivier Ruas <[email protected]>

* fix: change author to me

* fix: some additions

* fix: add table formatting

* fix: typo and edits

* Apply suggestions from code review

add Olivier's fix

Co-authored-by: Olivier Ruas <[email protected]>

* fix: add gif to content, image box

* fix: comma before example

* fix: typo and table in blog

* Update public/website3/content/2.developers/7.showcases/5.unstructured-to-structured.md

Co-authored-by: Olivier Ruas <[email protected]>

* Apply suggestions from code review

fix: remove text from py block

Co-authored-by: Olivier Ruas <[email protected]>

* fix: mention it is two parts

* new diagram

* fix: sort

* fix: typo

* fix: another typo

---------

Co-authored-by: Jan Chorowski <[email protected]>
Co-authored-by: Olivier Ruas <[email protected]>
Co-authored-by: Jan Chorowski <[email protected]>
GitOrigin-RevId: a7980599e46c27700d80bf1f326a9fae45d50f26
4 people authored and Manul from Pathway committed Dec 18, 2023
1 parent e5738f4 commit 15a42f7
Showing 1 changed file with 19 additions and 5 deletions.
24 changes: 19 additions & 5 deletions examples/pipelines/unstructured_to_sql_on_the_fly/app.py
@@ -4,17 +4,25 @@
 The aim of this project is to extract and structure the data out of unstructured data (PDFs, queries)
 on the fly.
-The following program reads in a collection of financial PDF documents from a local directory
+This example consists of two separate parts that can be used independently.
+1 - Pipeline 1: Proactive data pipeline that is always live and tracking file changes,
+it reads documents, structures them and writes results to PostgreSQL.
+2 - Pipeline 2: Query answering pipeline that reads user queries, and answers them by
+generating SQL queries that are run on the data stored in PostgreSQL.
+Specifically, Pipeline 1 reads in a collection of financial PDF documents from a local directory
 (that can be synchronized with a Dropbox account), tokenizes each document using the tiktoken encoding,
 then extracts, using the OpenAI API, the wanted fields.
-The values are stored in a Pathway table which is then output to a postgreSQL instance.
+The values are stored in a Pathway table which is then output to a PostgreSQL instance.
-The program then starts a REST API endpoint serving queries about programming in Pathway.
+Pipeline 2 then starts a REST API endpoint serving queries about programming in Pathway.
 Each query text is converted into a SQL query using the OpenAI API.
-The diagram is available at:
-https://github.com/pathwaycom/llm-app/examples/pipelines/unstructure_to_sql_on_the_fly/Unstructured_to_SQL_diagram.png
+Architecture diagram and description are at
+https://pathway.com/developers/showcases/unstructured-to-structured
 ⚠️ This project requires a running postgreSQL instance.
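The two-part design described in this docstring can be illustrated with a small, self-contained sketch. It is not the code from app.py: the standard-library `sqlite3` module stands in for the PostgreSQL instance, the two LLM calls are replaced by hypothetical stub functions (`structure_document` and `question_to_sql`), and the column names are made up for illustration. Only the table name `quarterly_earnings` comes from the diff.

```python
import sqlite3

# Assumption: sqlite3 replaces the PostgreSQL instance so the sketch runs anywhere.
# Assumption: the columns (company, quarter, revenue) are illustrative only.
TABLE = "quarterly_earnings"


def structure_document(text: str) -> dict:
    """Hypothetical stand-in for the LLM extraction step of Pipeline 1.

    Parses "key: value" lines; the real app asks the OpenAI API instead.
    """
    fields = dict(line.split(": ", 1) for line in text.strip().splitlines())
    return {
        "company": fields["company"],
        "quarter": fields["quarter"],
        "revenue": float(fields["revenue"]),
    }


def pipeline_1(conn: sqlite3.Connection, documents: list[str]) -> None:
    """Structure each document and write the rows to the SQL table."""
    conn.execute(
        f"CREATE TABLE IF NOT EXISTS {TABLE} (company TEXT, quarter TEXT, revenue REAL)"
    )
    rows = [structure_document(doc) for doc in documents]
    conn.executemany(
        f"INSERT INTO {TABLE} VALUES (:company, :quarter, :revenue)", rows
    )
    conn.commit()


def question_to_sql(question: str) -> str:
    """Hypothetical stand-in for the LLM step of Pipeline 2.

    Ignores the question and returns a fixed query, to keep the sketch deterministic.
    """
    return f"SELECT company, revenue FROM {TABLE} ORDER BY revenue DESC LIMIT 1"


def pipeline_2(conn: sqlite3.Connection, question: str) -> list[tuple]:
    """Turn a user question into SQL and run it on the stored table."""
    return conn.execute(question_to_sql(question)).fetchall()


def demo() -> list[tuple]:
    conn = sqlite3.connect(":memory:")
    docs = [
        "company: ACME\nquarter: Q3\nrevenue: 12.5",
        "company: Globex\nquarter: Q3\nrevenue: 30.0",
    ]
    pipeline_1(conn, docs)
    return pipeline_2(conn, "Which company had the highest revenue?")
```

The two functions share only the database connection and the table name, which is what lets the two parts be used independently: Pipeline 2 can query a table that was filled by an earlier run of Pipeline 1.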
@@ -304,6 +312,9 @@ def run(
     postresql_table: str = os.environ.get("POSTGRESQL_TABLE", "quarterly_earnings"),
     **kwargs,
 ):
+    #
+    # # Pipeline 1 - parsing documents into a PostgreSql table
+    #
     postgreSQL_settings = {
         "host": postresql_host,
         "port": postresql_port,
@@ -323,6 +334,9 @@ def run(
     pw.io.postgres.write(structured_table, postgreSQL_settings, postresql_table)
     pw.io.csv.write(structured_table, "./examples/data/quarterly_earnings.csv")
 
+    #
+    # # Pipeline 2 - query answering using PostgreSql
+    #
     unstructured_query(
         postgreSQL_settings,
         postresql_table,
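Both pipelines receive the same settings dictionary and table name, and `run()` takes the table name from the environment with a default. A minimal sketch of that configuration pattern follows. Only `POSTGRESQL_TABLE`, its default `quarterly_earnings`, and the `host` and `port` keys appear in the diff; the variable names `POSTGRESQL_HOST` and `POSTGRESQL_PORT`, their defaults, and the helper `settings_from_env` are assumptions made for illustration, and the keys hidden in the collapsed part of the diff are deliberately left out.

```python
import os
from typing import Mapping, Optional


def settings_from_env(env: Optional[Mapping[str, str]] = None) -> tuple[dict, str]:
    """Build (connection settings, table name) from environment variables.

    Hypothetical helper: POSTGRESQL_HOST / POSTGRESQL_PORT and their defaults
    are assumed names; only POSTGRESQL_TABLE is taken from the diff.
    """
    env = os.environ if env is None else env
    settings = {
        "host": env.get("POSTGRESQL_HOST", "localhost"),  # assumed name and default
        "port": env.get("POSTGRESQL_PORT", "5432"),  # assumed name and default
    }
    table = env.get("POSTGRESQL_TABLE", "quarterly_earnings")
    return settings, table
```

Passing a plain dictionary instead of `os.environ` makes the defaults easy to check without touching the real environment, for example `settings_from_env({"POSTGRESQL_TABLE": "earnings_test"})`.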