Pathway vectorstore and rag-pathway template #14859

janchorowski · 2023-12-18T19:03:56Z

Description: Integration with pathway.com data processing pipeline acting as an always updated vectorstore
Issue: not applicable
Dependencies: optional dependency on pathway
Twitter handle: pathway_com

The PR adds an integration with pathway that offers an easy-to-use, always-updated vector store:

import os

import pathway as pw
from langchain.embeddings.openai import OpenAIEmbeddings
from langchain.text_splitter import CharacterTextSplitter
from langchain.vectorstores import PathwayVectorClient, PathwayVectorServer

# Source connectors whose documents the server keeps indexed.
data_sources = []
data_sources.append(
    pw.io.gdrive.read(
        object_id="17H4YpBOAKQzEJ93xmC2z170l0bP2npMy",
        service_user_credentials_file="credentials.json",
        with_metadata=True,
    )
)

# Split documents into chunks and embed each chunk with OpenAI.
text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=0)
embeddings_model = OpenAIEmbeddings(openai_api_key=os.environ["OPENAI_API_KEY"])
vector_server = PathwayVectorServer(
    *data_sources,
    embedder=embeddings_model,
    splitter=text_splitter,
)
# threaded=True starts the server in a background thread, so the client below can run in the same script.
vector_server.run_server(host="127.0.0.1", port="8765", threaded=True, with_cache=False)

# The client connects to the running server to retrieve documents.
client = PathwayVectorClient(
    host="127.0.0.1",
    port="8765",
)
query = "What is Pathway?"
docs = client.similarity_search(query)

The PathwayVectorServer builds a data processing pipeline which continuously scans documents in a given source connector (Google Drive, S3, ...) and builds a vector store. The PathwayVectorClient implements LangChain's VectorStore interface and connects to the server to retrieve documents.
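A toy, in-process sketch of the idea described above: an index that is refreshed whenever a source document changes, plus a `similarity_search` over it. This is not Pathway's implementation and involves no server; `ToyLiveIndex`, `embed`, and `cosine` are hypothetical names, and the bag-of-words counts are only a stand-in for a real embedding model.

```python
import math
import re
from collections import Counter
from typing import Dict, List, Tuple


def embed(text: str) -> Counter:
    """Stand-in embedder (assumption): a sparse bag-of-words count vector."""
    return Counter(re.findall(r"\w+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[word] for word, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class ToyLiveIndex:
    """Hypothetical in-memory index that re-embeds a document whenever it changes."""

    def __init__(self) -> None:
        self._docs: Dict[str, Tuple[str, Counter]] = {}

    def upsert(self, doc_id: str, text: str) -> None:
        # Would be called when a scan of the source sees a new or modified document.
        self._docs[doc_id] = (text, embed(text))

    def delete(self, doc_id: str) -> None:
        # Would be called when a scan notices that a document has disappeared.
        self._docs.pop(doc_id, None)

    def similarity_search(self, query: str, k: int = 4) -> List[str]:
        # Rank every stored document by cosine similarity to the query.
        q = embed(query)
        scored = [(cosine(q, vec), text) for text, vec in self._docs.values()]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return [text for _, text in scored[:k]]


index = ToyLiveIndex()
index.upsert("intro.txt", "Pathway is a data processing framework")
index.upsert("fruit.txt", "Bananas are yellow")
print(index.similarity_search("What is Pathway?", k=1))
```

Because every `upsert` replaces the stored vector for that document id, a query issued after a document is edited sees the new text without rebuilding the whole index.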

---------
Co-authored-by: mlewandowski <[email protected]>
Co-authored-by: Berke <[email protected]>
Co-authored-by: Jan Chorowski <[email protected]>
Co-authored-by: Adrian Kosowski <[email protected]>

hwchase17

Let's not import this into langchain; langchain should remain unchanged.

Only langchain-community should be updated, and we should import directly from there.

hwchase17 · 2023-12-19T01:44:47Z

libs/community/langchain_community/vectorstores/pathway.py

+
+from typing import Callable, List, Optional
+
+import pathway as pw


This should be a conditional import.

janchorowski · 2023-12-19

Sorry about the omission; everything should be optional now.
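A minimal sketch of the conditional-import pattern requested in this thread: the optional package is imported only when it is needed, and a missing install produces an actionable error. This is not the code from the PR; `import_optional` and its parameters are hypothetical names chosen for illustration.

```python
import importlib
from types import ModuleType


def import_optional(module_name: str, pip_name: str) -> ModuleType:
    """Import `module_name` on demand, with a helpful error if it is missing."""
    try:
        return importlib.import_module(module_name)
    except ImportError as exc:
        raise ImportError(
            f"Could not import the optional dependency '{module_name}'. "
            f"Install it with `pip install {pip_name}`."
        ) from exc


# The standard-library `json` module stands in for an optional third-party package.
backend = import_optional("json", "json")
print(backend.__name__)
```

Calling the helper inside a constructor or method, instead of importing at module top level, lets the surrounding package be imported even when the optional dependency is not installed.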

janchorowski · 2023-12-19T16:17:26Z

@hwchase17 we have fixed the poetry lock file and switched to type annotations compatible with Python 3.8. Can you re-trigger the CI run?
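For context, a small sketch of what Python 3.8-compatible annotations look like, using the same `typing` names that appear in the reviewed diff (`Callable`, `List`, `Optional`). The function `filter_texts` is a hypothetical example, not code from the PR.

```python
from typing import Callable, List, Optional


def filter_texts(
    texts: List[str],
    keep: Optional[Callable[[str], bool]] = None,
) -> List[str]:
    """Return the texts accepted by `keep`, or all of them if no predicate is given."""
    # On Python 3.8, `list[str]` raises TypeError when the annotation is evaluated,
    # and the `X | None` syntax is unavailable; the typing aliases work on all versions.
    if keep is None:
        return list(texts)
    return [t for t in texts if keep(t)]


print(filter_texts(["a", "bb", "ccc"], keep=lambda t: len(t) > 1))
```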

janchorowski · 2023-12-21T08:08:11Z

@efriis I tried to fix the formatting, now CI should be clean.

janchorowski · 2024-03-11T17:30:55Z

@efriis we have simplified the PR. It now contains only the client, and the quick-start instructions use a publicly available server and then point to instructions on how to run the server.

The template is also removed, and we have removed the pathway dependency, making it much leaner.

Please review!

janchorowski · 2024-03-12T10:07:09Z

@efriis I fixed linters

janchorowski · 2024-03-14T08:02:34Z

@efriis please trigger CI, we resolved a merge conflict

janchorowski · 2024-03-29T14:23:23Z

@efriis @baskaryan sorry to bother you. I merged master again and reran the linters. Locally, `make lint` passes, but `make test` fails with `FAILED tests/unit_tests/callbacks/test_callback_manager.py::test_callback_manager_configure_context_vars - AttributeError: 'Client' object has no attribute 'tracing_queue'`, which seems unrelated and hopefully won't block this any further.

---------
Co-authored-by: Mateusz Lewandowski <[email protected]>
Co-authored-by: mlewandowski <[email protected]>
Co-authored-by: Berke <[email protected]>
Co-authored-by: Adrian Kosowski <[email protected]>
Co-authored-by: berkecanrizai <[email protected]>
Co-authored-by: Erick Friis <[email protected]>
Co-authored-by: Harrison Chase <[email protected]>
Co-authored-by: Bagatur <[email protected]>
Co-authored-by: Szymon Dudycz <[email protected]>

Pathway vectorstore and rag-pathway template

51c0b07


dosubot bot added the `size:XL` label (this PR changes 500-999 lines, ignoring generated files) on Dec 18, 2023

dosubot bot added the `Ɑ: vector store` (related to vector store module) and `🤖:enhancement` (a large net-new component, integration, or chain; use sparingly; the largest features) labels on Dec 18, 2023

hwchase17 reviewed Dec 19, 2023

fix imports (#6)

51944ea

Change to a conditional import.
---------
Co-authored-by: mlewandowski <[email protected]>

dosubot bot added the `size:XXL` label (this PR changes 1000+ lines, ignoring generated files) and removed the `size:XL` label (this PR changes 500-999 lines, ignoring generated files) on Dec 19, 2023

Berke/fix docs (#7)

3a6a5f0

Fix documentation markdown formatting.
---------
Co-authored-by: mlewandowski <[email protected]>

janchorowski requested a review from hwchase17 December 19, 2023 12:17

baskaryan added the template label Dec 19, 2023

baskaryan assigned efriis Dec 19, 2023

lewymati and others added 4 commits December 19, 2023 16:59

Merge branch 'langchain-ai:master' into master

17e1842

revert pyproject/poetry.lock

48d3371

optional

3097d7b

update pyproject and lockfile

f6f4f25

It was done as follows:
1. fetch fresh langchain master
2. `poetry add --optional pathway@latest --python ">=3.10"`
3. `poetry lock --no-update`

lewymati and others added 2 commits December 19, 2023 22:58

Merge branch 'master' into master

b6168cb

update poetry.lock hash

7a6d18c

Merge branch 'master' into master

74bff19

merge

e0558ed

Add newline required by Ruff.

8e765a2

dosubot bot added the `size:L` label (this PR changes 100-499 lines, ignoring generated files) on Mar 11, 2024

janchorowski added 2 commits March 11, 2024 18:28

Fix docstring

c374daf

Merge branch 'master' into master

200be9b

janchorowski added 2 commits March 12, 2024 10:30

Merge remote-tracking branch 'upstream/master'

c276d26

linters

cf7ad98

Merge branch 'master' into update_imports

8bc4cd0

szymondudycz deleted the update_imports branch March 13, 2024 10:07

baskaryan enabled auto-merge (squash) March 28, 2024 00:22

janchorowski restored the update_imports branch March 29, 2024 14:17

janchorowski added 2 commits March 29, 2024 15:17

Merge remote-tracking branch 'upstream/master' into update_imports

77acfe1

Ruff lint

f904e54

auto-merge was automatically disabled March 29, 2024 14:20
Head branch was pushed to by a user without write access

Merge branch 'master' into update_imports

196147f

baskaryan approved these changes Mar 29, 2024

baskaryan merged commit b8b42cc into langchain-ai:master Mar 29, 2024
62 checks passed

dosubot bot added the `lgtm` label (PR looks good; use to confirm that a PR is ready for merging) on Mar 29, 2024
