community: Add support for Upstash Vector #20824

Merged
merged 56 commits into from
Apr 29, 2024
56 commits
2a70916
Add support for Upstash Vector
ytkimirti Feb 2, 2024
8b1a907
Add integration tests
ytkimirti Feb 4, 2024
1e77f54
Add info api
ytkimirti Feb 4, 2024
82e38c5
Add name to indexing.ipynb
ytkimirti Feb 4, 2024
d319370
Merge remote-tracking branch 'upstream/master'
ytkimirti Feb 4, 2024
763c59b
Fix formatting
ytkimirti Feb 4, 2024
f63414d
Fix formatting
ytkimirti Feb 5, 2024
4b7ed74
Add example env vars
ytkimirti Feb 5, 2024
0d42775
Fix iteration
ytkimirti Feb 5, 2024
894d9a1
Add env vars to scheduled tests workflow file
ytkimirti Feb 5, 2024
2e50779
Formatting
ytkimirti Feb 5, 2024
6a8ed63
Remove skip
ytkimirti Feb 5, 2024
6518ae7
Add async implementations of functions
ytkimirti Feb 7, 2024
4d7dfef
Remove upstash keys from yaml file
ytkimirti Feb 8, 2024
4e3da15
Merge branch 'master-up'
ytkimirti Feb 8, 2024
3586def
Fixes, finalize integration tests
ytkimirti Feb 9, 2024
fb2277e
Merge branch 'master-up'
ytkimirti Feb 9, 2024
7d32e02
Remove optional from text_key in constructor
ytkimirti Feb 13, 2024
13bc7f2
Merge branch 'master-up'
ytkimirti Feb 13, 2024
7be8f62
Remove forgotten print
ytkimirti Feb 16, 2024
9d94e98
Add upstash docs notebook
ytkimirti Feb 16, 2024
1226916
Add async version of add_texts
ytkimirti Feb 18, 2024
1d5c172
Small cleanup
ytkimirti Feb 18, 2024
16a73e0
Update docs
ytkimirti Feb 18, 2024
699d24d
Merge branch 'master-up'
ytkimirti Feb 18, 2024
ea654ad
Cleanup
ytkimirti Feb 18, 2024
81ea626
Better description
ytkimirti Feb 18, 2024
9ab3ce3
Fix naming
ytkimirti Feb 18, 2024
2fd9651
Add support for with relevance scores functions
ytkimirti Feb 18, 2024
4dc962a
Add missing ids
ytkimirti Feb 18, 2024
073f754
Improve tests
ytkimirti Feb 18, 2024
1eb0a95
Add integration tests for async methods
ytkimirti Feb 18, 2024
19b441e
Improve formatting for the notebook
ytkimirti Feb 18, 2024
f20aade
Remove output in the notebook
ytkimirti Feb 18, 2024
7a00146
Fix formatting
ytkimirti Feb 18, 2024
8f9a06e
Fix formatting
ytkimirti Feb 18, 2024
f72a3e6
Fix formatting
ytkimirti Feb 20, 2024
660419e
Merge branch 'master-up'
ytkimirti Feb 20, 2024
ae34bd2
Merge branch 'ytkimirti/langchain' into 'CahidArda/langchain'
CahidArda Apr 22, 2024
2e64b3e
add metadata filtering to UpstashVectorStore
CahidArda Apr 22, 2024
bed4244
add native embedding to UpstashVectorStore
CahidArda Apr 22, 2024
6f9e3b0
update Upstash docs with filtering and embedding
CahidArda Apr 24, 2024
2a49b5d
fix formatting
CahidArda Apr 24, 2024
698aa9b
Merge branch 'master' into master
baskaryan Apr 24, 2024
9528006
Merge branch 'master' into CahidArda/master
baskaryan Apr 24, 2024
bef06e6
fmt
baskaryan Apr 24, 2024
e3d3c80
Merge branch 'master' into master
baskaryan Apr 24, 2024
bdbac44
Merge branch 'master' into master
CahidArda Apr 25, 2024
42ecf3d
Merge branch 'master' into master
baskaryan Apr 25, 2024
af9d678
add upstash vcr recordings
CahidArda Apr 27, 2024
94eaa9a
formatting
CahidArda Apr 27, 2024
b819fab
rm nest_asyncio.apply call
CahidArda Apr 27, 2024
40d035c
move upstash_vector import into functions
CahidArda Apr 27, 2024
c7ea271
Merge branch 'master' into master
CahidArda Apr 27, 2024
ef6926e
Merge branch 'master' into master
CahidArda Apr 29, 2024
bfc6689
Merge branch 'master' into master
baskaryan Apr 29, 2024
165 changes: 162 additions & 3 deletions docs/docs/integrations/providers/upstash.mdx
@@ -1,6 +1,166 @@
# Upstash Redis
Upstash offers developers serverless databases and messaging
platforms to build powerful applications without having to worry
about the operational complexity of running databases at scale.

One significant advantage of Upstash is that its databases are accessible over HTTP and all of its SDKs use HTTP.
This means you can use them on serverless platforms, at the edge, or on any platform that does not support TCP connections.

Currently, there are two Upstash integrations available for LangChain:
Upstash Vector as a vector embedding database and Upstash Redis as a cache and memory store.

# Upstash Vector

Upstash Vector is a serverless vector database that can be used to store and query vectors.

## Installation

Create a new serverless vector database at the [Upstash Console](https://console.upstash.com/vector).
Select your preferred distance metric and dimension count according to your model.


Install the Upstash Vector Python SDK with `pip install upstash-vector`.
The Upstash Vector integration in LangChain is a wrapper around the Upstash Vector Python SDK, which is why the `upstash-vector` package is required.

## Integrations

Create an `UpstashVectorStore` object using credentials from the Upstash Console.
You also need to pass in an `Embeddings` object, which turns text into vector embeddings.

```python
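import os

from langchain_community.vectorstores.upstash import UpstashVectorStore
from langchain_openai import OpenAIEmbeddings

os.environ["UPSTASH_VECTOR_REST_URL"] = "<UPSTASH_VECTOR_REST_URL>"
os.environ["UPSTASH_VECTOR_REST_TOKEN"] = "<UPSTASH_VECTOR_REST_TOKEN>"

# Any `Embeddings` implementation can be used; OpenAIEmbeddings matches the later examples.
embeddings = OpenAIEmbeddings()

store = UpstashVectorStore(
    embedding=embeddings
)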
```
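
Because the store reads its credentials from the environment, it can help to check that both variables are set before constructing it. The following is a minimal sketch using only the standard library; the helper name `missing_upstash_vars` is hypothetical and not part of LangChain or the Upstash SDK.

```python
import os

# Variable names taken from the example above.
REQUIRED_VARS = ("UPSTASH_VECTOR_REST_URL", "UPSTASH_VECTOR_REST_TOKEN")


def missing_upstash_vars(env=None):
    """Return the required variable names that are unset or empty.

    Hypothetical helper for illustration; `env` defaults to os.environ.
    """
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name)]
```

Calling `missing_upstash_vars()` before creating the store and raising an error when the returned list is non-empty gives a clearer message than a failed HTTP request later on.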

An alternative way of creating an `UpstashVectorStore` is to pass `embedding=True`. This is a unique
feature of the `UpstashVectorStore`, made possible by the ability of Upstash Vector indexes
to have an associated embedding model. In this configuration, the documents you insert and the
queries you search for are sent to Upstash Vector as plain text. In the background,
Upstash Vector embeds the text and executes the request with the resulting embeddings. To use this
feature, [create an Upstash Vector index by selecting a model](https://upstash.com/docs/vector/features/embeddingmodels#using-a-model)
and pass `embedding=True`:

```python
from langchain_community.vectorstores.upstash import UpstashVectorStore
import os

os.environ["UPSTASH_VECTOR_REST_URL"] = "<UPSTASH_VECTOR_REST_URL>"
os.environ["UPSTASH_VECTOR_REST_TOKEN"] = "<UPSTASH_VECTOR_REST_TOKEN>"

store = UpstashVectorStore(
    embedding=True
)
```

See [Upstash Vector documentation](https://upstash.com/docs/vector/features/embeddingmodels)
for more detail on embedding models.

### Inserting Vectors

```python
from langchain.text_splitter import CharacterTextSplitter
from langchain_community.document_loaders import TextLoader
from langchain_community.vectorstores.upstash import UpstashVectorStore
from langchain_openai import OpenAIEmbeddings

loader = TextLoader("../../modules/state_of_the_union.txt")
documents = loader.load()
text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=0)
docs = text_splitter.split_documents(documents)

# Create a new embeddings object
embeddings = OpenAIEmbeddings()

# Create a new UpstashVectorStore object
store = UpstashVectorStore(
    embedding=embeddings
)

# Insert the document embeddings into the store
store.add_documents(docs)
```

When documents are inserted, they are first embedded using the `Embeddings` object.

Most embedding models can embed multiple documents at once, so the documents are batched and embedded in parallel.
The size of the batch can be controlled using the `embedding_chunk_size` parameter.

The embedded vectors are then stored in the Upstash Vector database. When they are sent, multiple vectors are batched together to reduce the number of HTTP requests.
The size of the batch can be controlled using the `batch_size` parameter. Upstash Vector has a limit of 1000 vectors per batch in the free tier.

```python
store.add_documents(
    documents,
    batch_size=100,
    embedding_chunk_size=200
)
```
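
To get a feel for how the two parameters interact, the sketch below counts embedding calls and upload requests for a given number of documents. It is an illustration of the arithmetic only, not the library's implementation, and it assumes that each embedded chunk is uploaded in batches of at most `batch_size`. The helper names `chunked` and `plan_requests` are hypothetical.

```python
def chunked(items, size):
    """Yield consecutive slices of `items` with at most `size` elements."""
    if size <= 0:
        raise ValueError("size must be positive")
    for start in range(0, len(items), size):
        yield items[start:start + size]


def plan_requests(num_documents, batch_size, embedding_chunk_size):
    """Count embedding calls and upload requests under the assumed scheme."""
    documents = list(range(num_documents))
    embedding_calls = 0
    upload_requests = 0
    for chunk in chunked(documents, embedding_chunk_size):
        embedding_calls += 1  # one embedding call per chunk
        # Assumption: each embedded chunk is uploaded in batches of batch_size.
        upload_requests += sum(1 for _ in chunked(chunk, batch_size))
    return embedding_calls, upload_requests


print(plan_requests(450, batch_size=100, embedding_chunk_size=200))  # (3, 5)
```

Under this assumption, 450 documents are embedded in chunks of 200, 200, and 50, and those chunks are uploaded in 2, 2, and 1 requests respectively.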

### Querying Vectors

Vectors can be queried using a text query or another vector.

The returned value is a list of Document objects.

```python
result = store.similarity_search(
    "The United States of America",
    k=5
)
```

Or using a vector:

```python
vector = embeddings.embed_query("Hello world")

result = store.similarity_search_by_vector(
    vector,
    k=5
)
```
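
Conceptually, a similarity search ranks stored vectors by how close they are to the query vector and returns the top `k`. The sketch below shows this with a brute-force cosine similarity over a small in-memory dictionary. It is only an illustration of the idea: the metric actually used is the one chosen when the index was created, the service may well use a different search strategy internally, and the function names here are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # treat zero vectors as dissimilar to everything
    return dot / (norm_a * norm_b)


def top_k(query, vectors, k):
    """Return the ids of the k vectors most similar to `query`, best first."""
    scored = [(cosine_similarity(query, vec), vec_id) for vec_id, vec in vectors.items()]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [vec_id for _, vec_id in scored[:k]]


vectors = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [1.0, 1.0]}
print(top_k([1.0, 0.1], vectors, k=2))  # ['a', 'c']
```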

When searching, you can also use the `filter` parameter to filter results by metadata:

```python
result = store.similarity_search(
    "The United States of America",
    k=5,
    filter="type = 'country'"
)
```

See [Upstash Vector documentation](https://upstash.com/docs/vector/features/filtering)
for more details on metadata filtering.
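
If filter strings are assembled from Python values, a small helper can keep the quoting consistent. The sketch below builds equality conditions in the same shape as the `"type = 'country'"` example above. The helper `equals_filter` is hypothetical, and joining conditions with `AND` is an assumption based on the SQL-like syntax; check the filtering documentation for the operators and escaping rules that are actually supported.

```python
def equals_filter(**conditions):
    """Build a filter string such as "type = 'country' AND population = 5".

    Hypothetical helper: handles only string and numeric equality, and
    assumes conditions can be combined with AND.
    """
    parts = []
    for key, value in conditions.items():
        if isinstance(value, str):
            if "'" in value:
                raise ValueError("escaping quotes is not handled in this sketch")
            literal = f"'{value}'"
        elif isinstance(value, (int, float)) and not isinstance(value, bool):
            literal = str(value)
        else:
            raise TypeError(f"unsupported value for {key!r}: {value!r}")
        parts.append(f"{key} = {literal}")
    return " AND ".join(parts)


print(equals_filter(type="country", population=5))  # type = 'country' AND population = 5
```

The resulting string would be passed as the `filter` argument of `similarity_search`.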

### Deleting Vectors

Vectors can be deleted by their IDs.

```python
store.delete(["id1", "id2"])
```

### Getting information about the store

You can get information about your database, such as the distance metric and dimension, using the `info` function.

When an insert happens, the database indexes the new vectors. While indexing is in progress, the new vectors cannot be queried. `pendingVectorCount` represents the number of vectors that are currently being indexed.

```python
info = store.info()
print(info)

# Output:
# {'vectorCount': 44, 'pendingVectorCount': 0, 'indexSize': 2642412, 'dimension': 1536, 'similarityFunction': 'COSINE'}
```
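
Since newly inserted vectors cannot be queried until indexing finishes, code that inserts and then immediately searches may want to wait for the pending count to reach zero. The following is a minimal polling sketch. The helper `wait_until_indexed` is hypothetical, and it takes a callable that returns the pending count so that it does not depend on the exact return type of `store.info()`.

```python
import time


def wait_until_indexed(get_pending_count, timeout=30.0, interval=0.5, sleep=time.sleep):
    """Poll until get_pending_count() returns 0.

    Returns True once nothing is pending, or False if `timeout` seconds
    pass first. `sleep` is injectable so the loop can be tested quickly.
    """
    deadline = time.monotonic() + timeout
    while True:
        if get_pending_count() == 0:
            return True
        if time.monotonic() >= deadline:
            return False
        sleep(interval)
```

Assuming `store.info()` returns a mapping like the output shown above, a call could look like `wait_until_indexed(lambda: store.info()["pendingVectorCount"])`; adjust the accessor if the return value is an object with attributes instead.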

# Upstash Redis

This page covers how to use [Upstash Redis](https://upstash.com/redis) with LangChain.

@@ -12,7 +172,6 @@ This page covers how to use [Upstash Redis](https://upstash.com/redis) with Lang
## Integrations
All Upstash-LangChain integrations are wrappers built on the `upstash-redis` Python SDK.
The SDK connects to an Upstash Redis database using the `UPSTASH_REDIS_REST_URL` and `UPSTASH_REDIS_REST_TOKEN` values from the console.
One significant advantage is that this SDK uses a REST API, so you can use it on serverless platforms, at the edge, or on any platform that does not support TCP connections.


### Cache