Upgrade vectorize support to Astra's Public Preview available service #33

Merged
merged 4 commits into from
May 28, 2024
2 changes: 2 additions & 0 deletions .github/workflows/_integration_test.yml
@@ -43,6 +43,8 @@ jobs:
ASTRA_DB_API_ENDPOINT: ${{ secrets.ASTRA_DB_API_ENDPOINT }}
ASTRA_DB_APPLICATION_TOKEN: ${{ secrets.ASTRA_DB_APPLICATION_TOKEN }}
ASTRA_DB_KEYSPACE: ${{ secrets.ASTRA_DB_KEYSPACE }}
+ SHARED_SECRET_NAME_OPENAI: ${{ secrets.SHARED_SECRET_NAME_OPENAI }}
+ OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }}
run: |
make integration_tests
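
The step above now exports two more variables to the integration test run. As a rough sketch only, a test session could check up front that every variable named in the workflow is set. The helper name `missing_env_vars`, the tuple `WORKFLOW_ENV_VARS`, and the idea of a pre-flight check are assumptions for illustration; this PR does not show how the tests consume these variables.

```python
import os
from typing import Iterable, List, Mapping

# Variable names copied from the workflow step; grouping them is an assumption.
WORKFLOW_ENV_VARS = (
    "ASTRA_DB_API_ENDPOINT",
    "ASTRA_DB_APPLICATION_TOKEN",
    "ASTRA_DB_KEYSPACE",
    "SHARED_SECRET_NAME_OPENAI",
    "OPENAI_API_KEY",
)


def missing_env_vars(required: Iterable[str], environ: Mapping[str, str]) -> List[str]:
    """Return the names in `required` that are unset or empty in `environ`."""
    return [name for name in required if not environ.get(name)]


# Report only the names, never the values, since these are secrets.
print("missing:", missing_env_vars(WORKFLOW_ENV_VARS, os.environ))
```

An unset GitHub secret expands to an empty string, which is why empty values are treated as missing here.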

16 changes: 9 additions & 7 deletions .github/workflows/_release.yml
@@ -156,13 +156,15 @@ jobs:
run: make tests
working-directory: ${{ inputs.working-directory }}

- # - name: Run integration tests
- # env:
- # ASTRA_DB_API_ENDPOINT: ${{ secrets.ASTRA_DB_API_ENDPOINT }}
- # ASTRA_DB_APPLICATION_TOKEN: ${{ secrets.ASTRA_DB_APPLICATION_TOKEN }}
- # ASTRA_DB_KEYSPACE: ${{ secrets.ASTRA_DB_KEYSPACE }}
- # run: make integration_tests
- # working-directory: ${{ inputs.working-directory }}
+ - name: Run integration tests
+ env:
+ ASTRA_DB_API_ENDPOINT: ${{ secrets.ASTRA_DB_API_ENDPOINT }}
+ ASTRA_DB_APPLICATION_TOKEN: ${{ secrets.ASTRA_DB_APPLICATION_TOKEN }}
+ ASTRA_DB_KEYSPACE: ${{ secrets.ASTRA_DB_KEYSPACE }}
+ SHARED_SECRET_NAME_OPENAI: ${{ secrets.SHARED_SECRET_NAME_OPENAI }}
+ OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }}
+ run: make integration_tests
+ working-directory: ${{ inputs.working-directory }}

- name: Get minimum versions
working-directory: ${{ inputs.working-directory }}
4 changes: 2 additions & 2 deletions libs/astradb/Makefile
@@ -44,10 +44,10 @@ format format_diff:
poetry run ruff --select I --fix $(PYTHON_FILES)

spell_check:
- poetry run codespell --toml pyproject.toml
+ poetry run codespell --toml pyproject.toml -I codespell_ignore_words.txt

spell_fix:
- poetry run codespell --toml pyproject.toml -w
+ poetry run codespell --toml pyproject.toml -w -I codespell_ignore_words.txt

check_imports: $(shell find langchain_astradb -name '*.py')
poetry run python ./scripts/check_imports.py $^
1 change: 1 addition & 0 deletions libs/astradb/codespell_ignore_words.txt
@@ -0,0 +1 @@
+ Haa
12 changes: 12 additions & 0 deletions libs/astradb/langchain_astradb/utils/astradb.py
@@ -19,6 +19,8 @@
API_ENDPOINT_ENV_VAR = "ASTRA_DB_API_ENDPOINT"
NAMESPACE_ENV_VAR = "ASTRA_DB_KEYSPACE"

+ DEFAULT_VECTORIZE_SECRET_HEADER = "x-embedding-api-key"
+
logger = logging.getLogger()


@@ -138,19 +140,29 @@ def __init__(
collection_vector_service_options: Optional[
CollectionVectorServiceOptions
] = None,
+ collection_embedding_api_key: Optional[str] = None,
) -> None:
super().__init__(
token, api_endpoint, astra_db_client, async_astra_db_client, namespace
)
+ embedding_key_header = {
+ k: v
+ for k, v in {
+ DEFAULT_VECTORIZE_SECRET_HEADER: collection_embedding_api_key,
+ }.items()
+ if v is not None
+ }
self.collection_name = collection_name
self.collection = AstraDBCollection(
collection_name=collection_name,
astra_db=self.astra_db,
+ additional_headers=embedding_key_header,
)

self.async_collection = AsyncAstraDBCollection(
collection_name=collection_name,
astra_db=self.async_astra_db,
+ additional_headers=embedding_key_header,
)

if requested_indexing_policy is not None:
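
The hunk above builds the extra request headers with a dict comprehension that drops the entry when no key was passed, so both collection clients receive an empty mapping in that case. A minimal standalone sketch of that step follows; the function name `build_embedding_key_header` is hypothetical and not part of the PR, while the header constant is copied from the diff.

```python
from typing import Dict, Optional

# Constant as introduced in langchain_astradb/utils/astradb.py by this PR.
DEFAULT_VECTORIZE_SECRET_HEADER = "x-embedding-api-key"


def build_embedding_key_header(api_key: Optional[str]) -> Dict[str, str]:
    """Return the additional headers, omitting the key header when api_key is None."""
    candidates = {DEFAULT_VECTORIZE_SECRET_HEADER: api_key}
    # Keep only entries whose value was actually supplied.
    return {k: v for k, v in candidates.items() if v is not None}


print(build_embedding_key_header(None))
print(build_embedding_key_header("example-key"))
```

Because the filter tests `is not None`, an empty string would still be sent as a header value.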
33 changes: 26 additions & 7 deletions libs/astradb/langchain_astradb/vectorstores.py
@@ -156,6 +156,7 @@ def __init__(
collection_vector_service_options: Optional[
CollectionVectorServiceOptions
] = None,
+ collection_embedding_api_key: Optional[str] = None,
) -> None:
"""Wrapper around DataStax Astra DB for vector-store workloads.

@@ -181,8 +182,9 @@ def __init__(
Args:
embedding: the embeddings function or service to use.
This enables client-side embedding functions or calls to external
- embedding providers. Only one of `embedding` or
- `collection_vector_service_options` can be provided.
+ embedding providers. If `embedding` is provided, arguments
+ `collection_vector_service_options` and
+ `collection_embedding_api_key` cannot be provided.
collection_name: name of the Astra DB collection to create/use.
token: API token for Astra DB usage. If not provided, the environment
variable ASTRA_DB_APPLICATION_TOKEN is inspected.
@@ -220,10 +222,16 @@ def __init__(
(see docs.datastax.com/en/astra/astra-db-vector/api-reference/
data-api-commands.html#advanced-feature-indexing-clause-on-createcollection)
collection_vector_service_options: specifies the use of server-side
- embeddings within Astra DB. Only one of `embedding` or
- `collection_vector_service_options` can be provided.
- NOTE: This feature is under current development.
-
+ embeddings within Astra DB. If passing this parameter, `embedding`
+ cannot be provided.
+ collection_embedding_api_key: for usage of server-side embeddings
+ within Astra DB, with this parameter one can supply an API Key
+ that will be passed to Astra DB with each data request.
+ This is useful when the service is configured for the collection,
+ but no corresponding secret is stored within
+ Astra's key management system.
+ This parameter cannot be provided without
+ specifying `collection_vector_service_options`.

Note:
For concurrency in synchronous :meth:`~add_texts`:, as a rule of thumb, on a
@@ -242,7 +250,7 @@ def __init__(
Remember you can pass concurrency settings to individual calls to
:meth:`~add_texts` and :meth:`~add_documents` as well.
"""
- # Embedding and collection_vector_service_options are mutually exclusive,
+ # Embedding and the server-side embeddings are mutually exclusive,
# as both specify how to produce embeddings
if embedding is None and collection_vector_service_options is None:
raise ValueError(
@@ -256,13 +264,23 @@ def __init__(
can be provided."
)

+ if (
+ collection_vector_service_options is None
+ and collection_embedding_api_key is not None
+ ):
+ raise ValueError(
+ "`collection_embedding_api_key` cannot be provided unless"
+ " `collection_vector_service_options` is also passed."
+ )
+
self.embedding_dimension: Optional[int] = None
self.embedding = embedding
self.collection_name = collection_name
self.token = token
self.api_endpoint = api_endpoint
self.namespace = namespace
self.collection_vector_service_options = collection_vector_service_options
+ self.collection_embedding_api_key = collection_embedding_api_key
# Concurrency settings
self.batch_size: int = batch_size or DEFAULT_BATCH_SIZE
self.bulk_insert_batch_concurrency: int = (
@@ -305,6 +323,7 @@ def __init__(
requested_indexing_policy=self.indexing_policy,
default_indexing_policy=DEFAULT_INDEXING_OPTIONS,
collection_vector_service_options=collection_vector_service_options,
+ collection_embedding_api_key=collection_embedding_api_key,
)
self.astra_db = self.astra_env.astra_db
self.async_astra_db = self.astra_env.async_astra_db
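
Taken together, the constructor changes above enforce three rules: one embedding source must be given, the two sources cannot be combined, and an API key is only accepted alongside the server-side options. The following is a sketch of those checks in isolation, not the library's code. The function name `check_embedding_arguments` is hypothetical, the condition of the second check is inferred from the "mutually exclusive" comment because the diff elides it, and the first two error messages are placeholders for text the diff does not show in full. Only the third check and its message are copied from the PR.

```python
from typing import Any, Optional


def check_embedding_arguments(
    embedding: Optional[Any],
    collection_vector_service_options: Optional[Any],
    collection_embedding_api_key: Optional[str],
) -> None:
    """Raise ValueError for argument combinations the constructor rejects."""
    # At least one way of producing embeddings is required.
    if embedding is None and collection_vector_service_options is None:
        raise ValueError("placeholder: no embedding source was provided")
    # Assumed condition: client-side and server-side embeddings cannot be mixed.
    if embedding is not None and collection_vector_service_options is not None:
        raise ValueError("placeholder: two embedding sources were provided")
    # Check added by this PR: the key only makes sense with server-side embeddings.
    if (
        collection_vector_service_options is None
        and collection_embedding_api_key is not None
    ):
        raise ValueError(
            "`collection_embedding_api_key` cannot be provided unless"
            " `collection_vector_service_options` is also passed."
        )


# Accepted: server-side options with a per-request key (values are stand-ins).
check_embedding_arguments(None, {"provider": "example"}, "example-key")
```

Passing `embedding` together with a key but without options trips the third check, which matches the docstring's statement that `embedding` excludes both server-side parameters.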