Upgrade vectorize support to Astra's Public Preview available service (#33)

* upgrade vectorize support

* more robust env var checks for vectorize; gh actions vectorize env var passthrough

* re-enable int. tests in the release flow (tentatively)

* snake-cased a couple of variables

---------

Co-authored-by: stefano <[email protected]>
hemidactylus and stefano authored May 28, 2024
1 parent f41f0dc commit 77d3d81
Showing 11 changed files with 301 additions and 202 deletions.
2 changes: 2 additions & 0 deletions .github/workflows/_integration_test.yml
@@ -43,6 +43,8 @@ jobs:
ASTRA_DB_API_ENDPOINT: ${{ secrets.ASTRA_DB_API_ENDPOINT }}
ASTRA_DB_APPLICATION_TOKEN: ${{ secrets.ASTRA_DB_APPLICATION_TOKEN }}
ASTRA_DB_KEYSPACE: ${{ secrets.ASTRA_DB_KEYSPACE }}
+SHARED_SECRET_NAME_OPENAI: ${{ secrets.SHARED_SECRET_NAME_OPENAI }}
+OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }}
run: |
make integration_tests
16 changes: 9 additions & 7 deletions .github/workflows/_release.yml
@@ -156,13 +156,15 @@ jobs:
run: make tests
working-directory: ${{ inputs.working-directory }}

-# - name: Run integration tests
-# env:
-# ASTRA_DB_API_ENDPOINT: ${{ secrets.ASTRA_DB_API_ENDPOINT }}
-# ASTRA_DB_APPLICATION_TOKEN: ${{ secrets.ASTRA_DB_APPLICATION_TOKEN }}
-# ASTRA_DB_KEYSPACE: ${{ secrets.ASTRA_DB_KEYSPACE }}
-# run: make integration_tests
-# working-directory: ${{ inputs.working-directory }}
+- name: Run integration tests
+env:
+ASTRA_DB_API_ENDPOINT: ${{ secrets.ASTRA_DB_API_ENDPOINT }}
+ASTRA_DB_APPLICATION_TOKEN: ${{ secrets.ASTRA_DB_APPLICATION_TOKEN }}
+ASTRA_DB_KEYSPACE: ${{ secrets.ASTRA_DB_KEYSPACE }}
+SHARED_SECRET_NAME_OPENAI: ${{ secrets.SHARED_SECRET_NAME_OPENAI }}
+OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }}
+run: make integration_tests
+working-directory: ${{ inputs.working-directory }}

- name: Get minimum versions
working-directory: ${{ inputs.working-directory }}
4 changes: 2 additions & 2 deletions libs/astradb/Makefile
@@ -44,10 +44,10 @@ format format_diff:
poetry run ruff --select I --fix $(PYTHON_FILES)

spell_check:
-poetry run codespell --toml pyproject.toml
+poetry run codespell --toml pyproject.toml -I codespell_ignore_words.txt

spell_fix:
-poetry run codespell --toml pyproject.toml -w
+poetry run codespell --toml pyproject.toml -w -I codespell_ignore_words.txt

check_imports: $(shell find langchain_astradb -name '*.py')
poetry run python ./scripts/check_imports.py $^
1 change: 1 addition & 0 deletions libs/astradb/codespell_ignore_words.txt
@@ -0,0 +1 @@
+Haa
12 changes: 12 additions & 0 deletions libs/astradb/langchain_astradb/utils/astradb.py
@@ -19,6 +19,8 @@
API_ENDPOINT_ENV_VAR = "ASTRA_DB_API_ENDPOINT"
NAMESPACE_ENV_VAR = "ASTRA_DB_KEYSPACE"

+DEFAULT_VECTORIZE_SECRET_HEADER = "x-embedding-api-key"

logger = logging.getLogger()


@@ -138,19 +140,29 @@ def __init__(
collection_vector_service_options: Optional[
CollectionVectorServiceOptions
] = None,
+collection_embedding_api_key: Optional[str] = None,
) -> None:
super().__init__(
token, api_endpoint, astra_db_client, async_astra_db_client, namespace
)
+embedding_key_header = {
+k: v
+for k, v in {
+DEFAULT_VECTORIZE_SECRET_HEADER: collection_embedding_api_key,
+}.items()
+if v is not None
+}
self.collection_name = collection_name
self.collection = AstraDBCollection(
collection_name=collection_name,
astra_db=self.astra_db,
+additional_headers=embedding_key_header,
)

self.async_collection = AsyncAstraDBCollection(
collection_name=collection_name,
astra_db=self.async_astra_db,
+additional_headers=embedding_key_header,
)

if requested_indexing_policy is not None:
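
The header-building step added in this hunk can be tried in isolation. The following is a minimal sketch, assuming only what the diff shows: the constant `DEFAULT_VECTORIZE_SECRET_HEADER` and the "drop the entry when the value is `None`" comprehension. The helper name `build_embedding_key_header` is hypothetical and does not appear in the commit.

```python
from typing import Dict, Optional

# Header name taken from the diff above.
DEFAULT_VECTORIZE_SECRET_HEADER = "x-embedding-api-key"


def build_embedding_key_header(api_key: Optional[str]) -> Dict[str, str]:
    """Hypothetical helper: return the extra headers for collection requests.

    The header is included only when an API key was actually supplied,
    so that no header with a None value is ever sent.
    """
    return {
        k: v
        for k, v in {DEFAULT_VECTORIZE_SECRET_HEADER: api_key}.items()
        if v is not None
    }


if __name__ == "__main__":
    print(build_embedding_key_header(None))
    print(build_embedding_key_header("example-key"))
```

Filtering on `None` means callers that never pass a key get an empty dictionary, which the diff then hands to both the sync and async collection objects as `additional_headers`.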
33 changes: 26 additions & 7 deletions libs/astradb/langchain_astradb/vectorstores.py
@@ -156,6 +156,7 @@ def __init__(
collection_vector_service_options: Optional[
CollectionVectorServiceOptions
] = None,
+collection_embedding_api_key: Optional[str] = None,
) -> None:
"""Wrapper around DataStax Astra DB for vector-store workloads.
@@ -181,8 +182,9 @@ def __init__(
Args:
embedding: the embeddings function or service to use.
This enables client-side embedding functions or calls to external
-embedding providers. Only one of `embedding` or
-`collection_vector_service_options` can be provided.
+embedding providers. If `embedding` is provided, arguments
+`collection_vector_service_options` and
+`collection_embedding_api_key` cannot be provided.
collection_name: name of the Astra DB collection to create/use.
token: API token for Astra DB usage. If not provided, the environment
variable ASTRA_DB_APPLICATION_TOKEN is inspected.
@@ -220,10 +222,16 @@ def __init__(
(see docs.datastax.com/en/astra/astra-db-vector/api-reference/
data-api-commands.html#advanced-feature-indexing-clause-on-createcollection)
collection_vector_service_options: specifies the use of server-side
-embeddings within Astra DB. Only one of `embedding` or
-`collection_vector_service_options` can be provided.
-NOTE: This feature is under current development.
+embeddings within Astra DB. If passing this parameter, `embedding`
+cannot be provided.
+collection_embedding_api_key: for usage of server-side embeddings
+within Astra DB, with this parameter one can supply an API Key
+that will be passed to Astra DB with each data request.
+This is useful when the service is configured for the collection,
+but no corresponding secret is stored within
+Astra's key management system.
+This parameter cannot be provided without
+specifying `collection_vector_service_options`.
Note:
For concurrency in synchronous :meth:`~add_texts`:, as a rule of thumb, on a
@@ -242,7 +250,7 @@ def __init__(
Remember you can pass concurrency settings to individual calls to
:meth:`~add_texts` and :meth:`~add_documents` as well.
"""
-# Embedding and collection_vector_service_options are mutually exclusive,
+# Embedding and the server-side embeddings are mutually exclusive,
# as both specify how to produce embeddings
if embedding is None and collection_vector_service_options is None:
raise ValueError(
@@ -256,13 +264,23 @@ def __init__(
can be provided."
)

+if (
+collection_vector_service_options is None
+and collection_embedding_api_key is not None
+):
+raise ValueError(
+"`collection_embedding_api_key` cannot be provided unless"
+" `collection_vector_service_options` is also passed."
+)

self.embedding_dimension: Optional[int] = None
self.embedding = embedding
self.collection_name = collection_name
self.token = token
self.api_endpoint = api_endpoint
self.namespace = namespace
self.collection_vector_service_options = collection_vector_service_options
+self.collection_embedding_api_key = collection_embedding_api_key
# Concurrency settings
self.batch_size: int = batch_size or DEFAULT_BATCH_SIZE
self.bulk_insert_batch_concurrency: int = (
@@ -305,6 +323,7 @@ def __init__(
requested_indexing_policy=self.indexing_policy,
default_indexing_policy=DEFAULT_INDEXING_OPTIONS,
collection_vector_service_options=collection_vector_service_options,
+collection_embedding_api_key=collection_embedding_api_key,
)
self.astra_db = self.astra_env.astra_db
self.async_astra_db = self.astra_env.async_astra_db
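
The argument checks in this file's constructor can be summarized as a standalone function. This is a sketch under stated assumptions, not the library's code: the function name `check_embedding_arguments` is hypothetical, the second condition (both `embedding` and `collection_vector_service_options` given) is inferred from the docstring because its `if` line is elided in the diff, and the first two error messages are illustrative wording. Only the third message is copied from the diff.

```python
from typing import Any, Optional


def check_embedding_arguments(
    embedding: Optional[Any],
    collection_vector_service_options: Optional[Any],
    collection_embedding_api_key: Optional[str],
) -> None:
    """Hypothetical standalone version of the constructor's validation."""
    # Exactly one way of producing embeddings must be chosen.
    if embedding is None and collection_vector_service_options is None:
        # Illustrative message; the original wording is elided in the diff.
        raise ValueError("An embedding source must be specified.")
    # Assumed condition: client-side and server-side embeddings together.
    if embedding is not None and collection_vector_service_options is not None:
        # Illustrative message; the original wording is elided in the diff.
        raise ValueError("Only one embedding source can be provided.")
    # The API key only makes sense for server-side embeddings.
    if (
        collection_vector_service_options is None
        and collection_embedding_api_key is not None
    ):
        raise ValueError(
            "`collection_embedding_api_key` cannot be provided unless"
            " `collection_vector_service_options` is also passed."
        )


def raises_value_error(*args: Any) -> bool:
    """Return True if the check rejects the given argument combination."""
    try:
        check_embedding_arguments(*args)
    except ValueError:
        return True
    return False


if __name__ == "__main__":
    # Placeholder objects stand in for a real embedding and service options.
    print(raises_value_error(object(), None, None))       # client-side only
    print(raises_value_error(None, object(), "a-key"))    # server-side with key
    print(raises_value_error(object(), None, "a-key"))    # key without options
```

Under these assumptions, the accepted combinations are client-side embeddings alone, or server-side options with or without an API key.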