Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Full porting to astrapy 1.3+ #42

Merged
merged 22 commits into from
Jul 29, 2024
Merged

Full porting to astrapy 1.3+ #42

merged 22 commits into from
Jul 29, 2024

Conversation

hemidactylus
Copy link
Collaborator

This is a large rewrite of the internals in order to completely migrate to using the recent, idiomatic astrapy.

Motivation

  • enabling non-Astra Data API (such as HCD. I could run all integration testing on HCD as well)
  • simplify most of the Data API newer features (such as $vectorize)
  • use an astrapy API that is extensively documented and actively maintained

Guide to the PR

All the newest astrapy things are now usable (e.g. environments, TokenProvider, EmbeddingHeaderProvider classes and so on).

Along the way, several improvements came along:

  • deprecating (but still 100% working) passing "clients" to the constructors. Preferred: secret/strings in init methods
  • issuing warning if "legacy" parameters are used (as part of the full alignment to the modern astrapy). But without breaking changes. Everything is 100% compatible with previous usage patterns
  • reorganization/modernization of the whole test suite (fixture re-use, uniform test function scheme, etc)
  • more testing added, esp. thoroughly probing the (totally rewritten) insert-very-many-with-possible-overwrite flows (sync/async)

Notes

  1. The internals have changed a lot. Hence, code that directly uses the internals of these classes (e.g. using vector_store.collection) would stop working. Such usage, anyway, is brittle in the first place and should not be encouraged.
  2. The document loader maintains the same signature, including its nb_prefetched parameter, which however is ignored - actually a warning is emitted if a prefetch is requested. This reflects the fact that prefetched finds have been found to be problematic in some corner cases on AstraPy and are at the moment disabled. Functionally, this has no effect on the document loader class.

bulk_delete_concurrency=kwargs.get("bulk_delete_concurrency"),
collection_vector_service_options=collection_vector_service_options,
)
return cls(**kwargs)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this will crash if we get an unknown argument

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That is sort of the intention - i.e. so that faulty invocations of _from_texts and _afrom_texts behave as if the caller were calling the (actual) constructor. Including the penalties for passing unknown arguments.
Would you prefer to be more prescriptive here?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The current behavior is to ignore unknown parameters and that's also what the warning log says.
We can change to reject completely. Then the log would probably not be needed anymore.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, very well spotted! I just reintroduced the "warn but proceed" behaviour for the sake of maintaining current behaviour (for as little a corner case this might be...).

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also added tests specific to this

@cbornet
Copy link
Collaborator

cbornet commented Jul 26, 2024

LGTM, great work !
A few nits.

@hemidactylus hemidactylus merged commit 55bcd8d into main Jul 29, 2024
13 checks passed
@hemidactylus hemidactylus deleted the SL-port-to-astrapy-1 branch July 29, 2024 13:16
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants