Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support registering new backends #45

Open
janjagusch opened this issue Jul 23, 2022 · 6 comments
Open

Support registering new backends #45

janjagusch opened this issue Jul 23, 2022 · 6 comments

Comments

@janjagusch
Copy link
Contributor

As a user of the minimalkv framework, I want to create and register new backends and use them through the get_store_from_url function without changing the minimalkv library. This is currently not possible, as the extract_params function hard-codes the known storage types:

https://github.com/data-engineering-collective/minimalkv/blob/main/minimalkv/_urls.py#L70-L122

What I'm imagining is a registration function that makes minimalkv aware of this new storage type, similarly to how fsspec does it.

@janjagusch
Copy link
Contributor Author

Having looked at the code more thoroughly, I think we should:

  1. Add integration tests for get_store_from_url
  2. Add a classmethod from_url(cls, url: str) -> ""KeyValueStore" to the definition of KeyValueStore. Alternatively, we could add a classmethod from_parsed_url(cls, scheme, host, port, path, query, userinfo) -> "KeyValueStore", similar to extract_params.
  3. get_store_from_url then only looks up the schema and delegates the rest of the logic to from_(parsed)_url
  4. Finally, we mirror what fsspec does and allow registering additional stores. Registration would happen through an entry point in the setup.py.

Apart from now being able to register additional stores, this should also clean up _urls.py, _get_store.py, _store_creation.py, and _store_decoration.py.

FYI: @SimonBohnenQC

@simonbohnen
Copy link
Contributor

This concept sounds much better than the current structure 👍🏼

As authentication was an issue in #51, I would suggest testing get_store_from_url against live AWS / GCS. Can we provide credentials as GitHub secrets to do this @janjagusch?

@simonbohnen
Copy link
Contributor

We would probably not be using create_store, url2dict, and get_store internally anymore. We should probably still keep them available, right? Deprecating them seems like the best choice.

@xhochy
Copy link
Member

xhochy commented Nov 4, 2022

I would love to keep create_store as this gives a way to pass parameters where there can also be more schema validation upfront. A typical use case is to have the store configuration in a JSON/YAML and then use create_store(json.load(…)).

@simonbohnen
Copy link
Contributor

I would love to keep create_store as this gives a way to pass parameters where there can also be more schema validation upfront. A typical use case is to have the store configuration in a JSON/YAML and then use create_store(json.load(…)).

I see the point of creating stores only from str or int parameters. I would also prefer if store objects could be created without relying on an internet connection or successful authentication. Thus, I'd like to move e.g. the create_store_azure code to a constructor for the AzureBlockBlobStore class.

What do you mean by "schema validation" in this context?

@xhochy
Copy link
Member

xhochy commented Nov 10, 2022

With "schema validation", I mean that you can specify a type schema for JSON/YAML files and validate that they have the correct values before creating the stores themselves. This has been useful in the past in using the stores in some settings to check whether the configuration is valid before deploying it to production.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants