This repository has been archived by the owner on Aug 27, 2023. It is now read-only.

Thoughts on keeping more metadata? #295

Open
dkunitsk opened this issue Feb 22, 2022 · 5 comments

Comments

@dkunitsk

Hey @stevearc.
QQ so I don't end up working on something that doesn't make sense: would you be philosophically against storing more metadata (say, classifiers) in the cache and making it available in the json endpoint?

And potentially also implementing a per-release json endpoint like Warehouse.

Motivation: It'd be helpful to have more metadata upfront before downloading.

It seems relatively straightforward given that the Package model already has a KV catch-all.

Follow-up: I only need this for s3 (I mention because of cache rebuilding). If I wanted to implement the feature above, would it need to have parity across backends?

Thanks!

@stevearc
Owner

Storing more metadata sounds great! The behavior would need to be the same across different backends, though.

The kwargs in Package will probably get you like 50% of the way there. You'll probably want to change the db.upload() signature to make use of kwargs instead of passing everything in positionally:

def upload(
    self,
    filename: str,
    data: BinaryIO,
    name: Optional[str] = None,
    version: Optional[str] = None,
    summary: Optional[str] = None,
    requires_python: Optional[str] = None,
) -> Package:

The two places that ingest packages and metadata are

def fetch_dist(request, url, name, version, summary, requires_python):

(when packages are downloaded and cached from upstream) and

def upload(
    request, content, name=None, version=None, summary=None, requires_python=None
):

which is the normal upload handler.

Poking around the backends a bit, it looks like the storage & caching options should just pick up the metadata additions to Package. Might require some tests to be sure.
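As a rough illustration of the kwargs idea above, here is a minimal sketch. The `Package` class below is a hypothetical stand-in that only mimics the "KV catch-all" mentioned in this thread, and the free-standing `upload()` is an assumption for illustration, not pypicloud's actual code.

```python
import io
from typing import Any, BinaryIO, Dict, Optional


class Package:
    """Hypothetical stand-in for the Package model (not pypicloud's real class)."""

    def __init__(
        self,
        name: str,
        version: str,
        filename: str,
        summary: Optional[str] = None,
        **kwargs: Any,
    ) -> None:
        self.name = name
        self.version = version
        self.filename = filename
        self.summary = summary
        # Catch-all for any other metadata (requires_python, classifiers, ...).
        self.data: Dict[str, Any] = dict(kwargs)


def upload(
    filename: str, data: BinaryIO, name: str, version: str, **metadata: Any
) -> Package:
    """Accept metadata as keyword arguments instead of fixed positional ones."""
    # Drop unset fields so they are not stored in the catch-all.
    extra = {key: value for key, value in metadata.items() if value is not None}
    # A real backend would persist `data` here; this sketch only builds the model.
    return Package(name, version, filename, **extra)


pkg = upload(
    "demo-1.0.tar.gz",
    io.BytesIO(b""),
    "demo",
    "1.0",
    summary="A demo package",
    requires_python=">=3.7",
    classifiers=["Programming Language :: Python :: 3"],
    requires_dist=None,
)
```

With a signature like this, adding a new metadata field would not require touching every call site, which is presumably the point of moving away from positional arguments.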

@lovetheguitar

Just found that this would probably speed up poetry's dependency resolution https://python-poetry.org/docs/faq#why-is-the-dependency-resolution-process-slow.

If implemented, it would be beneficial if something like requires_dist, platform, summary and other "required" attributes are added https://github.com/python-poetry/poetry/blob/ec89ac45ba4ca16ea860652540673c423d430457/src/poetry/repositories/pypi_repository.py#L192-L200.
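For reference, the fields mentioned above can be read from a distribution's core metadata (the `METADATA` or `PKG-INFO` file) with the standard library, since that file uses an email-header style format. This is only a sketch: the function name and the keys of the returned dict are assumptions, and how the values would be stored in pypicloud's cache is not covered.

```python
from email.parser import Parser
from typing import Any, Dict


def parse_core_metadata(text: str) -> Dict[str, Any]:
    """Pull a few core metadata fields out of METADATA / PKG-INFO text."""
    msg = Parser().parsestr(text)
    return {
        "summary": msg.get("Summary"),
        "requires_python": msg.get("Requires-Python"),
        # These fields may appear multiple times, so collect all occurrences.
        "requires_dist": msg.get_all("Requires-Dist") or [],
        "platform": msg.get_all("Platform") or [],
        "classifiers": msg.get_all("Classifier") or [],
    }


sample = (
    "Metadata-Version: 2.1\n"
    "Name: demo\n"
    "Version: 1.0\n"
    "Summary: A demo package\n"
    "Requires-Python: >=3.7\n"
    "Requires-Dist: requests\n"
    "Requires-Dist: click\n"
    "Classifier: Programming Language :: Python :: 3\n"
)
fields = parse_core_metadata(sample)
```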

@nivintw

nivintw commented Sep 28, 2022

I was also looking into this, specifically for poetry. From reading the poetry docs, I'm not sure whether poetry uses the JSON API for anything other than PyPI itself. This is a bit unfortunate if true, but c'est la vie; it might be worth asking on the poetry repo for details, or whether they'd add support for using the JSON API with non-PyPI package repositories.

That said, I could be wrong about how poetry works WRT this; it's possible it would work but doesn't atm because poetry thinks the metadata is "invalid" since it's missing the required fields you mentioned.

One alternative would be to add support for PEP 658 https://peps.python.org/pep-0658/
Since this has been an accepted PEP for some time now, it seems like going this route would benefit tools other than poetry as well, and wouldn't (I say having not yet done any initial POC code changes for this...) be a huge lift. I'm not sure how much bandwidth I have, but I'd be willing to take a crack at it as time allows.

pypicloud provides a ton of value, and I really appreciate the hard work here, so I'm happy to contribute back, time permitting.

To me, doing both seems worthwhile in the long run. I can't promise any specific timelines, but I'd be interested in checking out the above two things, i.e. volunteering instead of just requesting.

@stevearc
Owner

I think PEP 658 would be very reasonable to implement. It shouldn't be too difficult to add basic functionality, but adding the metadata hashes might be a bit tricky to avoid performance issues. Would happily review a PR for this.
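To make the PEP 658 idea concrete, here is a minimal sketch of the two pieces it needs: pulling the `METADATA` file out of a wheel, and producing the `data-dist-info-metadata` attribute value with a sha256 hash. The function names are hypothetical and this is not pypicloud code. One possible way to sidestep the performance concern would be to compute the hash once at upload time and store it with the rest of the package metadata, though that is untested here.

```python
import hashlib
import io
import zipfile
from typing import Optional


def extract_wheel_metadata(wheel_bytes: bytes) -> Optional[bytes]:
    """Return the contents of <name>.dist-info/METADATA from a wheel, if present."""
    with zipfile.ZipFile(io.BytesIO(wheel_bytes)) as zf:
        for name in zf.namelist():
            parts = name.split("/")
            # Only match the top-level *.dist-info directory, not vendored copies.
            if (
                len(parts) == 2
                and parts[0].endswith(".dist-info")
                and parts[1] == "METADATA"
            ):
                return zf.read(name)
    return None


def metadata_attribute(metadata: bytes) -> str:
    """Build the anchor-tag attribute PEP 658 describes for the simple index."""
    digest = hashlib.sha256(metadata).hexdigest()
    return f'data-dist-info-metadata="sha256={digest}"'
```

Per the PEP, the metadata file itself would then be served at the distribution's URL with `.metadata` appended, so the extracted bytes would need to be stored next to the wheel in whatever storage backend is in use.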

@dkunitsk
Author

Minor note from me: Lyft has migrated away from PyPICloud so I will not be taking this on.
