Thoughts on keeping more metadata? #295

dkunitsk · 2022-02-22T06:01:51Z

Hey @stevearc.
QQ, so I don't end up working on something that doesn't make sense: would you be philosophically opposed to storing more metadata (say, classifiers) in the cache and exposing it through the JSON endpoint?

And potentially also implementing a per-release JSON endpoint like Warehouse's (`/pypi/<project>/<version>/json`).

Motivation: It'd be helpful to have more metadata upfront before downloading.

It seems relatively straightforward, given that the Package model already has a key-value catch-all.
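As a rough illustration of how a key-value catch-all could feed a per-release endpoint, here is a minimal sketch. The `Package` class below is a hypothetical, simplified stand-in (the real model's attribute names may differ; `data` as the catch-all is an assumption), and `release_json` is a hypothetical helper that emits only a subset of Warehouse's `{"info": ..., "urls": [...]}` shape.

```python
import json
from typing import Any, Dict, List, Optional


class Package:
    """Hypothetical, simplified stand-in for pypicloud's Package model.

    Named fields are attributes; any other keyword argument lands in
    the ``data`` key-value catch-all (attribute name assumed).
    """

    def __init__(
        self,
        name: str,
        version: str,
        filename: str,
        summary: Optional[str] = None,
        **kwargs: Any,
    ) -> None:
        self.name = name
        self.version = version
        self.filename = filename
        self.summary = summary
        self.data: Dict[str, Any] = kwargs


def release_json(packages: List[Package], base_url: str) -> Dict[str, Any]:
    """Build a Warehouse-shaped {"info": ..., "urls": [...]} subset for one release."""
    first = packages[0]
    info = {
        "name": first.name,
        "version": first.version,
        "summary": first.summary,
        # Extra fields come straight out of the catch-all, if they were stored.
        "requires_python": first.data.get("requires_python"),
        "requires_dist": first.data.get("requires_dist", []),
        "classifiers": first.data.get("classifiers", []),
    }
    urls = [
        {"filename": p.filename, "url": f"{base_url}/{p.filename}"}
        for p in packages
    ]
    return {"info": info, "urls": urls}


pkg = Package(
    "demo",
    "1.0",
    "demo-1.0-py3-none-any.whl",
    summary="Demo package",
    classifiers=["Programming Language :: Python :: 3"],
)
payload = release_json([pkg], "https://pypi.example.com/packages")
print(json.dumps(payload, indent=2))
```

Because new fields ride along in the catch-all, adding e.g. `platform` later would not require a schema change in the model itself, only in what gets stored at ingest time.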

Follow-up: I only need this for S3 (I mention it because of cache rebuilding). If I wanted to implement the feature above, would it need parity across backends?

Thanks!


stevearc · 2022-02-23T01:26:20Z

Storing more metadata sounds great! The behavior would need to be the same across different backends, though.

The kwargs in Package will probably get you like 50% of the way there. You'll probably want to change the db.upload() signature to accept kwargs instead of passing everything in positionally:

pypicloud/pypicloud/cache/base.py (lines 94 to 102 in 046126f):

    def upload(
        self,
        filename: str,
        data: BinaryIO,
        name: Optional[str] = None,
        version: Optional[str] = None,
        summary: Optional[str] = None,
        requires_python: Optional[str] = None,
    ) -> Package:
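A minimal sketch of what a kwargs-based signature could look like. `InMemoryCache` is hypothetical and returns a plain dict instead of a `Package`; it only illustrates that `summary`, `requires_python`, `classifiers`, and any future field can travel in `**metadata` without touching the signature.

```python
from io import BytesIO
from typing import Any, BinaryIO, Dict, Optional


class InMemoryCache:
    """Hypothetical cache showing a kwargs-based upload() signature."""

    def __init__(self) -> None:
        self.records: Dict[str, Dict[str, Any]] = {}

    def upload(
        self,
        filename: str,
        data: BinaryIO,
        name: Optional[str] = None,
        version: Optional[str] = None,
        **metadata: Any,
    ) -> Dict[str, Any]:
        # summary, requires_python, classifiers, etc. all arrive in **metadata,
        # so adding a new field does not change the signature or its callers.
        record = {
            "filename": filename,
            "name": name,
            "version": version,
            "size": len(data.read()),
            "metadata": dict(metadata),
        }
        self.records[filename] = record
        return record


cache = InMemoryCache()
rec = cache.upload(
    "demo-1.0.tar.gz",
    BytesIO(b"fake sdist bytes"),
    name="demo",
    version="1.0",
    summary="Demo package",
    classifiers=["License :: OSI Approved :: MIT License"],
)
```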

The two places that ingest packages and metadata are

pypicloud/pypicloud/views/api.py (line 70 in 046126f):

    def fetch_dist(request, url, name, version, summary, requires_python):

(when packages are downloaded and cached from upstream) and

pypicloud/pypicloud/views/simple.py (lines 25 to 27 in 046126f):

    def upload(
        request, content, name=None, version=None, summary=None, requires_python=None
    ):

which is the normal upload handler.
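Both ingest points would then need to pass the extra fields through. For the upload handler, a sketch of gathering them from the submitted form might look like this. It assumes the field names twine sends to the legacy upload API (`summary`, `requires_python`, and the repeatable `classifiers`, `requires_dist`, `platform`); `collect_metadata` is a hypothetical helper, and the form is modelled as (key, value) pairs rather than the request's actual multi-dict.

```python
from typing import Any, Dict, Iterable, Tuple

# Assumed field names, mirroring what twine sends to the legacy upload API.
SINGLE_VALUED = ("summary", "requires_python")
MULTI_VALUED = ("classifiers", "requires_dist", "platform")


def collect_metadata(form: Iterable[Tuple[str, str]]) -> Dict[str, Any]:
    """Turn upload form (key, value) pairs into keyword metadata.

    name/version are skipped because upload() takes them explicitly.
    """
    metadata: Dict[str, Any] = {key: [] for key in MULTI_VALUED}
    for key, value in form:
        if key in MULTI_VALUED:
            metadata[key].append(value)  # repeatable field: keep every value
        elif key in SINGLE_VALUED:
            metadata[key] = value
    # Drop repeatable fields that never appeared in the form.
    return {k: v for k, v in metadata.items() if v != []}


form = [
    ("name", "demo"),
    ("version", "1.0"),
    ("summary", "Demo package"),
    ("classifiers", "Programming Language :: Python :: 3"),
    ("classifiers", "License :: OSI Approved :: MIT License"),
    ("requires_dist", "requests>=2"),
]
metadata = collect_metadata(form)
```

The resulting dict could then be splatted as `**metadata` into a kwargs-based `db.upload()`; `fetch_dist` would need the equivalent mapping from the upstream JSON instead.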

Poking around the backends a bit, it looks like the storage and cache backends should just pick up the metadata additions to Package. It might require some tests to be sure.

lovetheguitar · 2022-06-07T15:09:45Z

Just found that this would probably speed up Poetry's dependency resolution: https://python-poetry.org/docs/faq#why-is-the-dependency-resolution-process-slow.

If implemented, it would help if attributes such as requires_dist, platform, summary and the other "required" attributes were added: https://github.com/python-poetry/poetry/blob/ec89ac45ba4ca16ea860652540673c423d430457/src/poetry/repositories/pypi_repository.py#L192-L200.
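For wheels, most of these fields can be read from the core metadata file inside the archive at upload time. A minimal sketch using only the standard library, assuming the wheel bytes are available in memory; `wheel_metadata` is a hypothetical helper, and sdists (whose metadata lives in `PKG-INFO`) are not handled here.

```python
import io
import zipfile
from email.parser import HeaderParser
from typing import Any, Dict


def wheel_metadata(wheel_bytes: bytes) -> Dict[str, Any]:
    """Pull Poetry-relevant fields out of a wheel's *.dist-info/METADATA."""
    with zipfile.ZipFile(io.BytesIO(wheel_bytes)) as zf:
        # A wheel stores core metadata at <name>-<version>.dist-info/METADATA.
        path = next(n for n in zf.namelist() if n.endswith(".dist-info/METADATA"))
        msg = HeaderParser().parsestr(zf.read(path).decode("utf-8"))
    return {
        "summary": msg.get("Summary"),
        "requires_python": msg.get("Requires-Python"),
        # Multiple-use fields repeat the header once per value.
        "requires_dist": msg.get_all("Requires-Dist") or [],
        "platform": msg.get_all("Platform") or [],
        "classifiers": msg.get_all("Classifier") or [],
    }


# Usage: build a tiny in-memory wheel and read its metadata back.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr(
        "demo-1.0.dist-info/METADATA",
        "Metadata-Version: 2.1\n"
        "Name: demo\n"
        "Version: 1.0\n"
        "Summary: Demo package\n"
        "Requires-Python: >=3.8\n"
        "Requires-Dist: requests>=2\n"
        "Requires-Dist: click\n",
    )
meta = wheel_metadata(buf.getvalue())
```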

nivintw · 2022-09-28T04:35:52Z

I was also looking into this, specifically for Poetry. From reading the Poetry docs, I'm not sure whether Poetry uses the JSON API for anything other than PyPI itself. That would be a bit unfortunate if true, but c'est la vie; it might be worth asking on the Poetry repo for details, or whether they'd add support for using the JSON API with non-PyPI package repositories.

That said, I could be wrong about how Poetry works WRT this; it's possible it would work but currently doesn't because Poetry considers the metadata "invalid", since it's missing the required fields you mentioned.

One alternative would be to add support for PEP 658: https://peps.python.org/pep-0658/
Since this PEP has been accepted for some time now, going this route would benefit tools other than Poetry as well, and it wouldn't (I say this having not yet done any initial POC code changes...) be a huge lift. I'm not sure how much bandwidth I have, but I'd be willing to take a crack at it as time allows.
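Under PEP 658, the repository serves the distribution's core metadata at the file URL with `.metadata` appended, and advertises it on the simple index anchor via a `data-dist-info-metadata="<hashname>=<hashvalue>"` attribute. PEP 714 later renamed the attribute to `data-core-metadata`, so this sketch emits both. `simple_index_anchor` is a hypothetical helper, not pypicloud's template code.

```python
import hashlib
from html import escape


def simple_index_anchor(filename: str, url: str, metadata: bytes) -> str:
    """Render one simple-index <a> tag advertising PEP 658 metadata.

    Clients fetch the metadata from url + ".metadata", which the server
    must also serve; the attribute value is its hash as sha256=<hex>.
    """
    attr = escape("sha256=" + hashlib.sha256(metadata).hexdigest(), quote=True)
    # data-dist-info-metadata is the PEP 658 name; PEP 714 renamed it to
    # data-core-metadata, so emit both for older and newer clients.
    return (
        f'<a href="{escape(url, quote=True)}" '
        f'data-dist-info-metadata="{attr}" data-core-metadata="{attr}">'
        f"{escape(filename)}</a>"
    )


core_metadata = b"Metadata-Version: 2.1\nName: demo\n"
anchor = simple_index_anchor(
    "demo-1.0-py3-none-any.whl",
    "/packages/demo-1.0-py3-none-any.whl",
    core_metadata,
)
print(anchor)
```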

pypicloud provides a ton of value and I really appreciate the hard work here, so I'm happy to contribute back, time permitting.

To me, doing both seems worthwhile in the long run. I can't promise any specific timelines, but I'd be interested in looking into the two things above, i.e. volunteering instead of just requesting.

stevearc · 2022-09-28T09:51:29Z

I think PEP 658 would be very reasonable to implement. It shouldn't be too difficult to add basic functionality, but adding the metadata hashes might be a bit tricky to avoid performance issues. Would happily review a PR for this.
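One way to sidestep the performance concern is to hash the metadata once, at ingest time, so rendering the index becomes a lookup rather than a fetch-and-hash per file. A minimal sketch under that assumption; `MetadataIndex` is hypothetical, and in pypicloud the stored hash would presumably live in the Package's key-value metadata so every cache backend persists it.

```python
import hashlib
from typing import Dict, Optional


class MetadataIndex:
    """Hypothetical store that hashes core metadata once, at upload time."""

    def __init__(self) -> None:
        self._metadata: Dict[str, bytes] = {}  # filename -> raw metadata bytes
        self._hashes: Dict[str, str] = {}  # filename -> "sha256=<hex>"

    def on_upload(self, filename: str, metadata: bytes) -> None:
        # Pay the hashing cost once, when the distribution is ingested.
        self._metadata[filename] = metadata
        self._hashes[filename] = "sha256=" + hashlib.sha256(metadata).hexdigest()

    def dist_info_attr(self, filename: str) -> Optional[str]:
        # Index rendering is now a dict lookup; None means "omit the attribute".
        return self._hashes.get(filename)

    def metadata_file(self, filename: str) -> Optional[bytes]:
        # Body to serve at "<distribution URL>.metadata".
        return self._metadata.get(filename)


idx = MetadataIndex()
idx.on_upload("demo-1.0-py3-none-any.whl", b"Metadata-Version: 2.1\nName: demo\n")
```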

dkunitsk · 2022-09-28T20:58:20Z

Minor note from me: Lyft has migrated away from PyPICloud so I will not be taking this on.
