
404 error when using GCS as the storage backend. #257

Open

josewails opened this issue Aug 21, 2020 · 3 comments

@josewails

josewails commented Aug 21, 2020

I am using GCS as a storage backend. Whenever I update my package and try to install it for the first time, I get the following error.


ERROR: HTTP error 404 while getting {base_url}/eop-shared/eop_shared-0.0.7-py3-none-any.whl#sha256=7fa3fd43c9687240e885afb23215928fc4bb9f09eae44efcf5fb579053afd087 (from {base_url}/simple/eop-shared/)

ERROR: Could not install requirement eop_shared from {base_url}/api/package/eop-shared/eop_shared-0.0.7-py3-none-any.whl#sha256=7fa3fd43c9687240e885afb23215928fc4bb9f09eae44efcf5fb579053afd087 because of HTTP error 404 Client Error: Not Found for url: {base_url}/api/package/eop-shared/eop_shared-0.0.7-py3-none-any.whl for URL {base_url}/api/package/eop-shared/eop_shared-0.0.7-py3-none-any.whl#sha256=7fa3fd43c9687240e885afb23215928fc4bb9f09eae44efcf5fb579053afd087 (from {base_url}/simple/eop-shared/)

On subsequent installations, it works fine.

@stevearc
Owner

The package download endpoint is rendered by

def download_package(context, request):

What is your pypi.fallback setting? What are you using for your pypi.db? Are you running pypicloud as a single instance or as a swarm? And is there anything notable in the server logs as this is happening?

There are only a couple of places in that function that return a 404. I think the most likely case is that the first fetch from the DB fails for some reason, but it's hard to say without knowing more about your setup.
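For reference, these settings all live in the pypicloud config file. Here's a minimal sketch of what a GCS setup might look like; the bucket name and database path are placeholders, and cache is just one of the documented pypi.fallback values:

    [app:main]
    use = egg:pypicloud

    # Fall through to the storage backend when a package is missing from the cache
    pypi.fallback = cache

    # SQL cache backed by a local SQLite file (placeholder path)
    pypi.db = sql
    db.url = sqlite:///%(here)s/db.sqlite

    # GCS storage backend (placeholder bucket name)
    pypi.storage = gcs
    storage.bucket = my-pypi-bucket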

Some other things you could try that might be useful for debugging:

  • After uploading your package, wait for 2-5 minutes before trying to install
  • After uploading your package, restart pypicloud before trying to install

@josewails
Author

  1. I didn't have the fallback setting. After your reply, I set pypi.fallback to cache, which seemed to fix the 404 error, but then I ran into another problem:
ERROR: THESE PACKAGES DO NOT MATCH THE HASHES FROM THE REQUIREMENTS FILE. If you have updated the package versions, please update the hashes. Otherwise, examine the package contents carefully; someone may have tampered with them.
    eop_shared from {base_url}/api/package/eop-shared/eop_shared-0.0.7-py3-none-any.whl#sha256=7fe6704955639152898ae3bed1dec57d2e7c3e4432b474b0532571e570c92fea:
        Expected sha256 7fe6704955639152898ae3bed1dec57d2e7c3e4432b474b0532571e570c92fea
             Got        16b0e5e15ea69975f2e19c4185e46f2c8f3568567980af2a6b929adda3c755e3

This happens when trying to run pip install --no-cache-dir --upgrade -i {base_url}/simple/ eop_shared inside a Docker container for the first time, right after uploading a new version of my package.

I actually added the --no-cache-dir flag hoping it would fix the issue, but it didn't.

  2. I am using SQLite as the default DB.
  3. The server logs are clean. Nothing notable.
  4. I am running it on Google App Engine flexible. It's on autoscale, so it's hard to tell the number of instances; at the time of writing, there are two of them.

@stevearc
Owner

stevearc commented Sep 8, 2020

For the hash mismatch, I'm not sure what could be causing this. The code to hash the uploaded packages lives here:

import hashlib
from io import BytesIO

if self.calculate_hashes:
    # Read the upload once, hash it, then re-wrap the bytes so the
    # storage backend can still consume the stream
    file_data = data.read()
    metadata["hash_sha256"] = hashlib.sha256(file_data).hexdigest()
    metadata["hash_md5"] = hashlib.md5(file_data).hexdigest()
    data = BytesIO(file_data)

And it should be stored as metadata on the GCS object. Then when you fetch it, the URL will be generated with a #sha256=... fragment. It seems like pip is hashing the file and coming up with a different value.

That leaves two possibilities. The file may be getting corrupted as it's uploaded (could be related to #258 if so); in that case, you should be able to see a difference between the file you uploaded and the one in GCS. Or we may be calculating the hash incorrectly; in that case, the files should be identical, and the file you uploaded should also have the hash 16b0e5e15ea69975f2e19c4185e46f2c8f3568567980af2a6b929adda3c755e3
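One way to narrow down which case you're in, sketched with the google-cloud-storage client; the wheel filename, bucket name, and object path are placeholders for wherever pypicloud stored the file:

    import hashlib

    from google.cloud import storage

    def sha256_of(data: bytes) -> str:
        return hashlib.sha256(data).hexdigest()

    # Hash the wheel you originally uploaded (placeholder filename)
    with open("eop_shared-0.0.7-py3-none-any.whl", "rb") as f:
        local_hash = sha256_of(f.read())

    # Hash the copy that actually landed in GCS (placeholder bucket/key)
    blob = storage.Client().bucket("my-pypi-bucket").blob(
        "eop-shared/eop_shared-0.0.7-py3-none-any.whl")
    remote_hash = sha256_of(blob.download_as_bytes())

    print("local: ", local_hash)
    print("remote:", remote_hash)
    # Different hashes -> the file was corrupted in transit.
    # Matching hashes that disagree with the #sha256= fragment -> the
    # server computed the hash incorrectly.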

Another note: you won't be able to use SQLite as the caching backend if you have multiple pypicloud servers. The point of the cache is to be more performant than the storage (GCS, S3, etc.) in a way that can be queried by all of the servers. It has to stay in sync with the storage backend, otherwise you'll get inconsistent results. When you upload a package with two server instances, one of them will perform the upload and update its SQLite cache, but the other one won't know that the package exists.
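If you do need multiple instances, point all of them at a cache they can each reach. A sketch using the Redis cache backend, with a placeholder URL:

    # Shared cache reachable by every pypicloud instance
    pypi.db = redis
    db.url = redis://my-redis-host:6379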
