Failing to push large image layers #18800

viktor-f · 2023-06-08T09:40:33Z

Not sure if this should have been an issue instead, let me know if you want me to move it.

I'm having trouble pushing images with large layers to Harbor. With the Docker CLI I can see the large layers upload until the progress bar is full; then the upload stalls for a while, fails, and is retried. In the Harbor logs I found relevant lines in two components, core and registry.

Log from core:

2023/06/08 09:01:25 http: proxy error: context canceled

Log from registry:

time="2023-06-08T09:01:26.497450171Z" level=error msg="client disconnected during blob PATCH" auth.user.name="harbor_registry_user" contentLength=583054592 copied=276194948 error="unexpected EOF" go.version=go1.18.5 http.request.host=harbor.<redacted> http.request.id=4499b24d-cac3-46cb-b34c-416c63b381b2 http.request.method=PATCH http.request.remoteaddr=185.189.28.150 http.request.uri="/v2/viktor-test/cocalc/blobs/uploads/658c9b1f-19fd-4de7-81f1-b5a54f377f7b?_state=WsmyqlXhWkdR_WM9a-l_xZSRC7sfgVYpVDklXcSrZGJ7Ik5hbWUiOiJ2aWt0b3ItdGVzdC9jb2NhbGMiLCJVVUlEIjoiNjU4YzliMWYtMTlmZC00ZGU3LTgxZjEtYjVhNTRmMzc3ZjdiIiwiT2Zmc2V0IjowLCJTdGFydGVkQXQiOiIyMDIzLTA2LTA4VDA4OjU3OjE0Ljc2MTQ3NjA0OVoifQ%3D%3D" http.request.useragent="docker/20.10.21 go/go1.18.1 git-commit/20.10.21-0ubuntu1~22.04.3 kernel/5.19.0-43-generic os/linux arch/amd64 UpstreamClient(Docker-Client/20.10.21 \(linux\))" vars.name="viktor-test/cocalc" vars.uuid=658c9b1f-19fd-4de7-81f1-b5a54f377f7b 
2023/06/08 09:01:27 http: superfluous response.WriteHeader call from github.com/docker/distribution/vendor/github.com/gorilla/handlers.(*responseLogger).WriteHeader (handlers.go:125)
time="2023-06-08T09:01:27.677228976Z" level=error msg="response completed with error" auth.user.name="harbor_registry_user" err.code=unknown err.detail="client disconnected" err.message="unknown error" go.version=go1.18.5 http.request.host=harbor.<redacted> http.request.id=4499b24d-cac3-46cb-b34c-416c63b381b2 http.request.method=PATCH http.request.remoteaddr=185.189.28.150 http.request.uri="/v2/viktor-test/cocalc/blobs/uploads/658c9b1f-19fd-4de7-81f1-b5a54f377f7b?_state=WsmyqlXhWkdR_WM9a-l_xZSRC7sfgVYpVDklXcSrZGJ7Ik5hbWUiOiJ2aWt0b3ItdGVzdC9jb2NhbGMiLCJVVUlEIjoiNjU4YzliMWYtMTlmZC00ZGU3LTgxZjEtYjVhNTRmMzc3ZjdiIiwiT2Zmc2V0IjowLCJTdGFydGVkQXQiOiIyMDIzLTA2LTA4VDA4OjU3OjE0Ljc2MTQ3NjA0OVoifQ%3D%3D" http.request.useragent="docker/20.10.21 go/go1.18.1 git-commit/20.10.21-0ubuntu1~22.04.3 kernel/5.19.0-43-generic os/linux arch/amd64 UpstreamClient(Docker-Client/20.10.21 \(linux\))" http.response.contenttype="application/json; charset=utf-8" http.response.duration=52.789488779s http.response.status=500 http.response.written=89 vars.name="viktor-test/cocalc" vars.uuid=658c9b1f-19fd-4de7-81f1-b5a54f377f7b

As you can see, the core log in particular is not very specific, and enabling debug logging did not reveal anything more relevant. I also captured the network traffic: the failure seems to start with core returning an HTTP 502 to the client, followed by some reset messages sent to core from the client (or possibly from the nginx in between).

I'm running Harbor 2.6 in several Kubernetes clusters, installed via the Helm chart. I see this problem in several clusters, but the threshold seems to depend on the cloud provider: on one provider errors start with layers of ~1 GB, while on another they start at ~7 GB. Within a single installation the failing size is not consistent either; e.g. the first cloud sometimes fails at 800 MB and sometimes manages 1.5 GB. In one cloud the images are stored in S3-compatible object storage (not AWS), and in the other in object storage accessed through the Swift API.

It's quite possible that the object storage on these clouds is at fault, but I would like to understand exactly what happens here: is there a timeout in Harbor, an actual error when talking to the object storage, or something else? Ultimately I want to support larger layers (at least up to ~2 GB), but if that is not possible, it would still be good to understand what is happening.

Answered by wy65701436

wy65701436 (Maintainer) · 2023-06-19T03:57:31Z

Hi, which Harbor version are you running? Based on the error, the request appears to have been canceled due to a timeout, which could be caused by either the proxy or a Harbor component.

To troubleshoot this, you can start by increasing the request/response timeout and retrying the operation.

viktor-f (Author) · Jun 19, 2023

Hi, I'm currently running 2.6.0.
Could you point me to the timeout config you are talking about?

yunghollow91 · 2024-05-02T12:05:46Z

Hey @viktor-f, I'm encountering the same problem. Did you manage to find a workaround?

yunghollow91 · 2024-05-14T13:27:14Z

For anyone else encountering this issue: we discovered that the problem was ingress-nginx closing long-running HTTP connections after some time, seemingly at random. We mitigated it by switching to a different ingress controller.

crazyelectron-io · 2024-07-08T06:08:47Z

@yunghollow91 do you have an example or snippet showing where you changed that? I cannot find any timeout-related parameter in the Helm values.yaml (I run Harbor on Kubernetes). Or is this your own ingress controller, outside of the nginx installed by Harbor (I use Traefik there)?

yunghollow91 Jul 8, 2024

@crazyelectron-io I switched from the external ingress-nginx controller to an external Envoy proxy in Kubernetes, and set very long timeouts on the new proxy for large image uploads, which resolved the issue.
For us, the problem could be pinpointed to nginx by simulating an image push with a simple web service that sends an arbitrarily long PUT request to the ingress. This failed consistently, whereas sending the same request via the Kubernetes Service (tested through a local port-forward) worked fine. This also implies that the image size itself was not what triggered the error; any sufficiently long push would hit it.
I'm not experienced with Traefik, but you could try the same experiment and see whether the problem can be pinpointed to the reverse proxy.

nelsonrogers Nov 6, 2024

Did you find a solution with Traefik?

crazyelectron-io Nov 12, 2024

Not yet. Need to find some time to experiment...
