ECONNRESET: aborted when pushing large multi-container builds #2768
Comments
Getting the same issue here on Apple M2 Pro.
Intermediary status update: This is still an issue for me.
Intermediary status update: This is still an issue for me.
A new bit of information emerges: it seems that when the Balena CLI tells me the build aborted due to a connection drop/reset, the build still ends up on Balena Cloud. It apparently keeps running, but since I lose my connection to the builder, I'm unable to see any logs. Release tags are also not applied (presumably because that happens after the build completes), so it's a weird half-state of kind of working, but not really. It would be nice if this behavior were consistent, seeing as it's one of the fundamental capabilities of the platform. I have not yet been able to test whether a build that completes in this manner is actually capable of running on a device. Edit: the 'phantom build' seems to be stuck in the 'Running' state forever and never finishes, so I guess it's not really useful.
Intermediary status update: This is still an issue for me. I suspect it's an issue with the builder system, so making a PR to fix this behavior is next to impossible. If I can find some time I'll try to dig through the source code myself, though I expect it'll be challenging without any help, and if the problem is in the build system itself, then we're kind of powerless here.
Hello @timwedde, I am sorry you are having issues with our builders, and yes, we have other people reporting similar problems on the forums. We currently use the builders to build our own Docker images, which are fairly large, and we have never faced this issue (our images have no priority; we use the same build system as you do). So I don't think this is directly an issue with image sizes, but rather something specific to a few docker-compose projects that can cause the intermittency. I also finished running a script that did 100 pushes of different images (with different sizes) and I could not reproduce the problem. Is there any way you could share an example docker-compose + resources where you can reproduce the behaviour?
Yup, will work on creating a reproducible example that I am able to share; I'll post here again once I have something! Thanks for responding, much appreciated :)
Sorry for the long absence, things got rather busy at work for a little bit so I didn't have time to work on an MWE for this.
Expected Behavior
Pushing arbitrarily-sized multi-container builds to Balena builders works fine and creates a new image successfully.
Actual Behavior
When pushing large multi-container docker-compose files to the Balena builders, the push operation fails in about 90% of cases with the error message below:
The behavior is not consistent:
The command used to build is very simple:
balena push myFleet --release-tag description "debug" --draft
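Since the failure appears to be intermittent rather than deterministic, a simple retry wrapper around this command is one possible stopgap. This is only a sketch under the assumption that the same fleet name, tag, and flags as above are used; it is not an official workaround.

```bash
#!/usr/bin/env bash
# Retry the push a few times, since the ECONNRESET appears to be intermittent.
# Fleet name, tag value, retry count, and delay are placeholders.
set -u

for attempt in 1 2 3; do
    echo "Push attempt ${attempt}..."
    if balena push myFleet --release-tag description "debug" --draft; then
        echo "Push succeeded on attempt ${attempt}"
        exit 0
    fi
    echo "Push failed, waiting before retrying..."
    sleep 30
done

echo "All push attempts failed" >&2
exit 1
```

Note that, per the observation earlier in the thread that an aborted push can leave a 'phantom' build running on the builders, blindly retrying may queue up additional builds.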
Here is one of the builds that failed, on the machine that has a slightly higher success rate:
Steps to Reproduce the Problem
Hard to say; I don't know if this is generally reproducible, but it seems to occur with larger multi-container builds.
My particular build is massive (in terms of final Docker image sizes, at least), ending up at about 40-50 GB. I know this is bad, but since I'm building for a Jetson and need multiple distinct containers that use GPU acceleration, I have to ship the entire driver stack several times, which bloats the image sizes considerably. I'm assuming I'm getting kicked off the builders because of cache or image sizes, but the error message is not clear about this, nor could I find any documented hard limits, so I'm a bit confused as to the source of the issue.
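For context on the sizes mentioned above, one way to check the image sizes a compose project produces is to build it locally and list the results; this assumes the same docker-compose.yml builds on the local machine and that Docker Compose v2 is installed.

```bash
# Build all services defined in the local docker-compose.yml.
docker compose build

# List local images with their sizes (shows all images; filter by name as needed).
docker image ls --format 'table {{.Repository}}:{{.Tag}}\t{{.Size}}'
```

This does not reflect any limits on the builder side, but it gives a concrete number to quote when discussing whether total image size correlates with the aborted pushes.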
Specifications