-
Notifications
You must be signed in to change notification settings - Fork 9
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
node-installer: remove resource limits #948
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we should still set a request based on the average case, but removing the limit to accommodate the worst case sgtm.
There's no strong reason for having resource limits on the node-installer and given the unpredictable nature of the kernel's resource accounting, we should remove the resource limits so that we don't run into unexpected OOMs. Fixes f5da52d
677d971
to
3c5722d
Compare
https://home.robusta.dev/blog/kubernetes-memory-limit cc @3u13r |
Ah, so I remembered correctly. I was just not sure to bring it up. Ideally the node-installer is more akin to a (daemon set) job than a application deployment which actually does something with the requested memory, right? |
This may be true for apps, but is not for essential system components. We don't want to compete with normal pods for resources. |
I'd say that this is a rather subjective measurement of what is "essential". There could be other important applications running in the same cluster. The concept of contrast is a "next to", so we shouldn't think we're the most critical component. |
The problem is that we need memory in a single burst and don't have much influence over how it's accounted. Setting a request=limit=1G is a little extreme and I'd say a small request with a large limit may be fair, too, but don't quite see what the limit would buy us. |
Funny thing: the linked blog post even has a section on our issue, I think (Unintuitive page-cache behaviour can lead to unecessary OOMs). |
Okay, just dropping some of the links here again for quicker access, fine to merge as is. |
/backport |
Backport failed for Please cherry-pick the changes locally and resolve any conflicts. git fetch origin release/v1.1
git worktree add -d .worktree/backport-948-to-release/v1.1 origin/release/v1.1
cd .worktree/backport-948-to-release/v1.1
git switch --create backport-948-to-release/v1.1
git cherry-pick -x 7f50e0cc566e630bc30775e44b0a42c31b924360 |
There's no strong reason for having resource limits on the node-installer and given the unpredictable nature of the kernel's resource accounting, we should remove the resource limits so that we don't run into unexpected OOMs.
Fixes f5da52d
Cc @blenessy