-
Notifications
You must be signed in to change notification settings - Fork 22
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Support per-component tolerations (+ manager) #154
base: main
Are you sure you want to change the base?
Support per-component tolerations (+ manager) #154
Conversation
@@ -1272,6 +1272,9 @@ manager: | |||
podLabels: {} | |||
service: | |||
name: | |||
# When true, root tolerations aren't added to this one. | |||
tolerationsReplaced: false |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It might not be very intuitive to set another boolean just to express that the tolerations at the specific component level should be respected.
Could we instead have the tolerations
commented out at the specific component level, such as:
# tolerations: [] # Uncomment this line to override root-level tolerations.
Once you merge from latest main, you will notice the root tolerations have been defined & only apply to daemonsets and do NOT apply to the controller manager deployment. So just for controller manager alone, we truly do want to explicitly set tolerations: []
and let it override the root level ones to maintain compatibility i.e. not comment that line.
We can then probably have logic that does something like:
{{- if hasKey .Values.manager "tolerations" -}}
{{- toYaml .Values.manager.tolerations -}}
{{- else -}}
{{- toYaml .Values.tolerations -}}
{{- end -}}
{{- end -}}
Could probably put this in a shared method in _helpers.tpl and call it for the all components.
The components to cover would be:
- manager:
- containerLogs.fluentBit
- agent
- dcgmExporter
- neuronMonitor
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @sky333999,
Thank you for the feedback!
The PR has been designed to be conservative to avoid unexpected impact for existing deployment. I will check new code changes and update my PR accordingly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have updated the PR based on:
- Root tolerations is applied by default
- Manager tolerations is disabled by default (
[]
value, see below) - Each component may override tolerations
- Tolerations can be disabled by using empty list
[]
(root or specific) - If specific tolerations is
null
(nil
in Go/Template), root ones are used.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thats great! Thank you so much for the changes. Looks good to me.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Now that I think about it a bit more, you could probably get away with something like:
{{- with (.Values.agent.tolerations | default .Values.tolerations ) }}
tolerations: {{- toYaml . | nindent 2}}
{{- end }}
Im debating if this is just simpler for each of the components instead of the helper function. Im fine either way.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry I miss your two last review feedbacks.
Because nil
and empty ([]
) are handled differently, default
function won't do what is expected.
26ec8ef
to
260e688
Compare
Should partially fix aws/containers-roadmap#2515
260e688
to
238c132
Compare
Thanks for the updates @loganmzz . Would you mind adding some testing information to the description so we have that captured before I merge this in?
|
For sure here the values I have tested on: # 00 - Empty # 01 - Disable all tolerations with empty
tolerations: [] # 02 - Disable all tolerations with null
tolerations: # 03 - Set manager tolerations
manager:
tolerations:
- key: "specific"
operator: "Equal"
value: "manager" # 04 - Set manager tolerations to default
manager:
tolerations: # 05 - Set agent tolerations
agent:
tolerations:
- key: "specific"
operator: "Equal"
value: "agent" # 06 - Disable agent tolerations
agent:
tolerations: [] |
Should partially fix aws/containers-roadmap#2515
Issue #, if available:
aws/containers-roadmap#2515
Description of changes:
manager
Behavior
[]
value, see below)[]
(root or specific)null
(nil
in Go/Template), root ones are used.By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.