OCPBUGS-54611: pkg/operator/status: Drop kubelet skew guard #4970
@wking: This pull request references Jira Issue OCPBUGS-54611, which is valid. The bug has been moved to the POST state. 3 validation(s) were run on this bug
No GitHub users were found matching the public email listed for the QA contact in Jira ([email protected]), skipping review request. The bug has been updated to refer to the pull request using the external bug tracker. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
cb44bcb to 2c9e7d6
The kubelet skew guard is from 1471d2c (Bug 1986453: Check for API server and node versions skew, 2021-07-27, openshift#2658). But the Kube API server also landed a similar guard in openshift/cluster-kube-apiserver-operator@9ce4f74775 (add KubeletVersionSkewController, 2021-08-26, openshift/cluster-kube-apiserver-operator#1199). openshift/enhancements@0ba744e750 (eus-upgrades-mvp: don't enforce skew check in MCO, 2021-04-29, openshift/enhancements#762) had shifted the proposal from MCO-guards to KAS-guards, so I'm not clear on why the MCO guard landed. This commit drops it, to consolidate around the KAS-side guard.
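For readers unfamiliar with what a kubelet skew guard checks, here is a minimal Python sketch of the general idea: compare each node's kubelet minor version with the API server's minor version and flag nodes that lag too far behind. This is not the MCO or KAS operator implementation. The function names, the second node name, and the `max_skew=1` default are assumptions made for illustration; the real controllers pick their own thresholds.

```python
import re


def minor_version(version: str) -> int:
    """Return the minor component of a version string such as 'v1.30.11' or '1.32.3'."""
    match = re.match(r"v?(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognized version: {version!r}")
    return int(match.group(2))


def kubelets_too_old(api_server_version, kubelet_versions, max_skew=1):
    """Return the sorted node names whose kubelet minor version trails the
    API server's minor version by more than max_skew.

    max_skew=1 is a hypothetical default, not the value used by any operator.
    """
    api_minor = minor_version(api_server_version)
    return sorted(
        node
        for node, kubelet in kubelet_versions.items()
        if api_minor - minor_version(kubelet) > max_skew
    )


# Versions taken from the verification output in this thread; the second
# node name is made up for the example.
nodes = {
    "ip-10-0-21-153.us-east-2.compute.internal": "v1.30.11",
    "hypothetical-current-node": "v1.32.3",
}
lagging = kubelets_too_old("1.32.3", nodes)
print(lagging)
```

Keeping this check in a single component means only one ClusterOperator has to report the lagging nodes, which is the consolidation this commit is after.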
2c9e7d6 to 0c21907
unit test-case doesn't have much context on what went wrong:
bootstrap-unit seems to have choked on an InsightsDataGatherer CRD:
Neither seems related to my change, so retesting both:
/test unit
/retest-required
/lgtm
Seems sane to me
/hold
Holding for pre merge QE
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: djoshy, wking The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
@wking: This pull request references Jira Issue OCPBUGS-54611, which is valid. 3 validation(s) were run on this bug
No GitHub users were found matching the public email listed for the QA contact in Jira ([email protected]), skipping review request. In response to this:
Pre-merge verification:

Created a custom MCP infra pool from a template:

$ oc create -f - << EOF
apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfigPool
metadata:
  name: infra
spec:
  machineConfigSelector:
    matchExpressions:
      - {key: machineconfiguration.openshift.io/role, operator: In, values: [worker,infra]}
  nodeSelector:
    matchLabels:
      node-role.kubernetes.io/infra: ""
EOF
machineconfigpool.machineconfiguration.openshift.io/infra created

Applied the MC below to apply the 4.17 rhcos image. To get the rhcos image from a 4.17 cluster:

$ oc adm release info --image-for rhel-coreos
quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:6074e4da57d9e290d18928335a308762ca3ef840c12ca0d81a186868656d7956

$ oc create -f - << EOF
apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfig
metadata:
  labels:
    machineconfiguration.openshift.io/role: infra
  name: os-layer-custom
spec:
  osImageURL: quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:6074e4da57d9e290d18928335a308762ca3ef840c12ca0d81a186868656d7956
EOF
machineconfig.machineconfiguration.openshift.io/os-layer-custom created

Check that the osImage is applied properly:

$ oc describe mc rendered-infra-74f50497da2b65ef722900dae96812a9 | tail -n 10
    Name: kubelet-cleanup.service
  Extensions:
  Fips: false
  Kernel Arguments:
    systemd.unified_cgroup_hierarchy=1
    cgroup_no_v1="all"
    psi=0
  Kernel Type: default
  Os Image URL: quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:6074e4da57d9e290d18928335a308762ca3ef840c12ca0d81a186868656d7956
Events:

After the MCP update is complete, check the CO:

$ oc get co -o yaml
...
- apiVersion: config.openshift.io/v1
  kind: ClusterOperator
  metadata:
    annotations:
      exclude.release.openshift.io/internal-openshift-hosted: "true"
      include.release.openshift.io/self-managed-high-availability: "true"
      include.release.openshift.io/single-node-developer: "true"
    creationTimestamp: "2025-04-08T06:59:28Z"
    generation: 1
    name: kube-apiserver
  .....
  status:
    conditions:
    - lastTransitionTime: "2025-04-08T10:22:49Z"
      message: 'KubeletMinorVersionUpgradeable: Unsupported kubelet minor version (1.30.11) on node ip-10-0-21-153.us-east-2.compute.internal is too far behind the target API server version (1.32.3).'
      reason: KubeletMinorVersion_KubeletMinorVersionUnsupported
      status: "False"
      type: Upgradeable
    ....
    version: 4.19.0-0.test-2025-04-08-063432-ci-ln-b1nzigk-latest
- apiVersion: config.openshift.io/v1
  kind: ClusterOperator
  metadata:
    annotations:
      exclude.release.openshift.io/internal-openshift-hosted: "true"
      include.release.openshift.io/self-managed-high-availability: "true"
      include.release.openshift.io/single-node-developer: "true"
    creationTimestamp: "2025-04-08T06:59:28Z"
    generation: 1
    name: machine-config
  ......
  status:
    conditions:
    .....
    - lastTransitionTime: "2025-04-08T07:04:43Z"
      reason: AsExpected
      status: "True"
      type: Upgradeable
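The final check above can also be scripted. The following is a rough Python sketch, not part of the QE procedure, that pulls the Upgradeable condition out of a ClusterOperator list. It assumes the data has already been loaded into a dict shaped like the output of `oc get co -o json` (JSON rather than the YAML used above, so that only the standard library would be needed). `SAMPLE` is a trimmed stand-in built from the values in this comment, and `upgradeable_conditions` is a hypothetical helper name.

```python
# Trimmed stand-in for the ClusterOperator list shown above; only the fields
# this sketch reads are kept.
SAMPLE = {
    "items": [
        {
            "metadata": {"name": "kube-apiserver"},
            "status": {
                "conditions": [
                    {
                        "type": "Upgradeable",
                        "status": "False",
                        "reason": "KubeletMinorVersion_KubeletMinorVersionUnsupported",
                        "message": (
                            "KubeletMinorVersionUpgradeable: Unsupported kubelet "
                            "minor version (1.30.11) on node "
                            "ip-10-0-21-153.us-east-2.compute.internal is too far "
                            "behind the target API server version (1.32.3)."
                        ),
                    }
                ]
            },
        },
        {
            "metadata": {"name": "machine-config"},
            "status": {
                "conditions": [
                    {"type": "Upgradeable", "status": "True", "reason": "AsExpected"}
                ]
            },
        },
    ]
}


def upgradeable_conditions(cluster_operators):
    """Map each ClusterOperator name to its Upgradeable condition, if it has one."""
    result = {}
    for item in cluster_operators.get("items", []):
        name = item["metadata"]["name"]
        for condition in item.get("status", {}).get("conditions", []):
            if condition.get("type") == "Upgradeable":
                result[name] = condition
    return result


conditions = upgradeable_conditions(SAMPLE)
for name, condition in conditions.items():
    # A missing message is printed as an empty string.
    print(name, condition["status"], condition.get("reason", ""), condition.get("message", ""))
```

With this change applied, the expectation is that only kube-apiserver carries the kubelet-skew message, while machine-config stays Upgradeable=True with no message about old Nodes.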
/label qe-approved
/unhold
@wking: The following tests failed, say
Full PR test history. Your PR dashboard. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here. |
ecc0b92 into openshift:main
@wking: Jira Issue OCPBUGS-54611: All pull requests linked via external trackers have merged: Jira Issue OCPBUGS-54611 has been moved to the MODIFIED state. In response to this:
[ART PR BUILD NOTIFIER] Distgit: ose-machine-config-operator |
The kubelet skew guard is from 1471d2c (#2658). But the Kube API server also landed a similar guard in openshift/cluster-kube-apiserver-operator@9ce4f74775 (openshift/cluster-kube-apiserver-operator#1199). openshift/enhancements@0ba744e750 (openshift/enhancements#762) had shifted the proposal from MCO-guards to KAS-guards, so I'm not clear on why the MCO guard landed. This pull request drops it, to consolidate around the KAS-side guard.
Closes: OCPBUGS-54611
- What I did
Dropped the MCO's kubelet-skew guard, because the Kube API server's kubelet-skew guard is enough on its own.
- How to verify it
With old-kubelet Nodes in the cluster, the kube-apiserver ClusterOperator will be Upgradeable=False, and oc adm upgrade will point out the old Nodes. Without this fix, the machine-config ClusterOperator will be Upgradeable=True, but will have a reason and message complaining about the old Node versions. With this fix, the machine-config ClusterOperator will still be Upgradeable=True, but will not complain about the old kubelets/Nodes.
- Description for the changelog
Probably not worth a change-log entry, because so few folks are likely to bump up against these skew constraints, and we're not adding or removing a guard at the OCP level, just simplifying the OCP-scoped messaging by consolidating around the KAS-side wording.