Can't upgrade RKE2 errors stating "error syncing" ... #227

Open
edgrz opened this issue Feb 23, 2023 · 0 comments

Version
Release 0.10.0 (https://github.com/rancher/system-upgrade-controller/releases/tag/v0.10.0)

Platform/Architecture
RKE2 cluster

Describe the bug
I'm using the system-upgrade-controller to upgrade RKE2 from v1.23.10+rke2r1 to v1.24.10+rke2r1 (https://github.com/rancher/rke2/releases/tag/v1.24.10+rke2r1), following the RKE2 documentation, and everything works smoothly with the official images. However, it does not work in a disconnected environment where we have no direct access to public repositories. There we pull from public registries through a proxy cache implemented with Harbor. The Job that spins up cordons the node, but then it gets terminated, probably by the controller.

The system-upgrade-controller logs show it constantly complaining:

Error logs:

time="2023-02-23T17:01:29Z" level=error msg="error syncing 'system-upgrade/server-check11': handler system-upgrade-controller: DesiredSet - Replace Wait batch/v1, Kind=Job system-upgrade/apply-server-check11-on-node-with-8cb523f17d0363daa544-1c8f9 for system-upgrade-controller system-upgrade/server-check11, requeuing" func="github.com/sirupsen/logrus.(*Entry).Logf" file="/go/pkg/mod/github.com/sirupsen/[email protected]/entry.go:314"
time="2023-02-23T17:01:29Z" level=debug msg="PLAN STATUS HANDLER: plan=system-upgrade/server-check11@79747, status={Conditions:[{Type:LatestResolved Status:True LastUpdateTime:2023-02-23T17:01:28Z LastTransitionTime: Reason:Version Message:}] LatestVersion:v1.24.10-rke2r1 LatestHash:8cb523f17d0363daa5446f1aa3363b6a220e0e050435b4d3d40e253b Applying:[node]}" func="github.com/sirupsen/logrus.(*Entry).Logf" file="/go/pkg/mod/github.com/sirupsen/[email protected]/entry.go:314"
time="2023-02-23T17:01:29Z" level=debug msg="PLAN GENERATING HANDLER: plan=system-upgrade/server-check11@79765, status={Conditions:[{Type:LatestResolved Status:True LastUpdateTime:2023-02-23T17:01:29Z LastTransitionTime: Reason:Version Message:}] LatestVersion:v1.24.10-rke2r1 LatestHash:8cb523f17d0363daa5446f1aa3363b6a220e0e050435b4d3d40e253b Applying:[node]}" func="github.com/sirupsen/logrus.(*Entry).Logf" file="/go/pkg/mod/github.com/sirupsen/[email protected]/entry.go:314"
time="2023-02-23T17:01:30Z" level=debug msg="DesiredSet - Created batch/v1, Kind=Job system-upgrade/apply-server-check11-on-node-with-8cb523f17d0363daa544-1c8f9 for system-upgrade-controller system-upgrade/server-check11" func="github.com/sirupsen/logrus.(*Entry).Logf" file="/go/pkg/mod/github.com/sirupsen/[email protected]/entry.go:314"
time="2023-02-23T17:01:30Z" level=debug msg="PLAN STATUS HANDLER: plan=system-upgrade/server-check11@79765, status={Conditions:[{Type:LatestResolved Status:True LastUpdateTime:2023-02-23T17:01:29Z LastTransitionTime: Reason:Version Message:}] LatestVersion:v1.24.10-rke2r1 LatestHash:8cb523f17d0363daa5446f1aa3363b6a220e0e050435b4d3d40e253b Applying:[node]}" func="github.com/sirupsen/logrus.(*Entry).Logf" file="/go/pkg/mod/github.com/sirupsen/[email protected]/entry.go:314"
time="2023-02-23T17:01:30Z" level=debug msg="PLAN GENERATING HANDLER: plan=system-upgrade/server-check11@79782, status={Conditions:[{Type:LatestResolved Status:True LastUpdateTime:2023-02-23T17:01:30Z LastTransitionTime: Reason:Version Message:}] LatestVersion:v1.24.10-rke2r1 LatestHash:8cb523f17d0363daa5446f1aa3363b6a220e0e050435b4d3d40e253b Applying:[node]}" func="github.com/sirupsen/logrus.(*Entry).Logf" file="/go/pkg/mod/github.com/sirupsen/[email protected]/entry.go:314"
time="2023-02-23T17:01:31Z" level=debug msg="DesiredSet - Delete batch/v1, Kind=Job system-upgrade/apply-server-check11-on-node-with-8cb523f17d0363daa544-1c8f9 for system-upgrade-controller system-upgrade/server-check11" func="github.com/sirupsen/logrus.(*Entry).Logf" file="/go/pkg/mod/github.com/sirupsen/[email protected]/entry.go:314"
time="2023-02-23T17:01:31Z" level=error msg="error syncing 'system-upgrade/server-check11': handler system-upgrade-controller: DesiredSet - Replace Wait batch/v1, Kind=Job system-upgrade/apply-server-check11-on-node-with-8cb523f17d0363daa544-1c8f9 for system-upgrade-controller system-upgrade/server-check11, requeuing" func="github.com/sirupsen/logrus.(*Entry).Logf" file="/go/pkg/mod/github.com/sirupsen/[email protected]/entry.go:314"

So basically, it says level=error msg="error syncing 'system-upgrade/server-check11': handler system-upgrade-controller: DesiredSet - Replace Wait batch/v1, Kind=Job system-upgrade/apply-server-check11-on-node-with-8cb523f17d0363daa544-1c8f9 for system-upgrade-controller system-upgrade/server-check11, requeuing"
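
The create/delete cycle in the log excerpt can be made easier to see by tallying the DesiredSet actions per Job. This is only a rough sketch for triage: `tally_job_actions` and the constants are hypothetical names, and the sample messages are the `msg=` fields copied from the excerpt with the surrounding logrus fields left out.

```python
import re
from collections import Counter

# Captures the DesiredSet action and the Job name from a controller log line.
DESIRED_SET = re.compile(
    r"DesiredSet - (?P<action>Created|Delete|Replace Wait) "
    r"batch/v1, Kind=Job (?P<job>\S+)"
)


def tally_job_actions(lines):
    """Count DesiredSet actions per (job, action) pair."""
    counts = Counter()
    for line in lines:
        match = DESIRED_SET.search(line)
        if match:
            counts[(match.group("job"), match.group("action"))] += 1
    return counts


JOB = "system-upgrade/apply-server-check11-on-node-with-8cb523f17d0363daa544-1c8f9"
PLAN = "system-upgrade-controller system-upgrade/server-check11"

# Message texts taken from the log excerpt (time=, func= and file= omitted).
sample = [
    f"DesiredSet - Created batch/v1, Kind=Job {JOB} for {PLAN}",
    f"DesiredSet - Delete batch/v1, Kind=Job {JOB} for {PLAN}",
    "error syncing 'system-upgrade/server-check11': handler system-upgrade-controller: "
    f"DesiredSet - Replace Wait batch/v1, Kind=Job {JOB} for {PLAN}, requeuing",
]

for (job, action), n in sorted(tally_job_actions(sample).items()):
    print(f"{action:<12} {n}  {job}")
```

Fed with the full controller log instead of `sample`, a steadily growing count of Created and Delete entries for the same Job name would confirm that the controller itself is replacing the Job, rather than the pod failing on its own.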

The Job is continuously recreated, and so is its pod:

❯ kubectl get pods
NAME                                                           READY   STATUS        RESTARTS   AGE
apply-server-check11-on-node-with-8cb523f17d0363daa544-h6vc6   0/1     Terminating   0          3s
apply-server-check11-on-node-with-8cb523f17d0363daa544-zjk2k   0/1     Pending       0          1s
system-upgrade-controller-957888bb5-vvv28                      1/1     Running       0          39m

To Reproduce
Our plan:

apiVersion: upgrade.cattle.io/v1
kind: Plan
metadata:
  name: server-check11
  namespace: system-upgrade
  labels:
    rke2-upgrade: server
spec:
  concurrency: 1
  nodeSelector:
    matchExpressions:
    - {key: node-role.kubernetes.io/control-plane, operator: In, values: ["true"]}
  serviceAccountName: system-upgrade
  cordon: true
#  drain:
#    force: true
  upgrade:
    image: our.private.registry/docker.io/rancher/rke2-upgrade
  version: v1.24.10+rke2r1
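
To check that the proxy cache can actually serve the image the Job needs, it helps to know the full image reference. Below is a minimal sketch, assuming the controller uses the resolved version as the image tag with `+` replaced by `-`. That assumption matches `LatestVersion:v1.24.10-rke2r1` in the controller logs, but it is not confirmed in this report. `upgrade_image_ref` is a hypothetical helper, not part of system-upgrade-controller.

```python
def upgrade_image_ref(image: str, version: str) -> str:
    """Build the image reference the upgrade Job would presumably pull."""
    # Assumption: '+' is not allowed in an image tag, so it becomes '-',
    # as suggested by LatestVersion:v1.24.10-rke2r1 in the controller logs.
    tag = version.replace("+", "-")
    return f"{image}:{tag}"


# Values taken from spec.upgrade.image and spec.version of the plan.
print(upgrade_image_ref(
    "our.private.registry/docker.io/rancher/rke2-upgrade",
    "v1.24.10+rke2r1",
))
```

Pulling the printed reference manually from one of the nodes would be a quick way to rule out the registry path before looking at the controller itself.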

Expected behavior
It should work the same way as it does with the official images, since the only change we make is pulling through the proxy cache.

Actual behavior
The init container works, since the node gets cordoned:

❯ kubectl get node
NAME      STATUS                     ROLES                              AGE    VERSION
node      Ready,SchedulingDisabled   control-plane,etcd,master,worker   177m   v1.23.10+rke2r1

However, the main container is never started, and the pod is gone before we have time to check its logs. Our guess is that the problem is related to SHAs: system-upgrade-controller might perform an internal check based on SYSTEM_UPGRADE_PLAN_LATEST_HASH.
