feat: al2 support #119
base: master
Conversation
fix indentation
Thanks @szab100 for the contribution!
Looks good in general, just a couple of comments.
k8s/scripts/kubelet-config-helper.sh
Outdated
@@ -1476,6 +1488,7 @@ function do_config_kubelet() {
	clean_cgroups_kubepods
	config_kubelet "host-based"
	adjust_crio_config_dependencies
	restart_containerd
Curious why restarting containerd is needed; it's non-obvious to me because we are not changing any containerd config, and after this script switches kubelet to use CRI-O, containerd will no longer be used.
Yeah, I'm not quite sure. I know this was needed at some point (it may be AL2-specific), but I'm unsure whether it was ultimately needed or not. It shouldn't hurt, though.
I would prefer to leave it out: there's no change to any containerd config in the script, so restarting containerd is confusing I think; unless it's actually needed of course, in which case a comment as to why would be helpful.
I wonder if it helps with the sysbox uninstall from the cluster, where we revert from CRI-O back to containerd.
Hey @ctalledo, I made a new build today on top of sysbox v0.6.3, applying the new sysbox-runc patch + this PR + reverting the change that deprecated v1.25 (we are still on that; it is officially supported by AWS/EKS until October 2024), and I can confirm that this line (restarting containerd) seems to be needed. Without it, the installer daemonset finishes its first run and the node just fails (the kubelet is down).
Upon SSH-ing into the failing node, I see the following:
[root@ip-10-194-243-117 user]# systemctl status kubelet-config-helper.service
● kubelet-config-helper.service - Kubelet config service
Loaded: loaded (/usr/lib/systemd/system/kubelet-config-helper.service; static; vendor preset: disabled)
Active: failed (Result: exit-code) since Thu 2024-02-01 20:53:48 UTC; 14min ago
Process: 24294 ExecStart=/bin/sh -c /usr/local/bin/kubelet-config-helper.sh (code=exited, status=1/FAILURE)
Main PID: 24294 (code=exited, status=1/FAILURE)
Feb 01 20:52:18 ip-10-194-243-117.vpc.internal sh[24294]: + systemctl restart crio
Feb 01 20:52:18 ip-10-194-243-117.vpc.internal sh[24294]: + restart_kubelet
Feb 01 20:52:18 ip-10-194-243-117.vpc.internal sh[24294]: + echo 'Restarting Kubelet ...'
Feb 01 20:52:18 ip-10-194-243-117.vpc.internal sh[24294]: Restarting Kubelet ...
Feb 01 20:52:18 ip-10-194-243-117.vpc.internal sh[24294]: + systemctl restart kubelet
Feb 01 20:53:48 ip-10-194-243-117.vpc.internal sh[24294]: A dependency job for kubelet.service failed. See 'journalctl -xe' for details.
Feb 01 20:53:48 ip-10-194-243-117.vpc.internal systemd[1]: kubelet-config-helper.service: main process exited, code=exited, status=1/FAILURE
Feb 01 20:53:48 ip-10-194-243-117.vpc.internal systemd[1]: Failed to start Kubelet config service.
Feb 01 20:53:48 ip-10-194-243-117.vpc.internal systemd[1]: Unit kubelet-config-helper.service entered failed state.
Feb 01 20:53:48 ip-10-194-243-117.vpc.internal systemd[1]: kubelet-config-helper.service failed.
[root@ip-10-194-243-117 user]# systemctl status kubelet.service
● kubelet.service - Kubernetes Kubelet
Loaded: loaded (/etc/systemd/system/kubelet.service; enabled; vendor preset: disabled)
Drop-In: /etc/systemd/system/kubelet.service.d
└─10-kubelet-args.conf, 30-kubelet-extra-args.conf
Active: inactive (dead) since Thu 2024-02-01 20:52:13 UTC; 20min ago
Docs: https://github.com/kubernetes/kubernetes
Main PID: 6995 (code=exited, status=0/SUCCESS)
...
Feb 01 20:52:13 ip-10-194-243-117.vpc.internal kubelet[6995]: I0201 20:52:13.113055 6995 kubelet.go:2132] "SyncLoop (PLEG): event for pod" pod="addon-active-monitor-ns/aws-asg-activities-healthcheck-workflow...3aa88fa506f6a}
Feb 01 20:52:13 ip-10-194-243-117.vpc.internal systemd[1]: Stopping Kubernetes Kubelet...
Feb 01 20:52:13 ip-10-194-243-117.vpc.internal kubelet[6995]: I0201 20:52:13.564307 6995 dynamic_cafile_content.go:171] "Shutting down controller" name="client-ca-bundle::/etc/kubernetes/pki/ca.crt"
Feb 01 20:52:13 ip-10-194-243-117.vpc.internal systemd[1]: Stopped Kubernetes Kubelet.
Feb 01 20:53:48 ip-10-194-243-117.vpc.internal systemd[1]: Dependency failed for Kubernetes Kubelet.
Feb 01 20:53:48 ip-10-194-243-117.vpc.internal systemd[1]: Job kubelet.service/start failed with result 'dependency'.
Hint: Some lines were ellipsized, use -l to show in full.
This is probably because the kubelet systemd unit has containerd as a dependency (After & Requires):
[root@ip-10-194-243-117 user]# cat /etc/systemd/system/kubelet.service
[Unit]
Description=Kubernetes Kubelet
Documentation=https://github.com/kubernetes/kubernetes
After=containerd.service sandbox-image.service
Requires=containerd.service sandbox-image.service
...
So I added a new sed command to replace any potential dependencies on 'containerd' with 'crio' to fix it. But even after this replacement, followed by the systemctl daemon-reload command at the end of the config_kubelet() function, restarting kubelet still fails until the containerd service is stopped. That is why adding 'service containerd restart' helped, but I just replaced it with a call to the existing stop_containerd() function instead.
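For reference, a minimal sketch of the sequence described above (assumptions: the exact ordering and placement in kubelet-config-helper.sh may differ, and the stop_containerd() helper is approximated here with a plain systemctl call):

# Sketch only, not the exact patch in this PR.
# Point the kubelet unit's dependencies at CRI-O instead of containerd.
sed -i "s/containerd.service/crio.service/" /etc/systemd/system/kubelet.service

# Pick up the edited unit file.
systemctl daemon-reload

# Kubelet still fails to start while containerd is running, so stop it
# (the script calls its existing stop_containerd() helper for this step).
systemctl stop containerd

# Now kubelet can be restarted against CRI-O.
systemctl restart kubelet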
Hi @szab100, thank you very much for the detailed explanation; let's keep it there (and possibly add a short comment explaining why it's needed).

> reverting the change that deprecated v1.25 (we are still on that; it is officially supported by AWS/EKS until October 2024)

Oh, I had missed that, my mistake; we can bring it back then (it's a pretty simple change in the Makefiles IIRC), but I don't think we can actively test on it (though since it's been tested before, things should continue to work).
It's fine; actually, in the meantime we upgraded to 1.26.6 as well, so there's no need to bring it back.
One more comment please.
else
	kubelet_env_file=$(echo "$kubelet_env_files" | awk '{print $NF}')
fi

backup_config "$kubelet_env_file" "kubelet_env_file"

# Replace potential dependencies on 'containerd' with 'crio'
sed -i "s/containerd.service/crio.service/" /etc/systemd/system/kubelet.service
During the sysbox-deploy-k8s uninstall, don't we need to "undo" this change? Maybe create a copy of the kubelet.service, and then revert to it during uninstall.
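Something along these lines, as a rough sketch (the backup location /var/lib/sysbox-deploy-k8s is illustrative, not necessarily the path the installer actually uses):

# Install path: keep a pristine copy of the unit before editing it.
cp /etc/systemd/system/kubelet.service /var/lib/sysbox-deploy-k8s/kubelet.service.orig
sed -i "s/containerd.service/crio.service/" /etc/systemd/system/kubelet.service
systemctl daemon-reload

# Uninstall path: restore the original unit and reload systemd.
cp /var/lib/sysbox-deploy-k8s/kubelet.service.orig /etc/systemd/system/kubelet.service
systemctl daemon-reload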
Adding Amazon Linux 2 support.

Notes:
- amazon-linux-extras (RPM source)

Known / pending issues:
- Auto K8S ServiceAccount secret mounts using tmpfs are not working (see workaround below). The resulting error is like:
- Workaround: these auto-mounted SA tokens fail to be mounted at the default /var/run/secrets/.. location; however, disabling the auto-mount option on the Pod and adding the volume and volumeMount entries manually with a different mountPath (like /secrets/...) should work:
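A rough sketch of this workaround (the pod name, image, and the projected-token volume source below are illustrative assumptions, not taken from this PR):

# Illustrative only: disable the automatic SA token mount and mount the token
# manually under /secrets/... instead of /var/run/secrets/...
kubectl apply -f - <<'EOF'
apiVersion: v1
kind: Pod
metadata:
  name: example-sysbox-pod               # hypothetical name
spec:
  runtimeClassName: sysbox-runc          # runtime class installed by sysbox-deploy-k8s
  automountServiceAccountToken: false
  containers:
  - name: app
    image: alpine                        # hypothetical image
    command: ["sleep", "infinity"]
    volumeMounts:
    - name: sa-token
      mountPath: /secrets/serviceaccount
      readOnly: true
  volumes:
  - name: sa-token
    projected:                           # assumption: a projected SA token volume
      sources:
      - serviceAccountToken:
          path: token
      - configMap:
          name: kube-root-ca.crt
          items:
          - key: ca.crt
            path: ca.crt
EOF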