feat: al2 support #119

Open
wants to merge 4 commits into
base: master
Changes from 2 commits
2 changes: 2 additions & 0 deletions k8s/scripts/crio-extractor.sh
@@ -43,6 +43,8 @@ function install_crictl() {
}

function install_crio() {
+mkdir -p ${ETCDIR}/crio
+mkdir -p ${OCIDIR}
install ${SELINUX} -d -m 755 ${BASHINSTALLDIR}
install ${SELINUX} -d -m 755 ${FISHINSTALLDIR}
install ${SELINUX} -d -m 755 ${ZSHINSTALLDIR}
15 changes: 14 additions & 1 deletion k8s/scripts/kubelet-config-helper.sh
@@ -65,6 +65,11 @@ function start_containerd() {
systemctl start containerd.service
}

+function restart_containerd() {
+echo "Restarting containerd on the host ..."
+systemctl restart containerd.service
+}
+
function stop_containerd() {
echo "Stopping containerd on the host ..."
systemctl stop containerd.service
@@ -476,7 +481,7 @@ function adjust_kubelet_exec_instruction() {
fi
fi

-echo $new_line >>tmp.txt
+echo "$new_line" >>tmp.txt

done <"$systemd_file"

@@ -539,6 +544,7 @@ function config_kubelet() {
if [[ "$kubelet_env_files" == "" ]]; then
kubelet_env_file="/etc/default/kubelet"
touch "$kubelet_env_file"
+sed -i "/^[Service]/a EnvironmentFile=/etc/default/kubelet" "$systemd_file"
else
kubelet_env_file=$(echo "$kubelet_env_files" | awk '{print $NF}')
fi
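
One detail about the sed address in this hunk: in a sed regular expression an unescaped `[Service]` is a bracket expression (any single character out of S, e, r, v, i, c), so `/^[Service]/` does not match the literal section header, which starts with `[`. A minimal sketch of the escaped form follows. The helper name `add_env_file` and the temporary unit file are illustrative assumptions, not part of the PR, and GNU sed is assumed.

```shell
# Sketch only: append an EnvironmentFile= directive right after the literal
# "[Service]" header of a systemd unit file.
# Assumes GNU sed (-i and the one-line form of the 'a' command).
# add_env_file is a hypothetical helper, not a function from the PR.
add_env_file() {
  unit_file=$1
  env_file=$2
  # The brackets are escaped so they match literally instead of forming a
  # bracket expression.
  sed -i "/^\[Service\]/a EnvironmentFile=${env_file}" "$unit_file"
}

# Demo against a throwaway file, never the real kubelet unit.
demo_unit=$(mktemp)
printf '%s\n' '[Unit]' 'Description=demo' '[Service]' 'ExecStart=/usr/bin/kubelet' >"$demo_unit"
add_env_file "$demo_unit" /etc/default/kubelet
cat "$demo_unit"
```

With the unescaped pattern, the same demo file would be left unchanged, because none of its lines begins with one of the six characters in the bracket expression.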
@@ -782,6 +788,12 @@ function adjust_crio_config_dependencies() {
# present.
pause_image=$(echo $pause_image | sed 's/:.*@/@/')

+# Workaround: the default EKS pause images require auth, use the public ecr pause image url instead with same tag
+if [[ "$pause_image" =~ "ecr" ]]; then
+echo "Use default pause_image.."
+pause_image=$(echo $pause_image | sed 's|.*:|public.ecr.aws/eks-distro/kubernetes/pause:|')
+fi
+
# Adjust crio.conf with kubelet's 'pause-image' attribute.
if egrep -q "pause_image =" $crio_conf_file; then
sed -i "s@pause_image =.*@pause_image = \"${pause_image}\"@" $crio_conf_file
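
To make the pause-image workaround in this hunk easier to follow, here is a small sketch of what the sed substitution does to a tagged image reference. The function name `to_public_pause_image` is hypothetical, and the account ID and region in the sample reference are placeholders. The sed expression itself is the one from the hunk.

```shell
# Sketch only: map a private-ECR pause image reference to the public
# eks-distro repository while keeping the tag.
# to_public_pause_image is a hypothetical name; the sed expression is the
# one used in the hunk above.
to_public_pause_image() {
  echo "$1" | sed 's|.*:|public.ecr.aws/eks-distro/kubernetes/pause:|'
}

# The account ID and region below are placeholders.
to_public_pause_image "123456789012.dkr.ecr.us-west-2.amazonaws.com/eks/pause:3.5"
# prints: public.ecr.aws/eks-distro/kubernetes/pause:3.5
```

Because `.*:` is greedy and matches through the last colon, a digest-only reference (one ending in `@sha256:` followed by a digest) would end up with the digest as its tag, so this path may deserve a guard if such references can reach it. Whether the public repository carries the same tag is an assumption of the workaround, not something this sketch checks.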
@@ -1476,6 +1488,7 @@ function do_config_kubelet() {
clean_cgroups_kubepods
config_kubelet "host-based"
adjust_crio_config_dependencies
+restart_containerd
Member

Curious why restarting containerd is needed; it's non-obvious to me because we are not changing any containerd config, and after this script switches kubelet to use CRI-O, containerd will no longer be used.

Author

@szab100 szab100 Jan 23, 2024

Yeah, I'm not quite sure. I know this was needed at some point (maybe AL2-specific), but I'm unsure whether it ended up being needed or not. It shouldn't hurt, though.

Member

@ctalledo ctalledo Jan 30, 2024

I would prefer to leave it out: there's no change to any containerd config in the script, so restarting containerd is confusing, I think. Unless it's actually needed, of course, in which case a comment explaining why would be helpful.

Member

I wonder if it helps on the sysbox uninstall from the cluster, where we revert back from CRI-O -> containerd.

Author

Hey @ctalledo, I made a new build today on top of sysbox v0.6.3, applying the new sysbox-runc patch + this PR + reverting the change that deprecated v1.25 (we are still on that, it is officially supported by AWS/EKS until '24 October), and I can confirm that this line (restarting containerd) seems to be needed. Without it, the installer daemonset finishes its first run and the node just fails (the kubelet is down).

Upon SSH-ing to the failing node, I see the following issue:

[root@ip-10-194-243-117 user]# systemctl status kubelet-config-helper.service
● kubelet-config-helper.service - Kubelet config service
   Loaded: loaded (/usr/lib/systemd/system/kubelet-config-helper.service; static; vendor preset: disabled)
   Active: failed (Result: exit-code) since Thu 2024-02-01 20:53:48 UTC; 14min ago
  Process: 24294 ExecStart=/bin/sh -c /usr/local/bin/kubelet-config-helper.sh (code=exited, status=1/FAILURE)
 Main PID: 24294 (code=exited, status=1/FAILURE)

Feb 01 20:52:18 ip-10-194-243-117.vpc.internal sh[24294]: + systemctl restart crio
Feb 01 20:52:18 ip-10-194-243-117.vpc.internal sh[24294]: + restart_kubelet
Feb 01 20:52:18 ip-10-194-243-117.vpc.internal sh[24294]: + echo 'Restarting Kubelet ...'
Feb 01 20:52:18 ip-10-194-243-117.vpc.internal sh[24294]: Restarting Kubelet ...
Feb 01 20:52:18 ip-10-194-243-117.vpc.internal sh[24294]: + systemctl restart kubelet
Feb 01 20:53:48 ip-10-194-243-117.vpc.internal sh[24294]: A dependency job for kubelet.service failed. See 'journalctl -xe' for details.
Feb 01 20:53:48 ip-10-194-243-117.vpc.internal systemd[1]: kubelet-config-helper.service: main process exited, code=exited, status=1/FAILURE
Feb 01 20:53:48 ip-10-194-243-117.vpc.internal systemd[1]: Failed to start Kubelet config service.
Feb 01 20:53:48 ip-10-194-243-117.vpc.internal systemd[1]: Unit kubelet-config-helper.service entered failed state.
Feb 01 20:53:48 ip-10-194-243-117.vpc.internal systemd[1]: kubelet-config-helper.service failed.
[root@ip-10-194-243-117 user]# systemctl status kubelet.service
● kubelet.service - Kubernetes Kubelet
   Loaded: loaded (/etc/systemd/system/kubelet.service; enabled; vendor preset: disabled)
  Drop-In: /etc/systemd/system/kubelet.service.d
           └─10-kubelet-args.conf, 30-kubelet-extra-args.conf
   Active: inactive (dead) since Thu 2024-02-01 20:52:13 UTC; 20min ago
     Docs: https://github.com/kubernetes/kubernetes
 Main PID: 6995 (code=exited, status=0/SUCCESS)
...
Feb 01 20:52:13 ip-10-194-243-117.vpc.internal kubelet[6995]: I0201 20:52:13.113055    6995 kubelet.go:2132] "SyncLoop (PLEG): event for pod" pod="addon-active-monitor-ns/aws-asg-activities-healthcheck-workflow...3aa88fa506f6a}
Feb 01 20:52:13 ip-10-194-243-117.vpc.internal systemd[1]: Stopping Kubernetes Kubelet...
Feb 01 20:52:13 ip-10-194-243-117.vpc.internal kubelet[6995]: I0201 20:52:13.564307    6995 dynamic_cafile_content.go:171] "Shutting down controller" name="client-ca-bundle::/etc/kubernetes/pki/ca.crt"
Feb 01 20:52:13 ip-10-194-243-117.vpc.internal systemd[1]: Stopped Kubernetes Kubelet.
Feb 01 20:53:48 ip-10-194-243-117.vpc.internal systemd[1]: Dependency failed for Kubernetes Kubelet.
Feb 01 20:53:48 ip-10-194-243-117.vpc.internal systemd[1]: Job kubelet.service/start failed with result 'dependency'.
Hint: Some lines were ellipsized, use -l to show in full.

This is probably because the kubelet systemd unit has containerd as a dependency (After & Requires):

[root@ip-10-194-243-117 user]# cat /etc/systemd/system/kubelet.service
[Unit]
Description=Kubernetes Kubelet
Documentation=https://github.com/kubernetes/kubernetes
After=containerd.service sandbox-image.service
Requires=containerd.service sandbox-image.service
...

So I added a new sed command to replace any potential dependencies on 'containerd' with 'crio' to fix it. But even after this replacement, followed by the systemctl daemon-reload command at the end of the config_kubelet() function, reloading kubelet still fails until the containerd service is stopped. That is why adding the containerd restart helped; I have now replaced it with a call to the existing stop_containerd() function instead.
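
For readers following along, the dependency rewrite described in this comment could look roughly like the sketch below. The PR's actual sed command is not shown in this thread, so the helper name `retarget_unit_deps`, the set of directives it touches, and the temporary unit file are all assumptions. GNU sed is assumed as well.

```shell
# Sketch only: point a unit's ordering/requirement dependencies at CRI-O
# instead of containerd.
# retarget_unit_deps is a hypothetical helper; the PR's real sed command is
# not shown in this thread. Assumes GNU sed (-i, -E).
retarget_unit_deps() {
  unit_file=$1
  # Only touch After=/Requires=/Wants= lines; leave everything else alone.
  sed -i -E '/^(After|Requires|Wants)=/ s/containerd\.service/crio.service/g' "$unit_file"
}

# Demo against a throwaway file that mirrors the [Unit] lines quoted above.
demo_unit=$(mktemp)
printf '%s\n' '[Unit]' \
  'After=containerd.service sandbox-image.service' \
  'Requires=containerd.service sandbox-image.service' >"$demo_unit"
retarget_unit_deps "$demo_unit"
cat "$demo_unit"
# On a real host the edit would be followed by: systemctl daemon-reload
```

Restricting the substitution to dependency directives keeps any other mention of containerd in the unit (for example in a description or a command line) untouched.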

Member

Hi @szab100, thank you very much for the detailed explanation, let's keep it there (and possibly add a short comment that is needed).

> reverting the change that deprecated v1.25 (we are still on that, it is officially supported by AWS/EKS until '24 October)

Oh I had missed that, my mistake; we can bring it back then (it's a pretty simple change in the Makefiles IIRC), but I don't think we can actively test on it (but since it's been tested before, things should continue to work).

Author

> Hi @szab100, thank you very much for the detailed explanation, let's keep it there (and possibly add a short comment that is needed).
>
> > reverting the change that deprecated v1.25 (we are still on that, it is officially supported by AWS/EKS until '24 October)
>
> Oh I had missed that, my mistake; we can bring it back then (it's a pretty simple change in the Makefiles IIRC), but I don't think we can actively test on it (but since it's been tested before, things should continue to work).

It's fine; in the meantime we upgraded to 1.26.6 as well, so there's no need to bring it back.

restart_kubelet
fi
}
8 changes: 5 additions & 3 deletions k8s/scripts/sysbox-deploy-k8s.sh
@@ -263,7 +263,8 @@ function get_artifacts_dir() {
[[ "$distro" == "ubuntu-21.10" ]] ||
[[ "$distro" == "ubuntu-20.04" ]] ||
[[ "$distro" == "ubuntu-18.04" ]] ||
-[[ "$distro" =~ "debian" ]]; then
+[[ "$distro" =~ "debian" ]] ||
+[[ "$distro" == "amzn-2" ]]; then
artifacts_dir="${sysbox_artifacts}/bin/generic"
elif [[ "$distro" =~ "flatcar" ]]; then
local release=$(echo $distro | cut -d"-" -f2)
@@ -325,7 +326,7 @@ function apply_conf() {

# Note: this requires CAP_SYS_ADMIN on the host
echo "Configuring host sysctls ..."
-sysctl -p "${host_sysctl}/99-sysbox-sysctl.conf"
+sysctl -ep "${host_sysctl}/99-sysbox-sysctl.conf"
}

function start_sysbox() {
@@ -671,7 +672,7 @@ function get_container_runtime() {
}

function get_host_distro() {
-local distro_name=$(grep -w "^ID" "$host_os_release" | cut -d "=" -f2)
+local distro_name=$(grep -w "^ID" "$host_os_release" | cut -d "=" -f2| cut -d'"' -f 2)
local version_id=$(grep -w "^VERSION_ID" "$host_os_release" | cut -d "=" -f2 | tr -d '"')
echo "${distro_name}-${version_id}"
}
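
The change to `get_host_distro()` exists because some os-release files quote the `ID` value. The sketch below shows the same parsing against a sample file so the quoted and unquoted cases can be compared. `host_distro` is a hypothetical stand-in that takes the file path as an argument, it strips quotes with `tr -d '"'` for both fields (a stylistic choice here, not what the PR does for `ID`), and the sample file contents are assumptions modeled on an Amazon Linux 2 host.

```shell
# Sketch only: derive "<id>-<version>" from an os-release file whose values
# may or may not be quoted.
# host_distro is a hypothetical stand-in for the script's get_host_distro();
# the file path is passed in so the function can be tried on sample files.
host_distro() {
  os_release=$1
  # grep -w '^ID' matches "ID=..." but not "ID_LIKE=..." or "VERSION_ID=...".
  distro_name=$(grep -w '^ID' "$os_release" | cut -d '=' -f2 | tr -d '"')
  version_id=$(grep -w '^VERSION_ID' "$os_release" | cut -d '=' -f2 | tr -d '"')
  echo "${distro_name}-${version_id}"
}

# Demo with an Amazon Linux 2 style file (quoted ID) in a temp location.
demo_os_release=$(mktemp)
printf '%s\n' 'NAME="Amazon Linux"' 'VERSION_ID="2"' 'ID="amzn"' \
  'ID_LIKE="centos rhel fedora"' >"$demo_os_release"
host_distro "$demo_os_release"
# prints: amzn-2
```

The resulting string is what the `"amzn-2"` comparisons in `get_artifacts_dir()` and `is_supported_distro()` rely on.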
@@ -709,6 +710,7 @@ function is_supported_distro() {
[[ "$distro" == "ubuntu-21.10" ]] ||
[[ "$distro" == "ubuntu-20.04" ]] ||
[[ "$distro" == "ubuntu-18.04" ]] ||
+[[ "$distro" == "amzn-2" ]] ||
[[ "$distro" =~ "debian" ]] ||
[[ "$distro" =~ "flatcar" ]]; then
return
26 changes: 20 additions & 6 deletions k8s/scripts/sysbox-installer-helper.sh
@@ -25,6 +25,8 @@ set -o pipefail
set -o nounset

shiftfs_dkms=/run/shiftfs-dkms
+apt=$(which apt-get || true)
+yum=$(which yum || true)

function die() {
msg="$*"
Expand All @@ -34,10 +36,14 @@ function die() {

function install_package_deps() {

# Certificates package is required prior to running apt-update.
apt-get -y install ca-certificates
apt-get update
apt-get install -y rsync fuse iptables
if [[ ! -z $apt ]]; then
# Certificates package is required prior to running apt-update.
apt-get -y install ca-certificates
apt-get update
apt-get install -y rsync fuse iptables
elif [[ ! -z $yum ]]; then
yum install -y rsync fuse iptables
fi
}

function install_shiftfs() {
Expand All @@ -55,15 +61,23 @@ function install_shiftfs() {

echo "Installing Shiftfs ..."

apt-get install -y make dkms
if [[ ! -z $apt ]]; then
apt-get install -y make dkms
elif [[ ! -z $yum ]]; then
yum install -y make dkms
fi

sh -c "cd $shiftfs_dkms && make -f Makefile.dkms"

if ! shiftfs_installed; then
echo "Shiftfs installation failed!"
return
fi

apt-get remove --purge -y make dkms
if [[ ! -z $apt ]]; then
apt-get remove --purge -y make dkms
fi

echo "Shiftfs installation done."
}
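
The apt/yum branching in this file repeats the same check in several places. As an illustration only, the detection could be centralized in one helper, sketched below. `detect_pkg_manager` is a hypothetical name, and it uses `command -v` rather than the `which` calls from the PR. The demo only prints the command it would run.

```shell
# Sketch only: pick the host package manager once and reuse the answer.
# detect_pkg_manager is a hypothetical helper; the PR itself stores the
# result of `which apt-get` / `which yum` in variables instead.
detect_pkg_manager() {
  if command -v apt-get >/dev/null 2>&1; then
    echo apt
  elif command -v yum >/dev/null 2>&1; then
    echo yum
  else
    echo none
  fi
}

# Demo: print the install command instead of running it.
case "$(detect_pkg_manager)" in
  apt) echo "would run: apt-get install -y rsync fuse iptables" ;;
  yum) echo "would run: yum install -y rsync fuse iptables" ;;
  *)   echo "no supported package manager found" ;;
esac
```

An explicit `none` branch makes the unsupported case visible, whereas the if/elif chains in the hunks above do nothing when neither tool is present. The hunk above also purges `make` and `dkms` only on apt hosts, so a yum host keeps them installed.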
