Non-backing off call to https://webhook-service.projectsveltos.svc:443/convert?timeout=30s #1116

Open
cdunkelb opened this issue Feb 17, 2025 · 1 comment · May be fixed by #1168
@cdunkelb

What happened:
Installed KCM 0.1.0 on k0s (server version v1.32.1+k0s). After the system has settled, something still calls https://webhook-service.projectsveltos.svc:443/convert?timeout=30s every second. This keeps one vCPU at 100% while the system is otherwise idle.

journalctl --since "1m ago" | grep -i webhook-service.projectsveltos.svc | grep -i E0217 | wc -l
59

Feb 17 18:20:54 ip-172-31-14-170 k0s[11008]: time="2025-02-17 18:20:54" level=info msg="W0217 18:20:54.762850   26115 reflector.go:569] storage/cacher.go:/config.projectsveltos.io/profiles: failed to list config.projectsveltos.io/v1alpha1, Kind=Profile: conversion webhook for config.projectsveltos.io/v1beta1, Kind=Profile failed: Post \"https://webhook-service.projectsveltos.svc:443/convert?timeout=30s\": service \"webhook-service\" not found" component=kube-apiserver stream=stderr
Feb 17 18:20:54 ip-172-31-14-170 k0s[11008]: time="2025-02-17 18:20:54" level=info msg="E0217 18:20:54.762872   26115 cacher.go:478] cacher (profiles.config.projectsveltos.io): unexpected ListAndWatch error: failed to list config.projectsveltos.io/v1alpha1, Kind=Profile: conversion webhook for config.projectsveltos.io/v1beta1, Kind=Profile failed: Post \"https://webhook-service.projectsveltos.svc:443/convert?timeout=30s\": service \"webhook-service\" not found; reinitializing..." component=kube-apiserver stream=stderr

What you expected to happen:
Whatever is calling the service should back off exponentially or stop. Alternatively, the Sveltos webhook-service should be deployed. It does not appear to be deployed:

ubuntu@ip-172-31-14-170:~$ kubectl get service -A  | grep -i webhook
kcm-system       azureserviceoperator-webhook-service                          ClusterIP   10.106.162.39    <none>        443/TCP                  4d1h
kcm-system       capa-webhook-service                                          ClusterIP   10.98.132.52     <none>        443/TCP                  4d1h
kcm-system       capi-operator-webhook-service                                 ClusterIP   10.106.0.11      <none>        443/TCP                  4d1h
kcm-system       capi-webhook-service                                          ClusterIP   10.108.254.154   <none>        443/TCP                  4d1h
kcm-system       capo-webhook-service                                          ClusterIP   10.108.199.178   <none>        443/TCP                  4d1h
kcm-system       capv-webhook-service                                          ClusterIP   10.108.130.81    <none>        443/TCP                  4d1h
kcm-system       capz-webhook-service                                          ClusterIP   10.96.211.117    <none>        443/TCP                  4d1h
kcm-system       kcm-cert-manager-webhook                                      ClusterIP   10.109.104.165   <none>        443/TCP,9402/TCP         4d1h
kcm-system       kcm-webhook-service                                           ClusterIP   10.107.89.11     <none>        443/TCP                  4d1h
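
A possible follow-up check, sketched under the assumption that the failing conversion is configured on the CRD named in the log lines above (profiles.config.projectsveltos.io). The function name show_conversion_config is made up for illustration; the fields it reads (spec.conversion and spec.versions) are standard CustomResourceDefinition fields.

```shell
#!/bin/sh
# CRD name taken from the kube-apiserver log lines in this report.
CRD="profiles.config.projectsveltos.io"

# Hypothetical helper: print how the CRD is configured for version conversion.
show_conversion_config() {
    # Conversion strategy: "Webhook" or "None".
    kubectl get crd "$CRD" -o jsonpath='{.spec.conversion.strategy}{"\n"}'
    # Service the API server calls for conversion (name, namespace, path, port).
    kubectl get crd "$CRD" -o jsonpath='{.spec.conversion.webhook.clientConfig.service}{"\n"}'
    # Versions declared on the CRD, and which one is the storage version.
    kubectl get crd "$CRD" -o jsonpath='{range .spec.versions[*]}{.name}{" served="}{.served}{" storage="}{.storage}{"\n"}{end}'
}
```

Calling show_conversion_config on the management node should show whether the CRD still points at webhook-service in the projectsveltos namespace while no such Service exists.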

Where can this issue be corrected? (optional)
Two fixes are needed. First, fix the call so that it does not fail. Second, make the caller back off exponentially when the service is down, so that it does not consume so much CPU.
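
To make the second request concrete, here is a minimal sketch of capped exponential backoff. It is only an illustration of the requested behaviour, not the kube-apiserver's retry code and not a proposed patch; the function names, the 1 second base delay, and the 30 second cap are all assumptions.

```shell
#!/bin/sh
# Print capped exponential backoff delays, one per line.
# Arguments (all illustrative): base delay in seconds, cap in seconds, attempts.
backoff_delays() {
    delay=$1
    cap=$2
    attempts=$3
    i=0
    while [ "$i" -lt "$attempts" ]; do
        echo "$delay"
        # Double the delay after every attempt, but never exceed the cap.
        delay=$((delay * 2))
        if [ "$delay" -gt "$cap" ]; then
            delay=$cap
        fi
        i=$((i + 1))
    done
}

# Run a command up to N times, sleeping for a growing delay after each failure.
# Usage: retry_with_backoff N command [args...]
retry_with_backoff() {
    max=$1
    shift
    for d in $(backoff_delays 1 30 "$max"); do
        "$@" && return 0
        sleep "$d"
    done
    return 1
}
```

With these values, backoff_delays 1 30 6 yields delays of 1, 2, 4, 8, 16 and 30 seconds, so a permanently missing service would be retried about twice a minute at most instead of once per second.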

How to reproduce it (as minimally and precisely as possible):
Happens on a basic install: https://docs.k0rdent.io/latest/quickstart-1-mgmt-node-and-cluster/

Anything else we need to know?:

@cdunkelb cdunkelb added the bug Something isn't working label Feb 17, 2025
@DinaBelova DinaBelova transferred this issue from k0rdent/k0rdent Feb 24, 2025
@github-project-automation github-project-automation bot moved this to Todo in k0rdent Feb 24, 2025
@zerospiel
Contributor

The issue should go away after bumping the version, because v1alpha1 has been dropped. The RC did not set .Values.webhook.conversion, which caused the required webhook Deployment to be left out of the installation.

Also, the chart has been split into several charts, including a dedicated one for the CRDs, so it is highly likely that we no longer need to maintain our Makefile approach of separating the CRDs.

JFYI @BROngineer

@zerospiel zerospiel self-assigned this Mar 3, 2025
@zerospiel zerospiel moved this from Todo to In Progress in k0rdent Mar 3, 2025
@zerospiel zerospiel linked a pull request Mar 5, 2025 that will close this issue