[enhancement] Provide an automated way to renew the token if existing token expires for sveltoscluster mgmt object. #995

jasvinder1107 · 2025-01-30T18:04:37Z

Use-Case: If the management cluster goes down for more than 3 hours because of maintenance activity or a hardware issue, the token for the mgmt SveltosCluster object expires. This leads to continuous errors and a failure to reconcile the management cluster's SveltosCluster object. The documentation states that once the token has expired, the tokenRequestRenewalOption setting is useless:
https://projectsveltos.github.io/sveltos/register/token-renewal/

If, for any reason, token rotation cannot happen before the current token expires, the sveltoscluster-manager can no longer update the token. Consequently, reconciliations for that cluster stop, and you must manually update the Secret for that cluster to restore functionality.
2025-01-30T17:22:02.617497+00:00 bigbender k0s[1216]: time="2025-01-30 17:22:02" level=info msg="E0130 17:22:02.617216    1290 authentication.go:73] \"Unable to authenticate the request\" err=\"[invalid bearer token, service account token has expired]\"" component=kube-apiserver stream=stderr
2025-01-30T17:22:02.645808+00:00 bigbender k0s[1216]: time="2025-01-30 17:22:02" level=info msg="E0130 17:22:02.645588    1290 authentication.go:73] \"Unable to authenticate the request\" err=\"[invalid bearer token, service account token has expired]\"" component=kube-apiserver stream=stderr
2025-01-30T17:22:02.684750+00:00 bigbender k0s[1216]: time="2025-01-30 17:22:02" level=info msg="E0130 17:22:02.684503    1290 authentication.go:73] \"Unable to authenticate the request\" err=\"[invalid bearer token, service account token has expired]\"" component=kube-apiserver stream=stderr
2025-01-30T17:22:02.741968+00:00 bigbender k0s[1216]: time="2025-01-30 17:22:02" level=info msg="E0130 17:22:02.741191    1290 authentication.go:73] \"Unable to authenticate the request\" err=\"[invalid bearer token, service account token has expired]\"" component=kube-apiserver stream=stderr
2025-01-30T17:22:02.781249+00:00 bigbender k0s[1216]: time="2025-01-30 17:22:02" level=info msg="E0130 17:22:02.780966    1290 authentication.go:73] \"Unable to authenticate the request\" err=\"[invalid bearer token, service account token has expired]\"" component=kube-apiserver stream=stderr
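
The `service account token has expired` errors above can be cross-checked against the token itself. Service account tokens are JWTs, so the payload (including the `exp` claim, a Unix timestamp) can be decoded locally. This is a minimal sketch, assuming GNU coreutils. `jwt_payload` is a hypothetical helper name, and the sample token is a placeholder built only for the demonstration, not a real token from this cluster.

```shell
# Hypothetical helper: print the decoded payload (second segment) of a JWT.
# JWT segments are base64url-encoded without padding, so the alphabet is
# mapped back to standard base64 and the padding restored before decoding.
jwt_payload() {
  p=$(printf '%s' "$1" | cut -d. -f2 | tr '_-' '/+')
  case $(( ${#p} % 4 )) in
    2) p="$p==" ;;
    3) p="$p=" ;;
  esac
  printf '%s' "$p" | base64 -d
}

# Placeholder token: only the payload segment is meaningful here.
sample_payload='{"exp":1738257722}'
sample_token="header.$(printf '%s' "$sample_payload" | base64 -w0 | tr '/+' '_-' | tr -d '=').signature"

jwt_payload "$sample_token"
```

With a real token, the `exp` value can then be compared against the current time, for example with `date -u -d @<exp>`.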

The job responsible for this cluster object has a hard-coded timeout of 3 hours for the first token and 1 hour for each subsequent renewal:
https://github.com/projectsveltos/register-mgmt-cluster/blob/8c1dacccf04c932566bce543e4583a612bd5962e/cmd/main.go#L439C1-L439C65
https://github.com/projectsveltos/register-mgmt-cluster/blob/8c1dacccf04c932566bce543e4583a612bd5962e/cmd/main.go#L120

The job also sets ttlSecondsAfterFinished: 240 (https://github.com/projectsveltos/helm-charts/blob/b2652715ee7b3e373705941f45e9294007a2f2f7/charts/projectsveltos/templates/register-mgmt-cluster-job.yaml#L36), which means the job is gone 4 minutes after it finishes. This leaves no room for the customer to fix the issue.

root@bigbender:~/projectsveltos# kubectl get sveltoscluster -n mgmt  -o yaml
apiVersion: v1
items:
- apiVersion: lib.projectsveltos.io/v1beta1
  kind: SveltosCluster
  metadata:
    creationTimestamp: "2025-01-23T20:33:09Z"
    generation: 14
    labels:
      projectsveltos.io/k8s-version: v1.31.5
      sveltos-agent: present
    name: mgmt
    namespace: mgmt
    resourceVersion: "1114505"
    uid: e1fd1a1b-f7e8-4f86-8e6c-bb938d282c4c
  spec:
    consecutiveFailureThreshold: 3
    kubeconfigKeyName: re-kubeconfig
    tokenRequestRenewalOption:
      renewTokenRequestInterval: 1h0m0s
  status:
    connectionFailures: 48804
    connectionStatus: Down
    failureMessage: 'failed to get API group resources: unable to retrieve the complete
      list of server APIs: v1: Unauthorized'
    lastReconciledTokenRequestAt: "2025-01-29T21:25:46Z"
    ready: true
    version: v1.31.5+k0s
kind: List
metadata:
  resourceVersion: ""
root@bigbender:~/projectsveltos#

You can see the failureMessage there is "failed to get API group resources: unable to retrieve the complete list of server APIs: v1: Unauthorized", and connectionStatus is Down.
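
The status above also makes it possible to check whether the last issued token could still be valid. This is a minimal sketch, assuming the renewed token's lifetime is the 1 hour stated above and that GNU `date` is available. `token_expired` is a hypothetical helper name. The timestamps are taken from the `lastReconciledTokenRequestAt` field and from the kube-apiserver log lines in this issue.

```shell
# Hypothetical helper: succeed if a token issued at $1 (RFC 3339, UTC) with a
# lifetime of $2 seconds has expired as of $3 (defaults to the current time).
# Returns 2 if a timestamp cannot be parsed. Requires GNU date for -d.
token_expired() {
  issued_epoch=$(date -u -d "$1" +%s) || return 2
  now_epoch=$(date -u -d "${3:-now}" +%s) || return 2
  [ "$now_epoch" -ge $((issued_epoch + $2)) ]
}

# Last token request per the status, checked at the time of the logged errors.
if token_expired "2025-01-29T21:25:46Z" 3600 "2025-01-30T17:22:02Z"; then
  echo "token expired"
else
  echo "token still valid"
fi
```

Under these assumptions the token had expired well before the errors were logged, which is consistent with the Unauthorized failure message.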

Possible solutions:

1. Handle this with a CronJob: build a bash script and run it as a separate job in https://github.com/k0rdent/kcm/tree/main/templates/provider/projectsveltos
2. Change the upstream Helm chart and maintain it under k0rdent. This doesn't sound like a good idea.
3. Have a separate pod and Go code to handle this. I am open to discussion on this.

Workaround currently is:

Get the current kubeconfig from secret.

 kubectl get secret -n mgmt mgmt-sveltos-kubeconfig -o json |jq '.data."re-kubeconfig"' |tr -d "\"" |base64 -d  > file1;

Run sveltosctl to generate a kubeconfig. Copy the generated token and use it to replace the token in file1. The reason for taking only the token instead of saving the whole file is that the API server address generated by this command is localhost:6443, whereas the kubeconfig generated for the cluster object uses the service IP. I just wanted to update the minimum required.

 root@bigbender:~# sveltosctl  generate kubeconfig --create --expirationSeconds=86400 |grep -i token

Run the patch command below to update the token in the Secret:

kubectl patch secret -n mgmt mgmt-sveltos-kubeconfig --patch="{\"data\": { \"re-kubeconfig\": \"$(base64 -w0 ./file1)\" }}"

After this, things start working again and no more token-expired messages fill up the system logs:

 
root@bigbender:~# kubectl patch secret -n mgmt mgmt-sveltos-kubeconfig --patch="{\"data\": { \"re-kubeconfig\": \"$(base64 -w0 ./file1)\" }}"
secret/mgmt-sveltos-kubeconfig patched
root@bigbender:~#
root@bigbender:~#
root@bigbender:~# kubectl get sveltoscluster -A
NAMESPACE   NAME   READY   VERSION
mgmt        mgmt   true    v1.31.5+k0s
root@bigbender:~# kubectl get sveltoscluster -n mgmt -o yaml
apiVersion: v1
items:
- apiVersion: lib.projectsveltos.io/v1beta1
  kind: SveltosCluster
  metadata:
    creationTimestamp: "2025-01-23T20:33:09Z"
    generation: 14
    labels:
      projectsveltos.io/k8s-version: v1.31.5
      sveltos-agent: present
    name: mgmt
    namespace: mgmt
    resourceVersion: "1131771"
    uid: e1fd1a1b-f7e8-4f86-8e6c-bb938d282c4c
  spec:
    consecutiveFailureThreshold: 3
    kubeconfigKeyName: re-kubeconfig
    tokenRequestRenewalOption:
      renewTokenRequestInterval: 1h0m0s
  status:
    connectionStatus: Healthy
    lastReconciledTokenRequestAt: "2025-01-30T17:46:16Z"
    ready: true
    version: v1.31.5+k0s
kind: List
metadata:
  resourceVersion: ""
root@bigbender:~#
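
The manual step in the workaround (copying the new token into file1 while keeping the existing server address) could be scripted. This is a minimal sketch, assuming GNU sed and a kubeconfig that stores the token on a single `token: <value>` line. `replace_token` is a hypothetical helper name, and the demo file contains placeholder values only.

```shell
# Hypothetical helper: swap the bearer token inside kubeconfig file $1 for
# the token passed as $2, leaving the server address and other fields as-is.
# Assumes a single "token: <value>" line and a token without "|", "&" or "\"
# characters (true for JWTs). Uses GNU sed (-i, -E).
replace_token() {
  sed -i -E "s|^([[:space:]]*token:[[:space:]]*).*|\1$2|" "$1"
}

# Demonstration on a throwaway file with placeholder values.
demo=$(mktemp)
cat > "$demo" <<'EOF'
clusters:
- cluster:
    server: https://10.96.0.1:443
users:
- name: example
  user:
    token: old.expired.token
EOF

replace_token "$demo" "new.fresh.token"
grep -E 'server:|token:' "$demo"
```

In the workaround, the first argument would be file1 and the second the token taken from the sveltosctl output. The patched file can then be passed to the `kubectl patch secret` command shown earlier.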


jasvinder1107 added the enhancement Small feature, request or improvement suggestion label Jan 30, 2025

github-project-automation bot added this to Project 2A Jan 30, 2025

github-project-automation bot moved this to Todo in Project 2A Jan 30, 2025

jasvinder1107 changed the title ~~[enhancement] Provide a automated way to renew the token if existing token expires fore sveltoscluster mgmt object.~~ [enhancement] Provide a automated way to renew the token if existing token expires for sveltoscluster mgmt object. Jan 30, 2025
