[CI/CD] cloudbank helm upgrades result in lots of `context deadline exceeded` #5436

GeorgianaElena · 2025-01-28T10:44:35Z

Issue

I believe that because many `helm upgrade` commands are started at once when deploying the cloudbank hubs, a lot of them fail with:

164     	Tue Jan 28 04:16:05 2025	failed    	basehub-0.1.0	1.0        	Upgrade "santiago" failed: client rate limiter Wait returned an error: context deadline exceeded
165     	Tue Jan 28 04:29:28 2025	deployed  	basehub-0.1.0	1.0        	Upgrade complete

Most concerning is that, in this particular case, helm was even fooled into believing that the deployment was successful (the next revision is marked `deployed`), when in fact there was no hub pod running whatsoever.
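
If the hypothesis above holds (too many upgrades started at the same time), one possible direction is to cap how many `helm upgrade` processes run concurrently. The following is only a minimal sketch, not the deployer's actual code: the function names, the `max_parallel` default, the `10m` timeout, and the assumption that each release lives in a namespace of the same name are all illustrative.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor


def helm_upgrade(release, chart):
    """Run one `helm upgrade` and return its exit code.

    Assumption: the release is installed in a namespace with the same
    name, as with `santiago` in this issue.
    """
    cmd = [
        "helm", "upgrade", "--install", release, chart,
        "--namespace", release,
        "--wait", "--timeout", "10m",
    ]
    return subprocess.run(cmd).returncode


def upgrade_all(releases, chart, max_parallel=2, run=helm_upgrade):
    """Upgrade every release, with at most `max_parallel` running at once.

    `run` is injectable so the scheduling can be exercised without helm.
    Returns a dict mapping release name to exit code.
    """
    releases = list(releases)
    with ThreadPoolExecutor(max_workers=max_parallel) as pool:
        codes = list(pool.map(lambda release: run(release, chart), releases))
    return dict(zip(releases, codes))
```

Any release with a non-zero exit code in the returned mapping could then be retried or flagged, rather than relying on the next revision showing up as `deployed`.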

Context

Some relevant logs are below:

(base) ➜  pilot-hubs git:(main) ✗ kubectl get deployment -n santiago
NAME                     READY   UP-TO-DATE   AVAILABLE   AGE
proxy                    1/1     1            1           586d
shared-dirsize-metrics   1/1     1            1           497d
shared-volume-metrics    1/1     1            1           586d

(base) ➜  pilot-hubs git:(main) ✗ kubectl get pods -n santiago                                                                             
NAME                                      READY   STATUS    RESTARTS   AGE
proxy-55b66b8dff-9kzn7                    1/1     Running   0          15d
shared-dirsize-metrics-54dbc5d848-nxflj   1/1     Running   0          115d
shared-volume-metrics-fcc697459-w696q     1/1     Running   0          264d

(base) ➜  pilot-hubs git:(main) ✗ k  get events -n santiago                                        
No resources found in santiago namespace.

(base) ➜  pilot-hubs git:(main) ✗ helm get manifest santiago --namespace santiago | grep "kind: Deployment" -A5

kind: Deployment
metadata:
  name: hub
  labels:
    component: hub
    app.kubernetes.io/component: hub
--
kind: Deployment
metadata:
  name: proxy
  labels:
    component: proxy
    app.kubernetes.io/component: proxy
--
kind: Deployment
metadata:
  name: shared-dirsize-metrics
  labels:
    app: jupyterhub
    component: shared-dirsize-metrics
--
kind: Deployment
metadata:
  name: shared-volume-metrics
  labels:
    app: jupyterhub
    component: shared-volume-metrics

Mitigation

Manually redeploying the hub brought it back up.
