[CI/CD] cloudbank helm upgrades result in lots of context deadline exceeded #5436

Open
GeorgianaElena opened this issue Jan 28, 2025 · 0 comments

@GeorgianaElena
Member

Issue

I believe that because multiple helm upgrade commands are started all at once when deploying the cloudbank hubs, many of them fail with:

164     	Tue Jan 28 04:16:05 2025	failed    	basehub-0.1.0	1.0        	Upgrade "santiago" failed: client rate limiter Wait returned an error: context deadline exceeded
165     	Tue Jan 28 04:29:28 2025	deployed  	basehub-0.1.0	1.0        	Upgrade complete        
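If the root cause is indeed too many simultaneous helm upgrade processes, one possible mitigation is to cap how many run at once. Below is a minimal Python sketch of that idea, not the deployer's actual code. The chart path ./basehub, the cap of 3 parallel upgrades and the 10m timeout are hypothetical values, and it assumes each hub's release name equals its namespace (as with santiago above).

```python
"""Sketch: run helm upgrades for many hubs with bounded concurrency.

Hypothetical values: CHART_PATH, MAX_PARALLEL and the 10m timeout.
"""
import subprocess
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List, Sequence

MAX_PARALLEL = 3          # hypothetical cap on simultaneous `helm upgrade` processes
CHART_PATH = "./basehub"  # hypothetical local chart path


def helm_upgrade_cmd(hub: str) -> List[str]:
    """Build the helm command for one hub (release name == namespace, by assumption)."""
    return [
        "helm", "upgrade", "--install", hub, CHART_PATH,
        "--namespace", hub,
        "--wait",            # block until the release's resources report ready
        "--timeout", "10m",
    ]


def upgrade_all(
    hubs: Sequence[str],
    run: Callable[[List[str]], int],
    max_parallel: int = MAX_PARALLEL,
) -> Dict[str, int]:
    """Upgrade every hub, with at most max_parallel upgrades in flight.

    Returns a mapping of hub name to the exit code reported by `run`.
    """
    with ThreadPoolExecutor(max_workers=max_parallel) as pool:
        codes = pool.map(lambda hub: run(helm_upgrade_cmd(hub)), hubs)
        return dict(zip(hubs, codes))


def run_subprocess(cmd: List[str]) -> int:
    """Real runner: execute the command and return its exit code."""
    return subprocess.run(cmd).returncode
```

Calling upgrade_all(hubs, run_subprocess) would run the real commands; any hub with a non-zero exit code could then be retried on its own instead of in another burst of parallel upgrades.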

Most concerning is that, in this particular case, helm was even fooled into believing the deployment was successful (revision 165 reports "Upgrade complete"), when in fact there was no hub pod running whatsoever.
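Since a "deployed" status did not guarantee a working hub here, one cheap heuristic is to flag releases whose latest revision is "deployed" right after a "failed" one, as with revisions 164 and 165 above, so they get checked by hand. A minimal Python sketch, assuming the tab-separated helm history table layout shown above (helm history --output json would be a sturdier input); the function names are hypothetical.

```python
from typing import List, NamedTuple


class Revision(NamedTuple):
    number: int
    status: str
    description: str


def parse_history(text: str) -> List[Revision]:
    """Parse `helm history` table output, assuming tab-separated columns:
    REVISION, UPDATED, STATUS, CHART, APP VERSION, DESCRIPTION."""
    revisions = []
    for line in text.splitlines():
        cols = [c.strip() for c in line.split("\t")]
        if len(cols) < 6 or not cols[0].isdigit():
            continue  # skip the header row and blank or malformed lines
        revisions.append(Revision(int(cols[0]), cols[2], cols[5]))
    return revisions


def deployed_after_failure(revisions: List[Revision]) -> bool:
    """True if the newest revision is 'deployed' and the one before it 'failed'."""
    ordered = sorted(revisions)  # NamedTuples sort by revision number first
    return (
        len(ordered) >= 2
        and ordered[-1].status == "deployed"
        and ordered[-2].status == "failed"
    )
```

This only says a release deserves a closer look; it cannot tell whether the hub pod actually exists.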

Context

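Some relevant logs are below. Note that the release manifest includes a hub Deployment, but no hub Deployment or pod exists in the santiago namespace (and there are no events either):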

(base) ➜  pilot-hubs git:(main) ✗ kubectl get deployment -n santiago
NAME                     READY   UP-TO-DATE   AVAILABLE   AGE
proxy                    1/1     1            1           586d
shared-dirsize-metrics   1/1     1            1           497d
shared-volume-metrics    1/1     1            1           586d
(base) ➜  pilot-hubs git:(main) ✗ kubectl get pods -n santiago                                                                             
NAME                                      READY   STATUS    RESTARTS   AGE
proxy-55b66b8dff-9kzn7                    1/1     Running   0          15d
shared-dirsize-metrics-54dbc5d848-nxflj   1/1     Running   0          115d
shared-volume-metrics-fcc697459-w696q     1/1     Running   0          264d
(base) ➜  pilot-hubs git:(main) ✗ k  get events -n santiago                                        
No resources found in santiago namespace.
(base) ➜  pilot-hubs git:(main) ✗ helm get manifest santiago --namespace santiago | grep "kind: Deployment" -A5

kind: Deployment
metadata:
  name: hub
  labels:
    component: hub
    app.kubernetes.io/component: hub
--
kind: Deployment
metadata:
  name: proxy
  labels:
    component: proxy
    app.kubernetes.io/component: proxy
--
kind: Deployment
metadata:
  name: shared-dirsize-metrics
  labels:
    app: jupyterhub
    component: shared-dirsize-metrics
--
kind: Deployment
metadata:
  name: shared-volume-metrics
  labels:
    app: jupyterhub
    component: shared-volume-metrics
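The mismatch above (a hub Deployment in the release manifest but not in the cluster) could be caught automatically after each upgrade. A minimal Python sketch, assuming helm and kubectl are on the PATH; the manifest parsing is deliberately naive (no YAML library) and only handles the layout shown above, with a top-level metadata: key followed by a two-space-indented name: line. The function names are hypothetical.

```python
import subprocess
from typing import List, Set


def _deployment_name(doc: List[str]) -> Set[str]:
    """Return {name} if this YAML document is a Deployment, else an empty set."""
    if not any(line.rstrip() == "kind: Deployment" for line in doc):
        return set()
    in_metadata = False
    for line in doc:
        if line.rstrip() == "metadata:":
            in_metadata = True
        elif in_metadata and line.startswith("  name: "):
            return {line[len("  name: "):].strip()}
        elif in_metadata and line and not line.startswith(" "):
            in_metadata = False  # left the metadata block without finding a name
    return set()


def deployments_in_manifest(manifest: str) -> Set[str]:
    """Names of Deployment objects in `helm get manifest` output."""
    names: Set[str] = set()
    doc: List[str] = []
    for line in manifest.splitlines() + ["---"]:  # sentinel flushes the last document
        if line.strip() == "---":
            names |= _deployment_name(doc)
            doc = []
        else:
            doc.append(line)
    return names


def live_deployments(kubectl_output: str) -> Set[str]:
    """Parse `kubectl get deployment -o name` output, e.g. 'deployment.apps/proxy'."""
    return {tok.split("/", 1)[1] for tok in kubectl_output.split() if "/" in tok}


def missing_from_cluster(manifest: str, kubectl_output: str) -> Set[str]:
    """Deployments the release declares but the cluster does not have."""
    return deployments_in_manifest(manifest) - live_deployments(kubectl_output)


def check_release(release: str, namespace: str) -> Set[str]:
    """Run helm and kubectl for one release and report missing Deployments."""
    manifest = subprocess.run(
        ["helm", "get", "manifest", release, "--namespace", namespace],
        capture_output=True, text=True, check=True,
    ).stdout
    live = subprocess.run(
        ["kubectl", "get", "deployment", "-n", namespace, "-o", "name"],
        capture_output=True, text=True, check=True,
    ).stdout
    return missing_from_cluster(manifest, live)
```

In the state captured above, check_release("santiago", "santiago") would be expected to report the hub Deployment as missing, which is exactly what helm's "Upgrade complete" did not surface.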

Mitigation

Manually redeploying the hub brought it back up.
