
bump dask gateway image version #503

Merged (1 commit) on Jan 23, 2020


TomAugspurger (Member)

On staging.hub.pangeo.io, I think we have a mismatch between the dask-gateway versions: the server is (I think) at 0.6.1, but the ClusterManager is pulling an older version. This may be what's causing the errors I'm seeing on create_cluster():

dask-gateway-scheduler: error: unrecognized arguments: --adaptive-period 3.0 --idle-timeout 0.0
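That message has the shape of Python's `argparse` rejecting options it doesn't define, which fits an older scheduler CLI receiving flags from a newer server. A minimal sketch, assuming the scheduler entry point is argparse-based (suggested by the message format, not confirmed in this thread); both parsers are hypothetical stand-ins, not dask-gateway's actual code:

```python
import argparse


def old_scheduler_parser() -> argparse.ArgumentParser:
    # Hypothetical stand-in for an older scheduler CLI that predates these flags.
    return argparse.ArgumentParser(prog="dask-gateway-scheduler")


def new_scheduler_parser() -> argparse.ArgumentParser:
    # Hypothetical stand-in for a newer CLI that accepts the flags the server sends.
    parser = argparse.ArgumentParser(prog="dask-gateway-scheduler")
    parser.add_argument("--adaptive-period", type=float)
    parser.add_argument("--idle-timeout", type=float)
    return parser


# The arguments the 0.6.1 server passed to the scheduler container (from the error above).
SERVER_ARGS = ["--adaptive-period", "3.0", "--idle-timeout", "0.0"]

if __name__ == "__main__":
    # Newer CLI: the flags parse cleanly.
    print(new_scheduler_parser().parse_args(SERVER_ARGS))
    # Older CLI: parse_known_args hands back everything it didn't recognize...
    _, unknown = old_scheduler_parser().parse_known_args(SERVER_ARGS)
    print("unrecognized:", unknown)
    # ...and parse_args reports "error: unrecognized arguments" and exits with status 2.
    try:
        old_scheduler_parser().parse_args(SERVER_ARGS)
    except SystemExit as exc:
        print("exit status:", exc.code)
```

The mismatch is therefore fixed by aligning versions, not by changing the flags: the server decides which arguments to send, and only a scheduler image new enough to define them will accept them.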

We'll need a better strategy for pinning the version of dask-gateway in the worker / scheduler images via pangeo-stacks to the version used on the server (from helm I think). Not sure what the best strategy is yet.

jhamman (Member) commented Jan 23, 2020

Thanks @TomAugspurger. Feel free to merge these incremental changes to staging yourself (do you have merge rights?).

> We'll need a better strategy for pinning the version of dask-gateway in the worker / scheduler images via pangeo-stacks to the version used on the server (from helm I think). Not sure what the best strategy is yet.

Absolutely. We really need to automate this. There are many places in our stack that have similar problems (e.g. pangeo-data/pangeo-binder#70).
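One way to automate this is a small CI step that compares the dask-gateway pin in each image's environment against the server's version. A sketch under stated assumptions: the component names and the `0.5.0` pin are hypothetical, pins are assumed to be exact (`dask-gateway=X` conda style or `dask-gateway==X` pip style), and anything not exactly pinned is treated as a mismatch:

```python
import re
from typing import Dict, List, Optional

# Matches an exact pin such as "dask-gateway=0.6.1" (conda) or "dask-gateway==0.6.1" (pip).
_EXACT_PIN = re.compile(r"^\s*dask-gateway\s*={1,2}\s*([0-9][0-9A-Za-z.+-]*)\s*$")


def pinned_version(requirement: str) -> Optional[str]:
    """Return the pinned version, or None if the requirement isn't an exact pin."""
    match = _EXACT_PIN.match(requirement)
    return match.group(1) if match else None


def find_mismatches(server_version: str, requirements: Dict[str, str]) -> List[str]:
    """Names of components whose dask-gateway pin differs from the server's version."""
    return sorted(
        name
        for name, requirement in requirements.items()
        if pinned_version(requirement) != server_version
    )


if __name__ == "__main__":
    # Hypothetical inputs: only the server version (0.6.1) comes from this thread.
    requirements = {
        "notebook-image": "dask-gateway=0.6.1",
        "scheduler-image": "dask-gateway==0.5.0",
        "worker-image": "dask-gateway",  # unpinned, so it gets flagged
    }
    print(find_mismatches("0.6.1", requirements))
```

Treating unpinned entries as failures is a deliberate choice: an unpinned image silently drifts to whatever version the solver picks at build time, which is exactly the kind of mismatch seen here.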

TomAugspurger (Member, Author)

@jhamman I do not have write privileges. Would you mind granting them?

jhamman (Member) commented Jan 23, 2020

> @jhamman I do not have write privileges. Would you mind granting them?

Done.

TomAugspurger merged commit e9a1742 into pangeo-data:staging on Jan 23, 2020
TomAugspurger (Member, Author)

Thanks!

TomAugspurger deleted the bump-dask-image branch on January 23, 2020 at 19:47
TomAugspurger (Member, Author)

Hmm, it seems that helm didn't redeploy dask-gateway.

$ kubectl -n dev-staging get pod | grep gateway
gateway-dev-staging-dask-gateway-7dcc5dbfff-4g886           1/1     Running   0          23h
scheduler-proxy-dev-staging-dask-gateway-5c75df8745-72f6d   1/1     Running   0          23h
web-proxy-dev-staging-dask-gateway-6f6dfdbb5f-t68ch         1/1     Running   0          23h

I've manually deleted the dask-gateway pods (kubectl -n dev-staging delete pod -l 'app.kubernetes.io/name=dask-gateway') and Kubernetes recreated them.
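The `-l` flag takes a label selector, so the delete hits every pod labelled `app.kubernetes.io/name=dask-gateway`; the pod names carry Deployment-style hash suffixes, so presumably their ReplicaSets are what recreated them. A sketch of how equality-based selectors match: comma-separated terms are ANDed, and `!=` also matches objects that lack the key. It covers only `=`, `==` and `!=` (not set-based selectors), and the labels on the example pods are hypothetical:

```python
from typing import Dict, List, Tuple


def parse_selector(selector: str) -> List[Tuple[str, str, str]]:
    """Parse 'k1=v1,k2!=v2' into (key, op, value) triples; '==' is treated as '='."""
    requirements = []
    for term in selector.split(","):
        term = term.strip()
        # Check '!=' and '==' before the bare '=' that both of them contain.
        for token, op in (("!=", "!="), ("==", "="), ("=", "=")):
            if token in term:
                key, value = term.split(token, 1)
                requirements.append((key.strip(), op, value.strip()))
                break
        else:
            raise ValueError(f"unsupported selector term: {term!r}")
    return requirements


def matches(labels: Dict[str, str], selector: str) -> bool:
    """True if the labels satisfy every term of the selector (terms are ANDed)."""
    for key, op, value in parse_selector(selector):
        if op == "=" and labels.get(key) != value:
            return False
        # '!=' also matches objects that don't carry the key at all.
        if op == "!=" and labels.get(key) == value:
            return False
    return True


if __name__ == "__main__":
    # Two pod names from the output above; all labels here are hypothetical.
    pods = {
        "gateway-dev-staging-dask-gateway-7dcc5dbfff-4g886": {"app.kubernetes.io/name": "dask-gateway"},
        "web-proxy-dev-staging-dask-gateway-6f6dfdbb5f-t68ch": {"app.kubernetes.io/name": "dask-gateway"},
        "hypothetical-other-pod": {"app.kubernetes.io/name": "jupyterhub"},
    }
    selector = "app.kubernetes.io/name=dask-gateway"
    print([name for name, labels in pods.items() if matches(labels, selector)])
```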

TomAugspurger (Member, Author) commented Jan 23, 2020

Darn, we're still getting the old image in the scheduler pod created by Gateway.new_cluster().

spec:
  automountServiceAccountToken: false
  containers:
  - args:
    - dask-gateway-scheduler
    - --adaptive-period
    - "3.0"
    - --idle-timeout
    - "0.0"
    env:
    - name: DASK_GATEWAY_API_URL
      ...
    image: pangeo/base-notebook:2019.09.23

Possibly an old config? Still debugging.
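A quick way to confirm which image a freshly created scheduler pod actually got is to inspect its spec. A sketch, assuming the pod's `spec` field has already been loaded into a dict (for example from `kubectl get pod <name> -o json`); the expected tag is a hypothetical placeholder since the new tag isn't given in this thread, and images pinned by digest (`@sha256:...`) aren't handled:

```python
from typing import Any, Dict, List, Tuple


def split_image(image: str) -> Tuple[str, str]:
    """Split 'repo/name:tag' into (repository, tag); a missing tag means 'latest'."""
    # Only a colon after the last '/' starts a tag; an earlier one can be a registry port.
    colon = image.rfind(":")
    if colon > image.rfind("/"):
        return image[:colon], image[colon + 1:]
    return image, "latest"


def stale_containers(pod_spec: Dict[str, Any], expected_tag: str) -> List[str]:
    """Names of containers in the pod spec whose image tag isn't `expected_tag`."""
    return [
        container.get("name", "<unnamed>")
        for container in pod_spec.get("containers", [])
        if split_image(container["image"])[1] != expected_tag
    ]


if __name__ == "__main__":
    # Trimmed copy of the scheduler pod spec above (its container name was elided there).
    spec = {
        "automountServiceAccountToken": False,
        "containers": [
            {
                "args": ["dask-gateway-scheduler", "--adaptive-period", "3.0", "--idle-timeout", "0.0"],
                "image": "pangeo/base-notebook:2019.09.23",
            }
        ],
    }
    # "NEW_TAG" is a hypothetical placeholder for the tag the bump was meant to deploy.
    print(stale_containers(spec, "NEW_TAG"))
```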

TomAugspurger (Member, Author)

> Hmm, it seems that helm didn't redeploy dask-gateway.

For posterity, this may not have been Helm's or dask-gateway's fault: we weren't nesting the dask-gateway configuration correctly (see #505).
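Helm scopes values per chart: a subchart only receives the block its parent nests under the subchart's name (plus `global`), so configuration placed at the wrong level is silently ignored rather than rejected. A simplified sketch of that scoping, assuming dask-gateway is consumed as a subchart named `dask-gateway`; the inner keys are hypothetical, and merging with the subchart's own defaults is left out:

```python
from typing import Any, Dict


def values_seen_by_subchart(parent_values: Dict[str, Any], subchart: str) -> Dict[str, Any]:
    """Simplified view of the values a Helm parent chart passes to one subchart.

    Only the block nested under the subchart's name is handed down, plus the
    shared `global` block; merging with the subchart's own defaults is omitted.
    """
    scoped = dict(parent_values.get(subchart, {}))
    if "global" in parent_values:
        scoped["global"] = parent_values["global"]
    return scoped


if __name__ == "__main__":
    # Hypothetical keys placed at the top level: the subchart never sees them.
    misnested = {"gateway": {"image": {"tag": "NEW_TAG"}}}
    # The same block nested under the subchart's name.
    nested = {"dask-gateway": {"gateway": {"image": {"tag": "NEW_TAG"}}}}
    print(values_seen_by_subchart(misnested, "dask-gateway"))  # -> {}
    print(values_seen_by_subchart(nested, "dask-gateway"))
```

With an intermediate umbrella chart, the same rule applies once per level, so the block has to sit under the umbrella chart's name first and then under `dask-gateway`.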
