bump dask gateway image version #503

TomAugspurger · 2020-01-23T18:46:11Z

On staging.hub.pangeo.io, I think we have a mismatch between the dask-gateway versions. I think the server is at 0.6.1, but the ClusterManager is pulling an older version. This may be causing the errors I'm seeing

dask-gateway-scheduler: error: unrecognized arguments: --adaptive-period 3.0 --idle-timeout 0.0

on create_cluster().

We'll need a better strategy for pinning the version of dask-gateway in the worker / scheduler images via pangeo-stacks to the version used on the server (from helm I think). Not sure what the best strategy is yet.
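One possible starting point, as a minimal sketch rather than a settled strategy: a check that could run inside the worker / scheduler image and report when the installed dask-gateway differs from the version the server expects. The function names `installed_version` and `check_pinned_version` are hypothetical, and how the expected server version would be obtained (for example from the helm chart) is left open here.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package="dask-gateway"):
    """Return the installed version of `package`, or None if it is not installed."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


def check_pinned_version(expected, installed):
    """Return an error message if `installed` differs from `expected`, else None.

    `expected` is assumed to be the server's dask-gateway version, supplied
    by whatever pinning mechanism is eventually chosen.
    """
    if installed is None:
        return f"dask-gateway is not installed (expected {expected})"
    if installed != expected:
        return (
            f"dask-gateway version mismatch: image has {installed}, "
            f"server expects {expected}"
        )
    return None


if __name__ == "__main__":
    # "0.6.1" is the server version mentioned above.
    problem = check_pinned_version("0.6.1", installed_version())
    print(problem or "dask-gateway versions match")
```

Running something like this in CI for the images would surface a mismatch before it shows up as an "unrecognized arguments" error at cluster creation.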

jhamman · 2020-01-23T19:10:13Z

Thanks @TomAugspurger. Feel free to merge these incremental changes to staging yourself (do you have merge rights?).

> We'll need a better strategy for pinning the version of dask-gateway in the worker / scheduler images via pangeo-stacks to the version used on the server (from helm I think). Not sure what the best strategy is yet.

Absolutely. We really need to automate this. There are many places in our stack that have similar problems (e.g. pangeo-data/pangeo-binder#70).

TomAugspurger · 2020-01-23T19:28:19Z

@jhamman I do not have write privileges. Would you mind granting them?

jhamman · 2020-01-23T19:31:47Z

> @jhamman I do not have write privileges. Would you mind granting them?

Done.

TomAugspurger · 2020-01-23T19:46:58Z

Thanks!

TomAugspurger · 2020-01-23T20:07:07Z

Hmm, it seems that helm didn't redeploy dask-gateway.

$ kubectl -n dev-staging get pod | grep gateway
gateway-dev-staging-dask-gateway-7dcc5dbfff-4g886           1/1     Running   0          23h
scheduler-proxy-dev-staging-dask-gateway-5c75df8745-72f6d   1/1     Running   0          23h
web-proxy-dev-staging-dask-gateway-6f6dfdbb5f-t68ch         1/1     Running   0          23h

I've manually deleted the dask-gateway pods (kubectl -n dev-staging delete pod -l 'app.kubernetes.io/name=dask-gateway') and Kubernetes restarted them.

TomAugspurger · 2020-01-23T20:09:03Z

Darn, we're still getting the old image in the scheduler pod created by Gateway.new_cluster().

spec:
  automountServiceAccountToken: false
  containers:
  - args:
    - dask-gateway-scheduler
    - --adaptive-period
    - "3.0"
    - --idle-timeout
    - "0.0"
    env:
    - name: DASK_GATEWAY_API_URL
      ...
    image: pangeo/base-notebook:2019.09.23

Possibly an old config? Still debugging.
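For this kind of debugging, a small helper can pull the image out of a pod spec and flag anything that is not the expected one. This is only a sketch: it assumes the pod has been loaded into a dict (for instance from `kubectl get pod -o json`), and the container name `scheduler` and the `<new-tag>` placeholder below are illustrative, not taken from the real deployment.

```python
def container_images(pod):
    """Return {container name: image} for a pod given as a dict."""
    containers = pod.get("spec", {}).get("containers", [])
    return {
        c.get("name", f"container-{i}"): c.get("image")
        for i, c in enumerate(containers)
    }


def stale_images(pod, expected_image):
    """Return the containers whose image differs from `expected_image`."""
    return {
        name: image
        for name, image in container_images(pod).items()
        if image != expected_image
    }


if __name__ == "__main__":
    # Trimmed-down version of the spec above; the container name is assumed.
    pod = {
        "spec": {
            "containers": [
                {
                    "name": "scheduler",
                    "args": ["dask-gateway-scheduler", "--adaptive-period", "3.0"],
                    "image": "pangeo/base-notebook:2019.09.23",
                }
            ]
        }
    }
    # "<new-tag>" stands in for whatever tag the bumped config should produce.
    print(stale_images(pod, "pangeo/base-notebook:<new-tag>"))
```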

TomAugspurger · 2020-01-23T20:39:41Z

> Hmm, it seems that helm didn't redeploy dask-gateway.

For posterity, this may not have been helm or dask-gateway's fault. We weren't nesting the dask-gateway configuration correctly: #505
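A nesting mistake like this can be hard to spot by eye. As a rough illustration (the actual fix is in #505 and is not reproduced here), the sketch below lists every path at which a given key appears in a parsed values file, so you can see whether a `dask-gateway` block sits at the level the chart expects. The function name `key_paths`, the `parent-chart` key, and the example layouts are hypothetical.

```python
def key_paths(config, key, prefix=()):
    """Yield every path (a tuple of keys/indices) at which `key` appears."""
    if isinstance(config, dict):
        for k, v in config.items():
            path = prefix + (k,)
            if k == key:
                yield path
            # Keep descending so nested occurrences are reported too.
            yield from key_paths(v, key, path)
    elif isinstance(config, list):
        for i, item in enumerate(config):
            yield from key_paths(item, key, prefix + (i,))


if __name__ == "__main__":
    # Two hypothetical layouts: the same block at the top level vs. nested
    # under a parent key. Which one is correct depends on the chart.
    top_level = {"dask-gateway": {"gateway": {}}}
    nested = {"parent-chart": {"dask-gateway": {"gateway": {}}}}
    print(list(key_paths(top_level, "dask-gateway")))
    print(list(key_paths(nested, "dask-gateway")))
```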

bump

5d97413

TomAugspurger merged commit e9a1742 into pangeo-data:staging Jan 23, 2020

TomAugspurger deleted the bump-dask-image branch January 23, 2020 19:47

jhamman mentioned this pull request Feb 4, 2020

staging -> prod (long list of PRs) #524

Merged
