Submit (long running) Yarn Cluster via CLI w/out Client Script #90

Open
bschreck opened this issue Aug 1, 2019 · 3 comments


bschreck commented Aug 1, 2019

Is there a way to use the CLI tools to submit a new YARN application and cluster, without creating a dask client script?

I know I can just add a while loop to the client script to run indefinitely, but it seems there should be a way to do something like:

dask-yarn start_cluster

I tried to just use dask-yarn services scheduler/worker, but it complains about not being inside a YARN container:

Traceback (most recent call last):
  File "/home/hadoop/miniconda/bin/dask-yarn", line 11, in <module>
    sys.exit(main())
  File "/home/hadoop/miniconda/lib/python3.6/site-packages/dask_yarn/cli.py", line 412, in main
    func(**kwargs)
  File "/home/hadoop/miniconda/lib/python3.6/site-packages/dask_yarn/cli.py", line 289, in scheduler
    app_client = skein.ApplicationClient.from_current()
  File "/home/hadoop/miniconda/lib/python3.6/site-packages/skein/core.py", line 1046, in from_current
    raise context.ValueError("Not running inside a container")
ValueError: Not running inside a container
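
For reference, the "while loop in the client script" workaround mentioned above could look roughly like the sketch below. It is only an illustration, not something dask-yarn ships: `run_until_stopped` and `main` are hypothetical names, and the `environment` path and `n_workers` value are placeholders. It assumes `dask_yarn.YarnCluster` is importable on the node where the script runs, and it waits on a `threading.Event` instead of a bare busy loop so the cluster is shut down cleanly on Ctrl-C or SIGTERM.

```python
import signal
import threading


def run_until_stopped(make_cluster, stop_event, poll_seconds=1.0):
    """Create a cluster, block until stop_event is set, then shut it down.

    make_cluster is any zero-argument callable returning an object with a
    shutdown() method, e.g. a function that builds a dask_yarn.YarnCluster.
    """
    cluster = make_cluster()
    try:
        # Event.wait returns False on timeout and True once the event is set.
        while not stop_event.wait(poll_seconds):
            pass
    finally:
        # Always release the YARN application, even if something goes wrong.
        cluster.shutdown()
    return cluster


def main():
    # Imported lazily so the helper above can be used without dask-yarn.
    from dask_yarn import YarnCluster

    stop = threading.Event()
    # Ctrl-C or `kill <pid>` sets the event instead of killing the process.
    for sig in (signal.SIGINT, signal.SIGTERM):
        signal.signal(sig, lambda signum, frame: stop.set())

    def make_cluster():
        # The environment path and worker count are placeholder values.
        cluster = YarnCluster(environment="environment.tar.gz", n_workers=2)
        # Print the application id so other processes can find the cluster.
        print("YARN application id:", cluster.app_id)
        return cluster

    run_until_stopped(make_cluster, stop)
```

Calling `main()` from a script keeps the cluster up for as long as that process lives, so the cluster lifetime is still tied to a client process rather than managed by the CLI.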
Member

jcrist commented Aug 1, 2019

No, this feature is not currently implemented, though it could be added pretty easily. Right now things are designed to be run either as a batch job (via the CLI) or programmatically (notebook/console/script), with the lifetime of the cluster bound to the lifetime of the client process.

Can you discuss your use case a bit more?

Author

bschreck commented Aug 1, 2019

We would like to use a notebook that isn't running on the master node but is within the same VPC, and have it connect to an already-running Dask cluster using Client(). We have notebooks on other services (e.g. SageMaker) that need access to distributed compute for ad hoc experimentation.
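
A rough sketch of what the notebook side of this could look like, assuming a cluster is already running somewhere. `connect` is a hypothetical helper, not part of dask-yarn. It either takes the scheduler address directly (which assumes the notebook host can reach the scheduler's port) or looks the cluster up by YARN application id via `YarnCluster.from_application_id` (which assumes the notebook host has dask-yarn installed and can talk to the YARN ResourceManager).

```python
def connect(scheduler_address=None, app_id=None):
    """Return a dask.distributed Client for an already-running cluster.

    Pass exactly one of:
      scheduler_address: e.g. "tcp://<host>:<port>" of a running scheduler
      app_id: the YARN application id of a running dask-yarn cluster
    """
    if (scheduler_address is None) == (app_id is None):
        raise ValueError("pass exactly one of scheduler_address or app_id")

    # Imported lazily so argument validation works without dask installed.
    from dask.distributed import Client

    if scheduler_address is not None:
        # Connect straight to the scheduler over the network.
        return Client(scheduler_address)

    from dask_yarn import YarnCluster

    # Attach to the existing YARN application instead of starting a new one.
    cluster = YarnCluster.from_application_id(app_id)
    return Client(cluster)
```

In a notebook this would be used as `client = connect(app_id="<application id>")` or `client = connect(scheduler_address="tcp://<host>:<port>")`, with the real values filled in.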

@jennakwon06

I have the same use case as well!
