
Jupyter Notebook Cell Hangs after submitting job to remote EMR cluster #154

Open
bkahloon opened this issue Mar 4, 2022 · 0 comments
bkahloon commented Mar 4, 2022

What happened: Connecting to a remote EMR cluster from a Jupyter notebook (using YarnCluster to create the Dask cluster) causes the notebook cell to hang. The YarnCluster client successfully submits the job to YARN on EMR, and the application is listed under the running applications tab; however, on the notebook client side the cell just hangs. The application on YARN seemingly continues to run as well and has to be killed manually (nothing in the YARN application logs indicates an error).

What you expected to happen: After the job is submitted, the notebook cell should not hang, allowing the user to submit further Dask transformation code to the Dask cluster created on EMR (the YARN app).

Minimal Complete Verifiable Example:

The notebook hangs after submitting the following code in a cell; no errors are reported (and the asterisk beside the cell indicates it is still running)

from dask_yarn import YarnCluster
from dask.distributed import Client

cluster = YarnCluster.from_specification('spec.yaml')
client = Client(cluster)
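As a possible debugging aid (not part of the original report): the blocking Client(cluster) step can be bounded so a hang surfaces as an exception rather than a stuck cell. distributed's Client accepts a timeout argument (e.g. Client(cluster, timeout="60s")); the same idea can also be sketched generically with only the standard library. run_with_timeout below is a hypothetical helper, not part of dask:

```python
from concurrent.futures import ThreadPoolExecutor, TimeoutError

def run_with_timeout(fn, timeout, *args, **kwargs):
    """Run a possibly-blocking call in a worker thread; raise TimeoutError
    if it does not finish within `timeout` seconds."""
    pool = ThreadPoolExecutor(max_workers=1)
    try:
        return pool.submit(fn, *args, **kwargs).result(timeout=timeout)
    finally:
        # Don't wait for the (possibly hung) call to return.
        pool.shutdown(wait=False)

# Hypothetical usage against the hanging connect step:
#   client = run_with_timeout(Client, 60, cluster)
```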

spec.yaml

  • Please note that dask-yarn is already installed on all EMR nodes.
name: test-dask
queue: default

services:
  dask.scheduler:
    # Restrict scheduler to 2 GiB and 1 core
    resources:
      memory: 2 GiB
      vcores: 1
    script: |
      dask-yarn services scheduler
  dask.worker:
    # Don't start any workers initially
    instances: 0
    # Workers can restart an unlimited number of times
    max_restarts: -1
    depends:
      - dask.scheduler
    # Restrict workers to 4 GiB and 2 cores each
    resources:
      memory: 4 GiB
      vcores: 2
    # Distribute this python environment to every worker node
    files:
      environment: /notebooks_deps_pkg.tar.gz
    # The bash script to start the worker
    # Here we activate the environment, then start the worker
    script: |
      virtualenv env
      source env/bin/activate
      dask-yarn services worker
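One detail worth noting in the spec above: the worker script creates a fresh virtualenv (virtualenv env) rather than activating the archive shipped via the files: section. The pattern in the dask-yarn documentation activates the distributed environment instead; a sketch, assuming /notebooks_deps_pkg.tar.gz is a venv-pack or conda-pack archive that already contains dask-yarn:

```yaml
    files:
      environment: /notebooks_deps_pkg.tar.gz
    script: |
      # Activate the unpacked archive (named "environment" per the files key)
      # instead of building a new, empty virtualenv in each container
      source environment/bin/activate
      dask-yarn services worker
```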

Anything else we need to know?: After adding a print statement to the base skein core.py file (a print(req) before the return), I see the following in the logs

22/03/04 21:08:19 INFO conf.Configuration: resource-types.xml not found
22/03/04 21:08:19 INFO resource.ResourceUtils: Unable to find 'resource-types.xml'.
22/03/04 21:08:19 INFO skein.Driver: Uploading application resources to hdfs://cluster.ip:8020/user/hadoop/.skein/application_1646182918041_0074
22/03/04 21:08:43 INFO skein.Driver: Submitting application...
22/03/04 21:08:43 INFO impl.YarnClientImpl: Submitted application application_1646182918041_0074
id: "application_1646182918041_0074"

<generator object KeyValueStore._input_iter at 0x7f20908370a0>

Then it just hangs in the notebook cell.

Environment:

  • Dask version: 0.8.1
  • Python version: 3.6.9
  • Operating System: Ubuntu
  • Install method (conda, pip, source): Pip