EMR bootstrap script fails #122

Open
ResidentMario opened this issue Jul 22, 2020 · 18 comments

@ResidentMario

The EMR bootstrap script currently fails with the following error (found via stderr logs):

+ sudo mv /tmp/jupyter-notebook.conf /etc/init/
mv: cannot create regular file ‘/etc/init/’: Not a directory
@mrocklin
Member

mrocklin commented Aug 4, 2020

Thank you for the error report, @ResidentMario. My apologies for the delayed response. The folks who maintain this repository have been busy lately.

Do you have any interest in submitting a patch to resolve this issue?

@ResidentMario
Author

I might be able to look into it, but no promises.

@nmerket

nmerket commented Aug 5, 2020

While debugging some other issues with this, I found that using the EMR release emr-5.29.0 instead of emr-5.30.1 resolves the problem. It looks like something in the new image is the cause. Thought that bit of intel might help.

@hegde-anish

Apparently, from emr-5.30 onwards, EMR only supports systemd and no longer supports upstart.
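
For anyone who wants a bootstrap script to cope with both older and newer images, here is a minimal sketch of an init-system check. It relies on the convention that a systemd-booted machine has the directory /run/systemd/system, and it treats the presence of /etc/init plus an executable /sbin/initctl as a sign of upstart. The function name detect_init and its optional root-prefix argument are hypothetical, added only so the check can be tried against a scratch directory.

```shell
# Print "systemd", "upstart" or "unknown" for the filesystem rooted at $1
# (defaults to the real root). detect_init is a hypothetical helper name.
detect_init() {
    root="${1:-}"
    if [ -d "$root/run/systemd/system" ]; then
        # This directory exists when the system was booted with systemd.
        echo systemd
    elif [ -d "$root/etc/init" ] && [ -x "$root/sbin/initctl" ]; then
        # Heuristic only: upstart keeps its job files in /etc/init.
        echo upstart
    else
        echo unknown
    fi
}

echo "init system: $(detect_init)"
```

A script could branch on the result to decide whether to write a .conf job into /etc/init or a .service unit into /etc/systemd/system.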

@datafuz

datafuz commented Oct 1, 2020

I think it has to do with Amazon Linux 2:

Amazon Linux 2 support – In EMR version 5.30.0 and later, EMR uses Amazon Linux 2 OS. New custom AMIs (Amazon Machine Image) must be based on the Amazon Linux 2 AMI. For more information, see Using a Custom AMI.

@hamzahiqb

I tried to make it work with systemd and updated it with the following:

# -----------------------------------------------------------------------------
# 10. Configure Jupyter Notebook
# -----------------------------------------------------------------------------
echo "Configuring Jupyter"
mkdir -p $HOME/.jupyter
HASHED_PASSWORD=`python -c "from notebook.auth import passwd; print(passwd('$JUPYTER_PASSWORD'))"`
cat <<EOF >> $HOME/.jupyter/jupyter_notebook_config.py
c.NotebookApp.password = u'$HASHED_PASSWORD'
c.NotebookApp.open_browser = False
c.NotebookApp.ip = '0.0.0.0'
c.NotebookApp.port = 8889
EOF

# # -----------------------------------------------------------------------------
# # 11. Define an upstart service for the Jupyter Notebook Server
# #
# # This sets the notebook server up to properly run as a background service.
# # -----------------------------------------------------------------------------
echo "Configuring Jupyter Notebook Upstart Service"
cat <<EOF > /tmp/jupyter-notebook.service
[Unit]
Description=Jupyter Notebook

[Service]
ExecStart=$HOME/miniconda/bin/jupyter-notebook --config=$HOME/.jupyter/jupyter_notebook_config.py
Type=simple
PIDFile=/run/jupyter.pid
WorkingDirectory=$HOME
Restart=always
RestartSec=10

[Install]
WantedBy=multi-user.target
EOF
sudo mv /tmp/jupyter-notebook.service /etc/init.d/
sudo systemctl enable /etc/init.d/jupyter-notebook.service

# # -----------------------------------------------------------------------------
# # 12. Start the Jupyter Notebook Server
# # -----------------------------------------------------------------------------
# echo "Starting Jupyter Notebook Server"

sudo systemctl daemon-reload
sudo systemctl restart jupyter-notebook.service

Note: I added a port for the notebook.

This runs on bootstrap but there is nothing on port 8889. When I run the ExecStart command manually, via ssh, the notebook opens. So not sure what I'm doing wrong. I also get the following problem: #124

Sources for the new script:

  1. https://gist.github.com/klingtnet/76c542613e544a13bb7ad741b53f1f73
  2. https://medium.com/@joelzhang/setting-up-jupyter-notebook-server-as-service-in-ubuntu-16-04-116cf8e84781

EMR version 5.31.0
Hadoop distribution:Amazon 2.10.0
Python: 3.7.9

@hegde-anish

Hi @hiqbal2, your script for systemd was super helpful. I got it to work by making a few changes to it:

  1. ExecStart=$HOME/miniconda/bin/jupyter-notebook --allow-root --config=$HOME/.jupyter/jupyter_notebook_config.py
  2. sudo mv /tmp/jupyter-notebook.service /etc/systemd/system/
  3. sudo systemctl enable jupyter-notebook.service

I hope this helps

@hamzahiqb

@hegde-anish thanks for the help. EMR seems to bootstrap properly now. However, not sure if you got this error when trying to start a dask cluster:

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-5-762253d83df2> in <module>
      1 # Create a cluster
----> 2 cluster = YarnCluster()
      3 
      4 # Connect to the cluster
      5 client = Client(cluster)

/home/hadoop/miniconda/lib/python3.8/site-packages/dask_yarn/core.py in __init__(self, environment, n_workers, worker_vcores, worker_memory, worker_restarts, worker_env, scheduler_vcores, scheduler_memory, deploy_mode, name, queue, tags, user, host, port, dashboard_address, skein_client, asynchronous, loop)
    366         loop=None,
    367     ):
--> 368         spec = _make_specification(
    369             environment=environment,
    370             n_workers=n_workers,

/home/hadoop/miniconda/lib/python3.8/site-packages/dask_yarn/core.py in _make_specification(**kwargs)
    184             "See http://yarn.dask.org/environments.html for more information."
    185         )
--> 186         raise ValueError(msg)
    187 
    188     n_workers = lookup(kwargs, "n_workers", "yarn.worker.count")

ValueError: You must provide a path to a Python environment for the workers.
This may be one of the following:
- A conda environment archived with conda-pack
- A virtual environment archived with venv-pack
- A path to a conda environment, specified as conda://...
- A path to a virtual environment, specified as venv://...
- A path to a python binary to use, specified as python://...

See http://yarn.dask.org/environments.html for more information.

Not sure why this is happening.

I am also not sure whether there is a difference in behaviour between calling $HOME/miniconda/bin/jupyter-notebook directly and the original script's exec su - hadoop -c "jupyter notebook". When I try the old command, I get an error that hadoop -c does not exist.

I don't have any experience with hadoop or dask, so am a little lost on debugging this.

@kqshan

kqshan commented Dec 18, 2020

This modified bootstrap script worked for me, with a few additional fixes:

  • conda pack failed with python=3.8.5 (see AWS EMR bootstrap script fails #133), so I specified a 3.7 version
  • My conda environment already contained tornado 6.1, which I found worked with jupyter-server-proxy 1.5.2 without issue (despite the comment in the script saying otherwise)
  • The AMI I used (EMR 5.32) contains aliases for python -> /usr/bin/python3 and pip -> /usr/bin/pip3 in /etc/bashrc (which gets imported into $HOME/.bashrc). This interferes with conda, since we want python -> ~/miniconda/bin/python
  • I also ran into the ValueError: You must provide a path to a Python environment for the workers issue that @hiqbal2 encountered. The root cause (no pun intended) is that the notebook server is running as root instead of the hadoop user.

To fix the latter two issues, I added unalias commands to ~/.bashrc before sourcing it, which feels like a bit of a hack:

# -----------------------------------------------------------------------------
# 2. Install Miniconda
# -----------------------------------------------------------------------------
echo "Installing Miniconda"
curl https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh -o /tmp/miniconda.sh
bash /tmp/miniconda.sh -b -p $HOME/miniconda
rm /tmp/miniconda.sh
echo -e 'unalias python || true' >> $HOME/.bashrc
echo -e 'unalias pip || true' >> $HOME/.bashrc
echo -e '\nexport PATH=$HOME/miniconda/bin:$PATH' >> $HOME/.bashrc
source $HOME/.bashrc
conda update conda -y

and I specified a User in the systemd [Service] section (which also let me remove the --allow-root flag that @hegde-anish suggested). I also had to export the JAVA_HOME environment variable:

# -----------------------------------------------------------------------------
# 11. Define a systemd service for the Jupyter Notebook Server
#
# This sets the notebook server up to properly run as a background service.
# -----------------------------------------------------------------------------
echo "Configuring Jupyter Notebook systemd Service"
cat <<EOF > /tmp/jupyter-notebook.service
[Unit]
Description=Jupyter Notebook

[Service]
User=hadoop
ExecStart=$HOME/miniconda/bin/jupyter-notebook --config=$HOME/.jupyter/jupyter_notebook_config.py
Environment=JAVA_HOME=$JAVA_HOME
Type=simple
PIDFile=/run/jupyter.pid
WorkingDirectory=$HOME
Restart=always
RestartSec=10

[Install]
WantedBy=multi-user.target
EOF
sudo mv /tmp/jupyter-notebook.service /etc/systemd/system/
sudo systemctl enable jupyter-notebook


# -----------------------------------------------------------------------------
# 12. Start the Jupyter Notebook Server
# -----------------------------------------------------------------------------
echo "Starting Jupyter Notebook Server"
sudo systemctl daemon-reload
sudo systemctl start jupyter-notebook
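
A note on how the unit file above gets its paths: because the heredoc delimiter (EOF) is unquoted, the shell expands $HOME and $JAVA_HOME while the bootstrap script is writing the file, so the installed unit contains literal paths. The sketch below shows the difference between an unquoted and a quoted delimiter. JAVA_HOME_DEMO and its value are made up for illustration, and the render_* function names are hypothetical.

```shell
# Hypothetical value, standing in for whatever JAVA_HOME is on the node.
JAVA_HOME_DEMO=/usr/lib/jvm/example

# Unquoted delimiter: variables are expanded when the text is written.
render_unquoted() {
cat <<EOF
Environment=JAVA_HOME=$JAVA_HOME_DEMO
EOF
}

# Quoted delimiter: the text is written verbatim, "$" and all.
render_quoted() {
cat <<'EOF'
Environment=JAVA_HOME=$JAVA_HOME_DEMO
EOF
}

render_unquoted
render_quoted
```

One consequence is that if JAVA_HOME happens to be unset in the shell running the bootstrap script, the unquoted form writes an empty value into the unit file, so it may be worth checking the variable before generating the unit.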

EMR version 5.32.0
Hadoop distribution: Amazon 2.10.1
Python 3.7.6
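
On the root-versus-hadoop point: Dask looks for per-user YAML configuration under ~/.config/dask, so one plausible mechanism (an assumption on my part, not something confirmed in this thread) is that a notebook server running as root never sees a worker environment that was configured under the hadoop user's home directory. A rough sketch for comparing the two locations follows. home_of is a hypothetical helper, and its optional second argument exists only so it can be pointed at a sample passwd file.

```shell
# Print the home directory recorded for user $1 in a passwd-format file
# ($2, defaulting to /etc/passwd). home_of is a hypothetical helper name.
home_of() {
    awk -F: -v user="$1" '$1 == user { print $6 }' "${2:-/etc/passwd}"
}

# Show where each user's Dask configuration directory would be.
for user in root hadoop; do
    home=$(home_of "$user")
    if [ -n "$home" ]; then
        echo "$user: $home/.config/dask"
    else
        echo "$user: no such user on this machine"
    fi
done
```

If the two directories differ and only the hadoop one holds the configuration, that would be consistent with the User=hadoop fix above.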

@hamzahiqb

The above worked for me. However, the jupyter notebook now just does not output any values. I tried to start the notebook via ssh and got the following error when trying to do a simple 2+2:

[E 12:12:00.355 NotebookApp] Exception in callback functools.partial(<function wrap.<locals>.null_wrapper at 0x7fc061beb4d0>, <Future finished exception=TimeoutError('Timeout')>)
    Traceback (most recent call last):
      File "/home/hadoop/miniconda/lib/python3.7/site-packages/tornado/ioloop.py", line 758, in _run_callback
        ret = callback()
      File "/home/hadoop/miniconda/lib/python3.7/site-packages/tornado/stack_context.py", line 300, in null_wrapper
        return fn(*args, **kwargs)
      File "/home/hadoop/miniconda/lib/python3.7/site-packages/tornado/websocket.py", line 553, in <lambda>
        self.stream.io_loop.add_future(result, lambda f: f.result())
    tornado.util.TimeoutError: Timeout
ERROR:asyncio:Future exception was never retrieved
future: <Future finished exception=TimeoutError('Timeout')>
Traceback (most recent call last):
  File "/home/hadoop/miniconda/lib/python3.7/site-packages/tornado/gen.py", line 1141, in run
    yielded = self.gen.throw(*exc_info)
  File "/home/hadoop/miniconda/lib/python3.7/site-packages/tornado/websocket.py", line 757, in _accept_connection
    yield open_result
  File "/home/hadoop/miniconda/lib/python3.7/site-packages/tornado/gen.py", line 1133, in run
    value = future.result()
  File "/home/hadoop/miniconda/lib/python3.7/site-packages/tornado/ioloop.py", line 758, in _run_callback
    ret = callback()
  File "/home/hadoop/miniconda/lib/python3.7/site-packages/tornado/stack_context.py", line 300, in null_wrapper
    return fn(*args, **kwargs)
  File "/home/hadoop/miniconda/lib/python3.7/site-packages/tornado/websocket.py", line 553, in <lambda>
    self.stream.io_loop.add_future(result, lambda f: f.result())
tornado.util.TimeoutError: Timeout

@davegravy

@kqshan this is great, thanks.

I didn't find that I needed to unalias; after the bootstrap I had proper pointers to the miniconda python/pip. I'm running a newer EMR (emr-6.2.0), so this may be a factor.
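
A quick way to tell whether a given image needs the unalias lines is to ask the shell directly. This is only a sketch: has_alias is a hypothetical helper, and since aliases exist only in the shell that defined them, the check has to run in a shell that has already sourced the same file the bootstrap script sources (/etc/bashrc, per the earlier comment).

```shell
# Return success if $1 is currently defined as an alias in this shell.
# has_alias is a hypothetical helper name.
has_alias() {
    alias "$1" >/dev/null 2>&1
}

# On a real node, source the rc file first, for example:
# . /etc/bashrc

for name in python pip; do
    if has_alias "$name"; then
        echo "$name is aliased: $(alias "$name")"
    else
        echo "$name has no alias; PATH lookup gives: $(command -v "$name" || echo 'not found')"
    fi
done
```

If neither name is reported as aliased, the unalias lines should be unnecessary on that image.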

I removed the version pin for tornado as well.

The conda pack issue appears to stem from this conda issue. I added --ignore-missing-files and that resolved it, although I don't know whether I'll hit environment synchronization issues with my workers as a result (I haven't gotten that far in testing yet).

Also, the version spec for dask-yarn causes a file called "=0.7.0" to be written to the home folder. Some escaping or quoting is likely needed to fix this, but I just removed the version specification because conda installed 0.8.1 on its own.
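
The stray file is what you would expect if the spec appears unquoted on the command line as something like dask-yarn>=0.7.0 (an assumption about the script; the exact line is not quoted here). The shell reads ">" as output redirection, so the command receives only dask-yarn and its output goes to a file named "=0.7.0". A small sketch in a scratch directory, with echo standing in for the install command:

```shell
demo_dir=$(mktemp -d)
(
    cd "$demo_dir" || exit 1

    # Unquoted: the shell runs `echo dask-yarn` and redirects its output
    # into a file literally named "=0.7.0".
    echo dask-yarn>=0.7.0

    # Quoted: the whole spec reaches the command as a single argument.
    echo "dask-yarn>=0.7.0"
)

# List what was created in the scratch directory.
ls "$demo_dir"
```

Quoting the spec in the install command should keep the version constraint and avoid the stray file.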

@tjburrows

I also ran into this issue when trying the bootstrap script. @davegravy did you test your script? Do you have a version that works with emr-6.2.0?

@quasiben
Member

quasiben commented Mar 9, 2021

What version of conda-pack is being used? I believe 0.6 was released a month ago.

@davegravy

> I also ran into this issue when trying the bootstrap script. @davegravy did you test your script? Do you have a version that works with emr-6.2.0?

My script works with emr-6.2.0, yes. I've had no issues with dask & EMR, at least not that I can attribute to the bootstrap.

@tjburrows

> > I also ran into this issue when trying the bootstrap script. @davegravy did you test your script? Do you have a version that works with emr-6.2.0?
>
> My script works with emr-6.2.0, yes. I've had no issues with dask & EMR, at least not that I can attribute to the bootstrap.

Can you share it?

@davegravy

> > > I also ran into this issue when trying the bootstrap script. @davegravy did you test your script? Do you have a version that works with emr-6.2.0?
> >
> > My script works with emr-6.2.0, yes. I've had no issues with dask & EMR, at least not that I can attribute to the bootstrap.
>
> Can you share it?

Sure:

https://gist.github.com/davegravy/61e3abb81176f4490032554b70d28c31

@gabriel131188

Hello, I tried installing dask with many versions of the bootstrap script and many EMR versions, but nothing works. If possible, please share which EMR version and dask bootstrap script you used. Thanks.
@davegravy in your bootstrap script, line 125 is censored: "Downloading pyquis step".

@davegravy

> Hello, I tried installing dask with many versions of the bootstrap script and many EMR versions, but nothing works. If possible, please share which EMR version and dask bootstrap script you used. Thanks.

Hi, I was using EMR 6.2.0.

> @davegravy in your bootstrap script, line 125 is censored: "Downloading pyquis step".

This is a private python library my bootstrap script installs. It shouldn't have any bearing on the bootstrap's ability to succeed.
