Loaded with:
- Jupyter Hub
- Anaconda 3
- RStudio Server with many commonly used packages
- dbt
- Apache Airflow worker
- git
- Clone this repo
- Install docker and docker-compose
- Copy env_example to .env and edit it
- Edit the volumes section of docker-compose.yml
  - The path before ":" must be an existing host directory; it will be visible to Jupyter Hub and RStudio. (A consolidated example follows this list.)
- Run
docker-compose up -d
- If running locally, point your browser to
- Jupyter Hub http://localhost:8000
- RStudio Server http://localhost:8787
- Port 8793 is reserved for the Airflow worker.
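Putting the steps above together, a minimal local bring-up might look like this (the repository URL and the /data/projects host path are placeholders for your own values):

git clone <this_repo_url> datasci-box && cd datasci-box
cp env_example .env                  # then edit the values in .env
# in docker-compose.yml, point the volume at an existing host directory, e.g.
#   volumes:
#     - /data/projects:/home/ds
docker-compose up -d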
Before using git, set up git configuration by running:
$HOME/bin/setup_git.sh "Your Name" "your email"
Use the HTTPS protocol to pull from and push to the remote repository.
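If you prefer to configure git by hand instead of running the script, standard git commands along these lines accomplish the same thing (a sketch, not taken from setup_git.sh itself):

git config --global user.name "Your Name"
git config --global user.email "your email"
# optional: cache HTTPS credentials so you are not prompted on every push
git config --global credential.helper "cache --timeout=3600"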
- Edit the volumes section of docker-compose.yml to mount the host's directory to /home (instead of /home/ds, for example)
- Shell into the container as root with `docker exec -it <container_id> /bin/bash`
- Add the user (a one-shot version run from the host is shown after this list):
mkdir -p /home/<user_name>
useradd <user_name> -d /home/<user_name>
chown -R <user_name>: /home/<user_name>
echo <user_name>:<initial_password> | chpasswd
- The user can change their password with the
passwd
command in a terminal on Jupyter Hub or RStudio.
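The same steps can also be run from the host in one shot with docker exec (a sketch; newuser and initial_password are placeholders):

docker exec <container_id> /bin/bash -c '
  mkdir -p /home/newuser &&
  useradd newuser -d /home/newuser &&
  chown -R newuser: /home/newuser &&
  echo "newuser:initial_password" | chpasswd'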
As I write this, Container-Optimized OS does not ship with docker-compose, so without a workaround you cannot use the docker-compose command shown above to start the services. Also note that Containers on Compute Engine is in beta and does not support port forwarding or volume mapping.
I recommend following the example steps below; a roughly equivalent gcloud command is sketched after the list.
Example setting:
- Choose 2 vCPUs, 7.5 GB memory
- Do NOT check "Deploy a container image to this VM instance"
- Set the boot disk to one of the Container-Optimized OS images
- Set the boot disk size to 20 GB
- Allow HTTP & HTTPS traffic
- Disks: add a 50 GB disk and make it persistent
- Network tags: datasci-box
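If you prefer the gcloud CLI to the console, an instance roughly matching the settings above can be created like this (a sketch; the instance and disk names are placeholders, and n1-standard-2 corresponds to 2 vCPUs / 7.5 GB):

gcloud compute instances create datasci-box \
    --machine-type=n1-standard-2 \
    --image-family=cos-stable --image-project=cos-cloud \
    --boot-disk-size=20GB \
    --create-disk=name=datasci-data,size=50GB,auto-delete=no \
    --tags=datasci-box,http-server,https-server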
You can run the following from the browser-based terminal GCP provides on the VM instances page.
sudo mkdir /mnt/stateful_partition/home2
sudo docker run -d -v /mnt/stateful_partition/home2:/home \
-p 8000:8000 -p 8787:8787 -p 8793:8793 \
-e USER_NAME=<some_username> -e USER_PASSWORD=<strong_password> \
--name datasci anelen/datasci
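To confirm the container came up cleanly, you can check its status and logs with standard docker commands:

sudo docker ps --filter name=datasci      # should show the container as Up
sudo docker logs datasci                  # check the startup logs if something looks wrong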
While you are here, note your user name (run this outside the container):
echo $USER
You probably don't want to expose the services externally, so I will show how to connect to the services using ssh tunneling.
If you have not, generate a project-wide ssh key:
ssh-keygen -t rsa -f ~/.ssh/[KEY_FILENAME] -C [USERNAME]
Note: USERNAME is not your email address; it is the user name you noted at the end of the previous section.
Add the content of the public key (~/.ssh/[KEY_FILENAME].pub) as a project-wide SSH key in the project metadata (a sketch of one way to do this follows).
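One command-line way to do that is sketched below; note that add-metadata replaces the existing ssh-keys value, so if the project already has keys you should include them in the file as well (or simply add the key through the console instead).

# each line of the ssh-keys metadata has the form <USERNAME>:<public key content>
echo "[USERNAME]:$(cat ~/.ssh/[KEY_FILENAME].pub)" > ssh_keys.txt
gcloud compute project-info add-metadata --metadata-from-file ssh-keys=ssh_keys.txt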
On your local computer, start the port forwarding with
ssh -L 8000:localhost:8000 -L 8787:localhost:8787 [USERNAME]@<instance_external_ip_address>
Then point your browser to
- Jupyter Hub http://localhost:8000
- RStudio Server http://localhost:8787
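Alternatively, gcloud can open the same tunnel without you looking up the external IP yourself; everything after -- is passed straight to ssh:

gcloud compute ssh [USERNAME]@<instance_name> -- \
    -L 8000:localhost:8000 -L 8787:localhost:8787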
If you persisted the boot disk or attached a persistent disk, you can restart the instance and recover the previous state (provided you did not delete the VM instance, of course).
After restarting the VM instance, run the ssh command as in the previous section, then
sudo docker start datasci
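If docker start cannot find the container, the container state was not preserved; check what is still there and, if needed, re-run the docker run command from the earlier section:

sudo docker ps -a        # the datasci container should be listed (Exited until you start it)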
See this instruction