This folder contains a Docker image with PySpark, ready to run from a Jupyter Notebook and customized specifically for the course.
For more general use, we recommend the official Jupyter Docker Stacks. This image is itself derived from the jupyter/pyspark-notebook image.
To run it, simply do:
docker run -p 8888:8888 -ti luisbelloch/pyspark-jupyter
Then navigate to http://localhost:8888. The login token will be displayed in the terminal.
This image contains the data folder used in the examples. You can access it directly from the notebook:
rdd = sc.textFile('./data/compras_tiny.csv')
rdd.take(2)
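The lines returned by the snippet above are plain strings. As a minimal sketch of turning each one into a list of fields, the helper below uses Python's standard csv module. The name parse_line is hypothetical, and the comma delimiter is an assumption about the file, not something this README states.

```python
import csv

def parse_line(line, delimiter=','):
    """Split one CSV line into a list of string fields.

    Hypothetical helper; the default comma delimiter is an assumption
    about the data file and may need to be changed.
    """
    # csv.reader handles quoted fields that contain the delimiter
    return next(csv.reader([line], delimiter=delimiter))
```

In the notebook, it could then be applied with rdd.map(parse_line).take(2), assuming rdd was created as shown above.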