Two Python notebooks in which the Stochastic Gradient Descent and K-means algorithms are run on the Spark framework. They were, respectively, the first and the second laboratory sessions of the "Clouds" course we followed at Eurecom.
The goal of the first notebook is to present the Gradient Descent algorithm and some of its variants (Stochastic GD and Mini-batch SGD). The algorithms are analysed both when running on a single machine and when distributed over a cluster.
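A minimal single-machine sketch of the three variants follows. It assumes a one-dimensional linear regression model with squared loss; the model, the function name `minibatch_sgd` and its parameters are illustrative and are not taken from the notebooks. The only thing that distinguishes the variants here is the batch size: 1 gives Stochastic GD, the whole dataset gives plain Gradient Descent, and anything in between gives Mini-batch SGD.

```python
import random

def minibatch_sgd(data, lr=0.1, batch_size=4, epochs=200, seed=0):
    """Fit y ~ w*x + b by mini-batch SGD on the squared loss (illustrative model).

    batch_size=1 gives plain SGD; batch_size=len(data) gives full-batch GD.
    """
    rng = random.Random(seed)
    w, b = 0.0, 0.0
    data = list(data)  # copy so shuffling does not modify the caller's list
    for _ in range(epochs):
        rng.shuffle(data)  # visit the samples in a new random order each epoch
        for start in range(0, len(data), batch_size):
            batch = data[start:start + batch_size]
            # Average gradient of (1/2) * (w*x + b - y)^2 over the batch
            gw = sum((w * x + b - y) * x for x, y in batch) / len(batch)
            gb = sum((w * x + b - y) for x, y in batch) / len(batch)
            # Step against the gradient
            w -= lr * gw
            b -= lr * gb
    return w, b

# Noise-free toy data on the line y = 3x + 1
points = [(x / 10, 3.0 * (x / 10) + 1.0) for x in range(20)]
w, b = minibatch_sgd(points)
print(round(w, 3), round(b, 3))
```

In a distributed setting, the per-batch sums that produce `gw` and `gb` are the part that would typically be computed in parallel over data partitions and then combined, which is why gradient descent maps naturally onto Spark's map and reduce operations.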
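The second notebook presents the K-means clustering algorithm and runs it over a dataset, also in a distributed fashion. The K-means++ variant, which spreads out the initial cluster centroids by picking each new one with probability proportional to its squared distance from the centroids already chosen, is analysed too.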
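A minimal pure-Python sketch of K-means with K-means++ seeding follows. It runs on a single machine rather than on Spark, and assumes points are tuples of floats, Euclidean distance, and at least `k` distinct points. The function names `kmeans_pp_init` and `kmeans` are hypothetical. The seeding picks the first centroid uniformly at random and every following one with probability proportional to its squared distance to the nearest centroid already chosen; Lloyd's assignment and update steps then alternate until the centroids stop moving.

```python
import random

def sq_dist(p, q):
    # Squared Euclidean distance between two points of equal dimension
    return sum((a - b) ** 2 for a, b in zip(p, q))

def kmeans_pp_init(points, k, rng):
    # K-means++ seeding: first centroid chosen uniformly at random
    centroids = [rng.choice(points)]
    while len(centroids) < k:
        # Weight = squared distance to the nearest centroid chosen so far
        # (assumes at least k distinct points, otherwise all weights may be 0)
        weights = [min(sq_dist(p, c) for c in centroids) for p in points]
        centroids.append(rng.choices(points, weights=weights, k=1)[0])
    return centroids

def kmeans(points, k, iters=20, seed=0):
    rng = random.Random(seed)
    centroids = kmeans_pp_init(points, k, rng)
    for _ in range(iters):
        # Assignment step: attach each point to its closest centroid
        clusters = [[] for _ in range(k)]
        for p in points:
            idx = min(range(k), key=lambda i: sq_dist(p, centroids[i]))
            clusters[idx].append(p)
        # Update step: move each centroid to the mean of its cluster
        # (an empty cluster keeps its previous centroid)
        new = [tuple(sum(coord) / len(c) for coord in zip(*c)) if c else centroids[i]
               for i, c in enumerate(clusters)]
        if new == centroids:  # converged: centroids did not move
            break
        centroids = new
    return centroids

# Two well-separated groups of four points each
pts = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0),
       (100.0, 100.0), (100.0, 101.0), (101.0, 100.0), (101.0, 101.0)]
print(sorted(kmeans(pts, 2)))
```

Seeding this way makes it unlikely that two initial centroids fall in the same group, which is the main weakness of a purely uniform random initialisation.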
The notebooks are also provided exported as HTML files, which can be read in any web browser without running Jupyter.
We would like to thank our teacher, Michiardi Pietro, who created the baseline for the notebooks, guided us while we completed them, and taught us all the techniques presented here.
ANGIUS Marco and AVALLE Giorgio - Ⓒ2017