This repository demonstrates how to use Apache Spark to process large datasets and build machine learning models at scale.
## Goals
- Practice processing and cleaning datasets to get comfortable with Spark's SQL and DataFrame APIs (Spark SQL, PySpark); see the first sketch below.
- Debug and mitigate data skew when running on a cluster; see the salting sketch below.
- Use Spark's Machine Learning Library (MLlib) to train machine learning models at scale; see the pipeline sketch below.
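
As a taste of the DataFrame and Spark SQL APIs, here is a minimal cleaning sketch. The input path, column names, and transformations are illustrative assumptions, not files shipped with this repository.

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("cleaning-sketch").getOrCreate()

# Hypothetical input: a CSV of user events with columns
# `user_id`, `event_time`, and `duration`.
events = spark.read.csv("data/events.csv", header=True, inferSchema=True)

cleaned = (
    events
    .dropDuplicates(["user_id", "event_time"])           # remove exact repeats
    .filter(F.col("duration").isNotNull())               # drop rows missing a duration
    .withColumn("event_date", F.to_date("event_time"))   # derive a date column
)

# The same kind of aggregation expressed with Spark SQL.
cleaned.createOrReplaceTempView("events")
daily_counts = spark.sql(
    "SELECT event_date, COUNT(*) AS n_events "
    "FROM events GROUP BY event_date"
)
```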
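
One common way to debug and mitigate skew is key salting before a join: spread a hot key over several partitions by appending a random salt, and replicate the smaller side once per salt value. The tables, column names, and salt factor below are assumed for illustration.

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("skew-sketch").getOrCreate()

# Hypothetical skewed fact table (u1 is a hot key) and a small dimension table.
orders = spark.createDataFrame(
    [("u1", 10.0)] * 1000 + [("u2", 5.0)], ["user_id", "amount"]
)
users = spark.createDataFrame([("u1", "US"), ("u2", "DE")], ["user_id", "country"])

# Debug step: inspect how unbalanced the join key is.
orders.groupBy("user_id").count().orderBy(F.desc("count")).show(5)

SALT_BUCKETS = 8  # assumed salt factor; tune based on the observed skew

# Salt the skewed side so rows for the hot key land in different partitions.
orders_salted = orders.withColumn("salt", (F.rand() * SALT_BUCKETS).cast("int"))

# Replicate each dimension row once per salt value so salted keys still match.
salts = spark.range(SALT_BUCKETS).select(F.col("id").cast("int").alias("salt"))
users_salted = users.crossJoin(salts)

joined = orders_salted.join(users_salted, on=["user_id", "salt"]).drop("salt")
```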
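
And a minimal MLlib pipeline sketch, assuming a hypothetical dataset with two numeric feature columns and a binary `label` column; in practice you would evaluate on a held-out split rather than the training data.

```python
from pyspark.sql import SparkSession
from pyspark.ml import Pipeline
from pyspark.ml.feature import VectorAssembler, StandardScaler
from pyspark.ml.classification import LogisticRegression
from pyspark.ml.evaluation import BinaryClassificationEvaluator

spark = SparkSession.builder.appName("mllib-sketch").getOrCreate()

# Hypothetical training data: two numeric features and a binary label.
data = spark.createDataFrame(
    [(0.0, 1.2, 0), (1.5, 0.3, 1), (2.1, 0.8, 1), (0.2, 1.9, 0)],
    ["f1", "f2", "label"],
)

# Assemble features, scale them, and fit a logistic regression in one pipeline.
pipeline = Pipeline(stages=[
    VectorAssembler(inputCols=["f1", "f2"], outputCol="raw_features"),
    StandardScaler(inputCol="raw_features", outputCol="features"),
    LogisticRegression(featuresCol="features", labelCol="label"),
])

model = pipeline.fit(data)
predictions = model.transform(data)

auc = BinaryClassificationEvaluator(labelCol="label").evaluate(predictions)
print(f"AUC: {auc:.3f}")
```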