Updated by Laura Edell, Sr. Data Scientist | Microsoft MSUS CTO CSU Organization
Date: 10/15/2018
Use the challenges in this repo to get started using Spark in Azure Databricks.
0a. Start by following the Setup Guide to prepare your Azure environment. (Administrators or others facilitating the workshop at a later time can use https://aka.ms/administrationForDatabricksWorkshops.dbc to set up Azure Databricks clusters for attendees.)
0b. Download the Challenge Files (not for students of Ready) from this repo, or fork this repository to your own account.
After you have completed steps 0a and 0b, work through the following challenges in order:
- Challenge 1 - Getting Started with Spark. In this challenge, you'll learn how to provision a Spark cluster in an Azure Databricks workspace and then interact with data using Python or Scala.
- Challenge 2 - Running a Spark Job. In this challenge, you'll learn how to configure a Spark job for unattended execution so that you can schedule your batch processing workloads.
- Challenge 3 - Using Structured Streaming. In this challenge, you'll learn how to use Spark to process streams of real-time data, using IoT sensor data as the example.
- Challenge 4 - Introduction to Machine Learning. In this challenge, you'll be introduced to using Spark to train and evaluate a classification model.
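
As a rough preview of the kind of configuration Challenge 2 deals with, the sketch below builds a request body shaped like the Databricks Jobs API 2.0 `jobs/create` call for a scheduled notebook job. It is only an illustration, not the challenge's own solution: the job name, notebook path, runtime version, node type, and cron schedule are all placeholder assumptions, and the helper `build_job_payload` is hypothetical.

```python
import json


def build_job_payload(notebook_path, cron, timezone="UTC"):
    """Return a dict shaped like a Jobs API 2.0 `jobs/create` request body.

    All concrete values below are placeholders chosen for illustration.
    """
    return {
        "name": "nightly-batch-example",  # hypothetical job name
        "new_cluster": {
            "spark_version": "4.3.x-scala2.11",  # placeholder runtime version
            "node_type_id": "Standard_DS3_v2",   # placeholder Azure VM size
            "num_workers": 2,
        },
        # The notebook that runs without user interaction
        "notebook_task": {"notebook_path": notebook_path},
        # Quartz cron fields: second minute hour day-of-month month day-of-week
        "schedule": {
            "quartz_cron_expression": cron,
            "timezone_id": timezone,
        },
    }


# Hypothetical notebook path; "0 0 2 * * ?" means every day at 02:00.
payload = build_job_payload("/Users/someone@example.com/Challenge2", "0 0 2 * * ?")
print(json.dumps(payload, indent=2))
```

The same settings can also be entered through the Jobs page of the Azure Databricks workspace; the dictionary form simply makes the moving parts (cluster, task, schedule) easy to see in one place.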
In the Advanced Databricks workshop, you will learn more about MMLSpark and how to build several types of Supervised and Unsupervised Machine Learning models for different business use cases: https://github.com/annedroid/Ready2019_AA_AI319