
theiotidiot/nd027-c3-data-lakes-with-spark

Purpose of This Repo

This repo contains the exercises for two lessons of the Data Lakes with Spark course in the ND027 Data Engineering Nanodegree program:

  • Lesson 3: Setting up Spark Clusters using AWS, and
  • Lesson 4: Debugging and Optimization.

Folder Structure

Lesson Folder

This repo contains a folder for each lesson:

  • Lesson 3: Submitting_spark_scripts
  • Lesson 4: Write_to_s3

Demo Folder

Lesson 3 includes a demo folder containing the code and data files used in the classroom.

Exercises Folder

Each lesson folder contains an exercises folder with all the files and instructions needed for the exercises, along with the solutions. See the README in the exercises folder for details on its structure.

Lesson 3: Setting up Spark Clusters in AWS

Folders: Submitting_spark_scripts, Write_to_s3
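Spark scripts are typically launched on a cluster with the spark-submit command. The helper below is a minimal sketch, not the course's solution code. It only assembles a spark-submit argument list using the standard --master, --deploy-mode, and --py-files options and does not run anything. The function name build_spark_submit_cmd, the script my_etl_script.py, and the S3 output path are hypothetical placeholders. The sketch also assumes spark-submit is on the PATH of a YARN-managed cluster node, such as an EMR master node.

```python
import shlex
from typing import List, Optional, Sequence


def build_spark_submit_cmd(
    script: str,
    master: str = "yarn",          # assumption: a YARN-managed cluster (e.g. EMR)
    deploy_mode: str = "client",
    py_files: Optional[Sequence[str]] = None,
    script_args: Sequence[str] = (),
) -> List[str]:
    """Return a spark-submit argument list without executing it (hypothetical helper)."""
    cmd = ["spark-submit", "--master", master, "--deploy-mode", deploy_mode]
    if py_files:
        # --py-files expects a comma-separated list of .py/.zip/.egg dependencies
        cmd += ["--py-files", ",".join(py_files)]
    cmd.append(script)           # the application script follows all spark-submit options
    cmd.extend(script_args)      # anything after the script is passed to the script itself
    return cmd


if __name__ == "__main__":
    # Hypothetical script name and S3 output path, for illustration only.
    cmd = build_spark_submit_cmd(
        "my_etl_script.py",
        script_args=["s3a://my-example-bucket/output/"],
    )
    print(shlex.join(cmd))
    # To actually launch the job on a cluster node: subprocess.run(cmd, check=True)
```

Returning a list rather than a single string lets subprocess.run pass the arguments as-is without going through a shell. shlex.join is used here only to print a readable version of the command.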

Lesson 4: Debugging and Optimization

Folder: Exercise
