Capstone Project: Spark Project

A project for the Udacity Data Scientist Nanodegree Program.

For a more detailed description, see: https://kangle-chen1103.medium.com/predicting-churn-rates-using-pyspark-54aa757bd408

Installation

Required packages are listed in requirements.txt.

Project Motivation

This project demonstrates an end-to-end application of data science methods to a real-world problem. It includes the following steps:

Define the project, analysis, and modeling following the CRISP-DM process
Use Spark DataFrames and Spark ML to manipulate data and build a machine learning model

File Descriptions

Sparkify.ipynb contains the PySpark code. Sparkify_IBMWatson.ipynb contains the PySpark code as run on IBM Watson.

Results and Discussion

Through data processing and feature generation, an accurate machine learning model was trained. The model indicates that most users who churned stopped paying after using the service for 1000 to 2000 hours. Coupons and discounts for users in this usage range might be effective, although this still needs to be validated, for example with an A/B test.

Due to limited computing power, CrossValidator and paramGrid are used here only to demonstrate how they fit into the pipeline, not to provide optimized training results.

Finally, since the dataset is not large, Spark does not actually show an advantage over plain Python and pandas here. A further task is to deploy the model on AWS with a larger dataset.

Licensing, Authors, Acknowledgements

Must give credit to Udacity for the project. Otherwise, feel free to use the code here as you would like!
