GitHub - aianshay/data-science-portfolio: Some data science projects I've been working on

Data Science Portfolio

This repository contains all the projects required to complete Udacity's Data Scientist Nanodegree.

Projects

Insights from Airbnb data from Rio de Janeiro

Using Inside Airbnb data covering every Airbnb listing in the city of Rio de Janeiro, I performed an exploratory data analysis aimed at answering three questions:

Which months are the cheapest to visit?
Where should you stay?
How should you write the title of your first listing?
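The first question comes down to grouping nightly prices by month and sorting the averages. The sketch below is a stdlib-only illustration, not the notebook's code: it assumes rows loosely shaped like Inside Airbnb's calendar file (an ISO `date` and a `price` string such as "$1,234.00"), and the sample rows are made up.

```python
from collections import defaultdict
from datetime import date


def parse_price(raw: str) -> float:
    """Convert a price string such as "$1,234.00" to a float."""
    return float(raw.replace("$", "").replace(",", ""))


def mean_price_by_month(rows):
    """Return (month, mean price) pairs sorted from cheapest to most expensive.

    Each row is assumed to be a dict with a "date" (YYYY-MM-DD) and a "price"
    string; this layout is an assumption modelled on Inside Airbnb's calendar.
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for row in rows:
        month = date.fromisoformat(row["date"]).month
        totals[month] += parse_price(row["price"])
        counts[month] += 1
    means = {month: totals[month] / counts[month] for month in totals}
    return sorted(means.items(), key=lambda item: item[1])


# Hypothetical sample rows, not real Inside Airbnb data.
sample = [
    {"listing_id": 1, "date": "2019-02-10", "price": "$1,200.00"},
    {"listing_id": 1, "date": "2019-06-10", "price": "$300.00"},
    {"listing_id": 2, "date": "2019-06-11", "price": "$260.00"},
    {"listing_id": 2, "date": "2019-12-31", "price": "$900.00"},
]
print(mean_price_by_month(sample))  # [(6, 280.0), (12, 900.0), (2, 1200.0)]
```

The first pair in the result is the cheapest month in the data.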

After that, I also built a model to predict the price of an accommodation and identified the features that matter most for pricing. Everything is summarized in a blog post here. If you want to dig into the more technical details, a notebook is available in the eda-predicting-airbnb-prices folder of this repository.
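The README does not say which model or importance measure was used for pricing. One common, model-agnostic way to rank features is permutation importance: shuffle one feature column and measure how much the prediction error grows. The sketch below assumes any `predict(row)` function and uses mean absolute error; `toy_model` and its "bedrooms" feature are hypothetical.

```python
import random


def mean_absolute_error(y_true, y_pred):
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def permutation_importance(predict, X, y, n_repeats=5, seed=0):
    """Average increase in MAE when each feature column is shuffled.

    predict: function mapping one feature row (a list) to a predicted price.
    X: list of feature rows; y: list of observed prices.
    """
    rng = random.Random(seed)
    baseline = mean_absolute_error(y, [predict(row) for row in X])
    importances = []
    for col in range(len(X[0])):
        increases = []
        for _ in range(n_repeats):
            shuffled = [row[col] for row in X]
            rng.shuffle(shuffled)
            # Replace only column `col`; every other feature keeps its value.
            X_perm = [row[:col] + [v] + row[col + 1:] for row, v in zip(X, shuffled)]
            error = mean_absolute_error(y, [predict(row) for row in X_perm])
            increases.append(error - baseline)
        importances.append(sum(increases) / n_repeats)
    return importances


# Hypothetical toy model: price depends only on bedrooms (column 0); column 1
# is ignored, so shuffling it cannot change the error.
def toy_model(row):
    return 100.0 + 50.0 * row[0]


X = [[1, 7], [2, 3], [3, 9], [4, 1]]
y = [toy_model(row) for row in X]
print(permutation_importance(toy_model, X, y))
```

A feature the model ignores scores exactly zero, while a feature it relies on scores higher the more the error grows without it.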

Disaster Response Pipeline

The goal of this project was to deploy a machine learning model that classifies disaster-related messages through a web page. This involved building an ETL pipeline and training a model on a dataset provided by Figure Eight, which contains thousands of tweets posted during natural disasters. The main steps were:

Data extraction
Data cleaning/preprocessing
Data storage in a SQLite database
Training a Random Forest classifier
Deploying the model on a webpage
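The extract, clean and store steps can be sketched with the standard library alone. This is an illustration, not the project's pipeline: it assumes a messages file and a categories file joined on `id`, with categories encoded as a string like "related-1;request-0". The CSV snippets, the table name `messages` and the helper names are hypothetical, and every row is assumed to list the same categories.

```python
import csv
import io
import sqlite3

# Hypothetical CSV snippets (note the duplicated message with id 2).
MESSAGES_CSV = """id,message,genre
1,We need water and food,direct
2,Is the hurricane over?,social
2,Is the hurricane over?,social
"""
CATEGORIES_CSV = """id,categories
1,related-1;request-1;water-1
2,related-1;request-0;water-0
"""


def extract(text):
    """Read CSV text into a list of dicts."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(messages, categories):
    """Join on id, split the category string into 0/1 columns, drop duplicates."""
    cats_by_id = {row["id"]: row["categories"] for row in categories}
    seen, records = set(), []
    for msg in messages:
        if msg["id"] in seen:  # skip duplicate messages
            continue
        seen.add(msg["id"])
        labels = {}
        for pair in cats_by_id[msg["id"]].split(";"):
            name, value = pair.rsplit("-", 1)
            labels[name] = int(value)
        records.append({"id": int(msg["id"]), "message": msg["message"],
                        "genre": msg["genre"], **labels})
    return records


def load(records, conn, table="messages"):
    """Create the table if needed and insert all records."""
    columns = list(records[0].keys())
    col_sql = ", ".join(f'"{c}"' for c in columns)
    conn.execute(f'CREATE TABLE IF NOT EXISTS "{table}" ({col_sql})')
    placeholders = ", ".join("?" for _ in columns)
    conn.executemany(
        f'INSERT INTO "{table}" ({col_sql}) VALUES ({placeholders})',
        [tuple(r[c] for c in columns) for r in records],
    )
    conn.commit()


conn = sqlite3.connect(":memory:")  # pass a file path instead to persist the database
load(transform(extract(MESSAGES_CSV), extract(CATEGORIES_CSV)), conn)
print(conn.execute("SELECT id, request, water FROM messages ORDER BY id").fetchall())
# [(1, 1, 1), (2, 0, 0)]
```

The training step would then read this table back and fit a multi-output classifier on the message text.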

Recommendations with IBM

In this project, I built several kinds of recommendation engines for users of the IBM Watson Studio platform; each one recommends new articles a user is likely to enjoy. The dataset, provided by IBM, contains interactions between users and articles. The implementation of these algorithms is in the notebook in the recommedations-with-ibm folder, which is organized as follows:

Exploratory data analysis
Data cleaning/preprocessing
Rank-based Recommendations
User-based Collaborative Filtering
Content-based Recommendations
Matrix Factorization with SVD
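Two of the approaches listed above, rank-based and user-based collaborative filtering, are easy to sketch on a plain interaction log. This is an illustration under assumptions, not the notebook's code: it takes hypothetical `(user, article_id)` pairs and measures user similarity as the number of shared articles (the dot product of binary user-item rows). Content-based recommendations and SVD are not shown.

```python
from collections import Counter, defaultdict

# Hypothetical (user, article_id) interactions, not the real IBM data.
interactions = [
    ("u1", 10), ("u1", 20), ("u1", 30),
    ("u2", 10), ("u2", 20), ("u2", 40),
    ("u3", 50),
    ("u4", 10), ("u4", 20),
]


def top_articles(interactions, n):
    """Rank-based: the n articles with the most interactions."""
    counts = Counter(article for _, article in interactions)
    return [article for article, _ in counts.most_common(n)]


def user_item_sets(interactions):
    """Map each user to the set of articles they interacted with."""
    items = defaultdict(set)
    for user, article in interactions:
        items[user].add(article)
    return items


def user_based_recs(user, interactions, n):
    """User-based CF: visit users sharing articles with `user`, most similar
    first, and collect the articles `user` has not seen yet."""
    items = user_item_sets(interactions)
    seen = items[user]
    neighbours = sorted(
        (other for other in items if other != user and seen & items[other]),
        key=lambda other: len(seen & items[other]),
        reverse=True,
    )
    recs = []
    for other in neighbours:
        for article in sorted(items[other] - seen):
            if article not in recs:
                recs.append(article)
    return recs[:n]


print(top_articles(interactions, 2))             # [10, 20]
print(user_based_recs("u4", interactions, 2))    # [30, 40]
```

Rank-based recommendations also work for brand-new users with no history, which is why they usually serve as the fallback for collaborative filtering.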

Churn Prediction with Spark

In this project, I used the dataset of Sparkify, a fictitious music streaming service, which records every user interaction inside the app. With this data, I built a Random Forest model that classifies whether a user churned. Interestingly, the most predictive features for churn were the number of days a user was active in the app and the number of thumbs down the user had given. I blogged about the whole process here.
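The two features named above are per-user aggregations over the event log. The project did this in Spark; the sketch below shows the same logic in plain Python so it runs without a cluster. It assumes events with `userId`, `page` and `ts` (milliseconds since the epoch) and labels a user as churned after a "Cancellation Confirmation" page; the sample events are hypothetical.

```python
from collections import defaultdict
from datetime import datetime, timezone


def user_features(events):
    """Per-user churn label, number of active days and thumbs-down count."""
    days = defaultdict(set)
    thumbs_down = defaultdict(int)
    churned = defaultdict(int)
    for e in events:
        user = e["userId"]
        # Calendar day (UTC) on which the event happened.
        day = datetime.fromtimestamp(e["ts"] / 1000, tz=timezone.utc).date()
        days[user].add(day)
        if e["page"] == "Thumbs Down":
            thumbs_down[user] += 1
        elif e["page"] == "Cancellation Confirmation":
            churned[user] = 1
    return {
        user: {"active_days": len(days[user]),
               "thumbs_down": thumbs_down[user],
               "churn": churned[user]}
        for user in days
    }


DAY_MS = 86_400_000
# Hypothetical events, not real Sparkify data.
events = [
    {"userId": "a", "page": "NextSong", "ts": 0},
    {"userId": "a", "page": "Thumbs Down", "ts": DAY_MS},
    {"userId": "a", "page": "Thumbs Down", "ts": DAY_MS + 1000},
    {"userId": "a", "page": "Cancellation Confirmation", "ts": 2 * DAY_MS},
    {"userId": "b", "page": "NextSong", "ts": 0},
    {"userId": "b", "page": "Thumbs Up", "ts": 3_600_000},
]
print(user_features(events))
```

Each user's feature dict would become one training row, with `churn` as the label for the classifier.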

Gaussian Distributions

I also built a Python package that implements Gaussian and binomial distributions. The repository can be found here.
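A minimal sketch of what such classes can look like follows. The class and method names here are hypothetical and do not describe the linked package's API. It shows the sample mean and standard deviation, the Gaussian density, adding two independent Gaussians (means add, variances add) and the binomial mean, standard deviation and probability mass function.

```python
import math


class Gaussian:
    def __init__(self, mean=0.0, stdev=1.0):
        self.mean = mean
        self.stdev = stdev

    @classmethod
    def from_data(cls, data, sample=True):
        """Estimate mean and standard deviation (n - 1 denominator if sample)."""
        n = len(data)
        mean = sum(data) / n
        dof = n - 1 if sample else n
        variance = sum((x - mean) ** 2 for x in data) / dof
        return cls(mean, math.sqrt(variance))

    def pdf(self, x):
        """Probability density at x."""
        coeff = 1.0 / (self.stdev * math.sqrt(2 * math.pi))
        return coeff * math.exp(-0.5 * ((x - self.mean) / self.stdev) ** 2)

    def __add__(self, other):
        # Sum of two independent Gaussians: means add and variances add.
        return Gaussian(self.mean + other.mean,
                        math.sqrt(self.stdev ** 2 + other.stdev ** 2))

    def __repr__(self):
        return f"mean {self.mean}, standard deviation {self.stdev}"


class Binomial:
    def __init__(self, p, n):
        self.p = p  # probability of success on each trial
        self.n = n  # number of trials

    @property
    def mean(self):
        return self.n * self.p

    @property
    def stdev(self):
        return math.sqrt(self.n * self.p * (1 - self.p))

    def pmf(self, k):
        """Probability of exactly k successes in n trials."""
        return math.comb(self.n, k) * self.p ** k * (1 - self.p) ** (self.n - k)


print(Gaussian(25, 3) + Gaussian(30, 4))  # mean 55, standard deviation 5.0
b = Binomial(0.5, 20)
print(b.mean, b.stdev, b.pmf(10))
```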

Repository contents

churn-prediction-with-spark
disaster-response-pipeline
eda-predicting-airbnb-prices
recommedations-with-ibm
.gitattributes
.gitignore
LICENSE
README.md
