A look at English-language movies, award winners and nominees from 1934 to 2017, using data from IMDb, Kaggle, and self-curation.
An exercise with a very fruitful and instructive preprocessing stage. The multi-million-row IMDb datasets, filtered and enriched through web crawling, will probably inspire further analysis later on.
Using audio data from Spotify and sentiment analysis of lyrics with tidytext to compare Taylor Swift's latest album to her previous work.
First exercise without a ready-to-go dataset. First time working with APIs and pipes (%>%). Further practice with ggplot2.
Practice in exploratory data analysis and visualization using ggplot2.
A regression problem solved using XGBoost and 10-fold cross-validation.
My submission to McKinsey's 24-hour online hackathon. A probabilistic classification problem solved using a conditional random forest. Earned a ROC AUC score of 0.847; the winning submission scored 0.860.
A "Hello World" project in data science. A classification problem with three separate solutions using logistic regression, a decision tree, and a random forest.