Data Science Portfolio

This is a repository of the projects I have worked on or am currently working on. It is updated regularly. The projects are written in either R (R Markdown) or Python (Jupyter Notebook). The goal of each project is to use data science and statistical modelling techniques to find something interesting in the data. A typical project consists of finding and cleaning data, analysis, visualization and conclusions. Click on a project to see the full analysis and code.

Projects:

  • Cross-correlation analysis between the Bitcoin price and the S&P 500 price over time.
  • Granger causality test between Bitcoin and stock prices
  • Fitted an ARIMA model to Bitcoin prices to forecast Bitcoin's range of movement.
  • Keywords(R, Time Series, Causality, Quandl API)
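
A minimal sketch of the lagged cross-correlation idea, in Python rather than the project's R code, and on synthetic series rather than real prices. The function name `cross_correlation` and the lag range are illustrative assumptions, not taken from the project.

```python
import math
import random

def cross_correlation(x, y, lag):
    """Pearson correlation between x[t] and y[t + lag] (lag may be negative)."""
    if lag >= 0:
        a, b = x[:len(x) - lag], y[lag:]
    else:
        a, b = x[-lag:], y[:len(y) + lag]
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((u - mean_a) * (v - mean_b) for u, v in zip(a, b))
    var_a = sum((u - mean_a) ** 2 for u in a)
    var_b = sum((v - mean_b) ** 2 for v in b)
    return cov / math.sqrt(var_a * var_b)

# Synthetic data (not real prices): y echoes x two steps later, plus noise.
rng = random.Random(0)
x = [rng.gauss(0, 1) for _ in range(500)]
y = [0.0, 0.0] + [x[t - 2] + 0.1 * rng.gauss(0, 1) for t in range(2, 500)]

# Scan a small window of lags and keep the one with the highest correlation.
lags = range(-5, 6)
best_lag = max(lags, key=lambda k: cross_correlation(x, y, k))
print(best_lag)
```

On this synthetic data the peak falls at a lag of 2 by construction. With real price series, the correlation would normally be computed on returns rather than raw prices, since trending levels inflate the correlation.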


  • Predicted US (2016) election victories as the voting results of each region became available.
  • Regressed states with results against polling data and predicted results for the remaining states
  • Used Monte Carlo simulation to simulate the winner of the election.
  • Compared simulated results with exchange rate fluctuations to see if the market is efficient.
  • Keywords(Python, Linear Regression, Monte Carlo Simulation)
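
The simulation step can be sketched roughly as follows. This is an illustration under stated assumptions, not the project's code: each undeclared region gets a predicted margin and an uncertainty (here hypothetical numbers), regions are treated as independent, and `simulate_win_probability` is a made-up helper name.

```python
import random

def simulate_win_probability(called_votes, remaining, threshold,
                             n_sims=10_000, seed=0):
    """Estimate the probability that a candidate reaches `threshold` votes.

    called_votes: votes already secured from declared regions.
    remaining: list of (votes, predicted_margin, margin_std) for undeclared
               regions; a positive sampled margin means the candidate wins it.
    """
    rng = random.Random(seed)
    wins = 0
    for _ in range(n_sims):
        total = called_votes
        for votes, margin, std in remaining:
            # Draw one plausible outcome for this region.
            if rng.gauss(margin, std) > 0:
                total += votes
        if total >= threshold:
            wins += 1
    return wins / n_sims

# Hypothetical numbers for illustration only, not real election data.
remaining_states = [(29, 1.0, 3.0), (20, -0.5, 3.0), (16, 2.0, 3.0), (10, -4.0, 3.0)]
p = simulate_win_probability(called_votes=230, remaining=remaining_states,
                             threshold=270)
print(f"Estimated win probability: {p:.2f}")
```

Treating regions as independent is the simplest choice; correlated polling errors across regions would widen the spread of simulated outcomes.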


  • Fitted power-law and log-normal distribution to US baby names data since 1960.
  • Used bootstrapping techniques to find a distribution of the power-law parameters
  • Crawled Twitter to find 20,000 random users and fitted a power-law distribution to users' friends count and followers count.
  • Keywords(R, Power-law, Bootstrapping, Log-normal)
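
A rough sketch of bootstrapping a power-law exponent, in Python rather than the project's R code. It assumes a continuous power law with a known lower cutoff `x_min` and uses the standard maximum-likelihood estimate of the exponent; the data are synthetic, and the helper names are illustrative.

```python
import math
import random

def fit_alpha(data, x_min):
    """Continuous power-law MLE: alpha = 1 + n / sum(ln(x / x_min))."""
    tail = [x for x in data if x >= x_min]
    return 1 + len(tail) / sum(math.log(x / x_min) for x in tail)

def bootstrap_alpha(data, x_min, n_boot=500, seed=0):
    """Refit alpha on resamples drawn with replacement; return sorted estimates."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(n_boot):
        resample = rng.choices(data, k=len(data))
        estimates.append(fit_alpha(resample, x_min))
    return sorted(estimates)

# Synthetic sample drawn from a power law by inverse-transform sampling.
rng = random.Random(1)
true_alpha, x_min = 2.5, 1.0
data = [x_min * (1 - rng.random()) ** (-1 / (true_alpha - 1)) for _ in range(2000)]

estimates = bootstrap_alpha(data, x_min)
# Percentile interval covering the central 95% of the bootstrap estimates.
lower = estimates[int(0.025 * len(estimates))]
upper = estimates[int(0.975 * len(estimates))]
print(f"alpha = {fit_alpha(data, x_min):.2f}, 95% interval ({lower:.2f}, {upper:.2f})")
```

In practice `x_min` also has to be estimated from the data, which this sketch leaves out.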


  • Plotted scatter-plot matrix to visualize the data
  • Fitted polynomial linear regression on wine quality vs wine chemical properties.
  • Used ridge and lasso regularization to tackle overfitting and compared the results
  • Used cross validation to select the optimal regularization strength
  • Keywords(Python, Linear Regression, Ridge and Lasso Regularization, Cross Validation)
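
To illustrate how cross-validation picks a regularization strength, here is a deliberately simplified sketch: ridge regression with a single feature and no intercept, so the closed-form estimate fits on one line. The data are synthetic, the grid of strengths is arbitrary, and none of this is the project's actual code, which works with many chemical properties.

```python
import random

def ridge_slope(xs, ys, lam):
    """Closed-form ridge estimate for y ~ beta * x (single feature, no intercept)."""
    return sum(x * y for x, y in zip(xs, ys)) / (sum(x * x for x in xs) + lam)

def cv_error(xs, ys, lam, k=5):
    """Mean squared prediction error of ridge under k-fold cross-validation."""
    n = len(xs)
    total = 0.0
    for fold in range(k):
        # Every k-th point, starting at `fold`, is held out for testing.
        test_idx = set(range(fold, n, k))
        train_x = [xs[i] for i in range(n) if i not in test_idx]
        train_y = [ys[i] for i in range(n) if i not in test_idx]
        beta = ridge_slope(train_x, train_y, lam)
        total += sum((ys[i] - beta * xs[i]) ** 2 for i in test_idx)
    return total / n

# Synthetic noisy data (not the wine data set).
rng = random.Random(0)
xs = [rng.gauss(0, 1) for _ in range(30)]
ys = [2.0 * x + rng.gauss(0, 3) for x in xs]

# Evaluate each candidate strength and keep the one with the lowest CV error.
grid = [0.0, 0.1, 1.0, 10.0, 100.0]
best_lam = min(grid, key=lambda lam: cv_error(xs, ys, lam))
print(best_lam)
```

The same loop applies to lasso or to models with many features; only the fitting step inside the fold changes.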


  • Parsed a few GB of Tweets to select all English-language Tweets from the UK.
  • Used the 'qdap' package to analyze the emotion of the Tweets
  • Plotted the emotions over the course of the day and of the week and analysed the results.
  • Keywords(R, Twitter API, Time Series, Sentiment Analysis, ggplot)

  • Downloaded economic indicator data using the World Bank API and cleaned the data
  • Downloaded Google search data on queries for the next year and the previous year, for each country
  • Fitted linear regression between GDP and future orientation
  • Keywords(R, World Bank API, Google API, Data Cleaning, Linear regression)

  • Predicted UK (2017) election victories as the voting results came in.
  • Retrieved results from Tweets of result announcements and extracted the time of announcement for each region.
  • Regressed regions with results against polling data and predicted results for the remaining regions
  • Used Monte Carlo simulation to simulate the winner of the election.
  • Keywords(Python, Twitter API, Merging Data)