This is a repository of the projects I have worked on or am currently working on, and it is updated regularly. The projects are written in either R (R Markdown) or Python (Jupyter Notebook). Their goal is to use data science and statistical modelling techniques to uncover interesting findings. A typical project consists of finding and cleaning data, analysis, visualization and a conclusion. Click on a project to see the full analysis and code.
- Cross-correlation analysis between Bitcoin and S&P 500 prices over time.
- Granger causality test between Bitcoin and stock prices.
- Fitted an ARIMA model to Bitcoin prices to forecast Bitcoin's range of movement.
- Keywords (R, Time Series, Causality, Quandl API)
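The cross-correlation step can be sketched in Python as below. This is a minimal illustration, not the project's R code: it assumes two equal-length, date-aligned series and works on log returns, because the raw prices of two trending assets are almost always strongly correlated, which says little about how they move together. The function names are hypothetical, and the example uses synthetic data in which one series repeats the other's moves two steps later.

```python
import math
import random


def log_returns(prices):
    """Convert a positive price series into consecutive log returns."""
    return [math.log(b / a) for a, b in zip(prices, prices[1:])]


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def cross_correlation(x, y, max_lag):
    """Correlation of x[t] with y[t + lag] for every lag in [-max_lag, max_lag].

    A peak at a positive lag suggests that moves in x tend to precede moves in y.
    """
    ccf = {}
    for lag in range(-max_lag, max_lag + 1):
        if lag >= 0:
            a, b = x[:len(x) - lag], y[lag:]
        else:
            a, b = x[-lag:], y[:len(y) + lag]
        ccf[lag] = pearson(a, b)
    return ccf


# Synthetic returns: ry repeats rx two steps later, plus noise.
rng = random.Random(0)
rx = [rng.gauss(0, 1) for _ in range(500)]
ry = [0.0, 0.0] + [r + rng.gauss(0, 0.5) for r in rx[:-2]]
ccf = cross_correlation(rx, ry, max_lag=5)
best_lag = max(ccf, key=ccf.get)
```

A peak in the cross-correlation is only suggestive; the Granger causality test is the more formal check of whether one series helps predict the other.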
- Predicted the outcome of the 2016 US election as the voting results for each region became available.
- Regressed the results of declared states against polling data and predicted the results for the remaining states.
- Used Monte Carlo simulation to simulate the winner of the election.
- Compared the simulated results with exchange-rate fluctuations to test whether the market is efficient.
- Keywords (Python, Linear Regression, Monte Carlo Simulation)
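The Monte Carlo step can be sketched in Python as below, under stated assumptions: each undeclared state's predicted two-party vote share for candidate A (for example, from the regression on polling data) is treated as normally distributed with a given standard error, states are simulated independently, and A carries a state when the simulated share exceeds 50%. Winning requires 270 of the 538 electoral votes. The function name and the toy inputs are hypothetical, and treating states as independent ignores correlated polling errors, so this simplification tends to overstate certainty.

```python
import random


def simulate_election(called_votes, remaining, n_sims=10_000, threshold=270, seed=0):
    """Estimate the probability that candidate A reaches `threshold` electoral votes.

    called_votes: electoral votes A has already won in declared states.
    remaining: list of (electoral_votes, predicted_share, std_error) tuples for
        undeclared states, where predicted_share is A's predicted two-party share.
    """
    rng = random.Random(seed)
    wins = 0
    for _ in range(n_sims):
        total = called_votes
        for votes, mean, sd in remaining:
            # A carries the state if the simulated share beats 50%.
            if rng.gauss(mean, sd) > 0.5:
                total += votes
        if total >= threshold:
            wins += 1
    return wins / n_sims


# Toy inputs, not real 2016 figures: 250 votes banked, one 30-vote toss-up left.
p = simulate_election(called_votes=250, remaining=[(30, 0.50, 0.05)])
```

Re-running the simulation each time a state declares moves that state from `remaining` into `called_votes`, so the win probability updates as the night goes on.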
- Fitted power-law and log-normal distributions to US baby-name data since 1960.
- Used bootstrapping to estimate the distribution of the power-law parameters.
- Crawled Twitter to collect 20,000 random users and fitted power-law distributions to their friend and follower counts.
- Keywords (R, Power-law, Bootstrapping, Log-normal)
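The power-law fit and the bootstrap can be sketched in Python as follows. This assumes a continuous power law above a fixed, known x_min, for which the maximum-likelihood exponent has the closed form alpha = 1 + n / Σ ln(x_i / x_min). Name counts and follower counts are actually discrete, and x_min would normally be estimated as well, so treat this as an approximation. The function names are hypothetical and the data are simulated with a known exponent so the estimate can be sanity-checked.

```python
import math
import random


def fit_alpha(data, x_min):
    """Maximum-likelihood exponent of a continuous power law, using values >= x_min."""
    tail = [x for x in data if x >= x_min]
    return 1 + len(tail) / sum(math.log(x / x_min) for x in tail)


def bootstrap_alpha(data, x_min, n_boot=200, seed=0):
    """Resample the data with replacement and refit alpha each time.

    Returns the sorted bootstrap estimates and an approximate 95% percentile interval.
    """
    rng = random.Random(seed)
    estimates = sorted(
        fit_alpha(rng.choices(data, k=len(data)), x_min) for _ in range(n_boot)
    )
    lo = estimates[int(0.025 * n_boot)]
    hi = estimates[int(0.975 * n_boot) - 1]
    return estimates, (lo, hi)


def sample_power_law(alpha, x_min, n, rng):
    """Inverse-transform sampling from P(X > x) = (x / x_min) ** (1 - alpha)."""
    return [x_min * (1 - rng.random()) ** (-1 / (alpha - 1)) for _ in range(n)]


# Simulated data with a known exponent of 2.5.
data = sample_power_law(alpha=2.5, x_min=1.0, n=2000, rng=random.Random(1))
alpha_hat = fit_alpha(data, x_min=1.0)
estimates, (lo, hi) = bootstrap_alpha(data, x_min=1.0)
```

The spread of `estimates` is the bootstrap distribution of the exponent; a histogram of it shows how precisely the data pin down alpha.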
- Plotted a scatter-plot matrix to visualize the data.
- Fitted a polynomial linear regression of wine quality on the wines' chemical properties.
- Used ridge and lasso regularization to tackle overfitting and compared the results.
- Used cross-validation to select the optimal regularization strength.
- Keywords (Python, Linear Regression, Ridge and Lasso Regularization, Cross Validation)
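The regularization step can be sketched in plain Python for ridge regression with k-fold cross-validation. For illustration this assumes a single predictor expanded into polynomial features (the wine data has several chemical properties) and uses only the standard library; the function names are hypothetical, and in practice a library implementation would normally be used. Lasso replaces the squared penalty with an absolute-value one, which has no closed-form solution and is usually fitted iteratively, for example by coordinate descent.

```python
import random


def solve(a, b):
    """Solve the linear system a @ w = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    w = [0.0] * n
    for i in range(n - 1, -1, -1):
        w[i] = (m[i][n] - sum(m[i][j] * w[j] for j in range(i + 1, n))) / m[i][i]
    return w


def poly_features(x, degree):
    """Rows of [1, x, x**2, ..., x**degree]; column 0 is the intercept."""
    return [[xi ** d for d in range(degree + 1)] for xi in x]


def ridge_fit(X, y, lam):
    """Minimise ||y - Xw||^2 + lam * ||w[1:]||^2; the intercept is not penalised."""
    p = len(X[0])
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(p)] for i in range(p)]
    for i in range(1, p):
        xtx[i][i] += lam
    xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(p)]
    return solve(xtx, xty)


def predict(X, w):
    return [sum(a * b for a, b in zip(row, w)) for row in X]


def cv_mse(X, y, lam, k=5, seed=0):
    """Held-out squared error over k folds, averaged over all observations."""
    idx = list(range(len(y)))
    random.Random(seed).shuffle(idx)  # same seed, so every lam sees the same folds
    total = 0.0
    for f in range(k):
        test = idx[f::k]
        held_out = set(test)
        train = [i for i in idx if i not in held_out]
        w = ridge_fit([X[i] for i in train], [y[i] for i in train], lam)
        preds = predict([X[i] for i in test], w)
        total += sum((p - y[i]) ** 2 for p, i in zip(preds, test))
    return total / len(y)


def select_lambda(X, y, grid, k=5):
    """Return the penalty strength with the lowest cross-validated error."""
    scores = {lam: cv_mse(X, y, lam, k) for lam in grid}
    return min(scores, key=scores.get), scores


# Synthetic data: a quadratic signal plus noise, deliberately over-fitted with degree 8.
rng = random.Random(42)
x = [rng.uniform(-1, 1) for _ in range(60)]
y = [1 + 2 * xi - xi ** 2 + rng.gauss(0, 0.3) for xi in x]
X = poly_features(x, degree=8)
best_lam, scores = select_lambda(X, y, grid=[0.0, 0.01, 0.1, 1.0, 10.0])
```

Larger penalties shrink the higher-order coefficients towards zero; cross-validation picks the strength that balances that shrinkage against fitting the training data.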
- Parsed a few GB of tweets to select all English-language tweets from the UK.
- Used the 'qdap' package to analyze the emotions expressed in the tweets.
- Plotted the emotions over the day and over the week and analyzed the patterns that emerged.
- Keywords (R, Twitter API, Time Series, Sentiment Analysis, ggplot)
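The filtering and time-of-day aggregation can be sketched in Python as follows. This assumes tweets stored as Twitter API v1.1 JSON objects, where `lang` holds the detected language and `place.country_code` the country of an attached place, and it assumes each tweet already has a sentiment score (the project computed these with the R package 'qdap', which this sketch does not reproduce). Function names and sample records are hypothetical, and timestamps are assumed to be already converted to UK local time.

```python
from collections import defaultdict
from datetime import datetime


def is_uk_english(tweet):
    """Keep tweets tagged as English whose attached place is in the UK (country code GB)."""
    place = tweet.get("place") or {}
    return tweet.get("lang") == "en" and place.get("country_code") == "GB"


def mean_by(records, key):
    """Average sentiment score per group, where key maps a datetime to a group label."""
    sums, counts = defaultdict(float), defaultdict(int)
    for timestamp, score in records:
        group = key(datetime.fromisoformat(timestamp))
        sums[group] += score
        counts[group] += 1
    return {g: sums[g] / counts[g] for g in sorted(sums)}


# Hypothetical (local timestamp, sentiment score) pairs.
records = [
    ("2017-03-06T08:15:00", -0.2),  # Monday morning
    ("2017-03-06T08:45:00", 0.0),   # Monday morning
    ("2017-03-06T21:10:00", 0.4),   # Monday evening
    ("2017-03-11T10:30:00", 0.6),   # Saturday morning
]
by_hour = mean_by(records, key=lambda t: t.hour)
by_weekday = mean_by(records, key=lambda t: t.weekday())  # 0 = Monday ... 6 = Sunday
```

Plotting `by_hour` and `by_weekday` gives the daily and weekly emotion curves; the same grouping works for each emotion category separately.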
- Downloaded economic-indicator data using the World Bank API and cleaned it.
- Downloaded Google search-query volumes for the next year and the previous year for each country.
- Fitted a linear regression between GDP and future orientation.
- Keywords (R, World Bank API, Google API, Data Cleaning, Linear Regression)
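The merge-and-regress step can be sketched in Python as follows. It assumes both sources are keyed by the same country code, that future orientation is measured as the ratio of search volume for the next year to search volume for the previous year, and that GDP is regressed on future orientation (the direction is an assumption). The function names, the placeholder country codes and the figures are hypothetical, not the project's data.

```python
def future_orientation(next_year_volume, last_year_volume):
    """Ratio of searches for the coming year to searches for the previous year."""
    return next_year_volume / last_year_volume


def inner_join(left, right):
    """Keep only countries present in both dicts, returning (left, right) value pairs."""
    shared = sorted(left.keys() & right.keys())
    return [(left[c], right[c]) for c in shared]


def ols(x, y):
    """Least-squares line y = intercept + slope * x, plus R squared."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    ss_res = sum((b - intercept - slope * a) ** 2 for a, b in zip(x, y))
    ss_tot = sum((b - my) ** 2 for b in y)
    return slope, intercept, 1 - ss_res / ss_tot


# Made-up values keyed by placeholder country codes.
fo = {
    "C1": future_orientation(120, 100),
    "C2": future_orientation(90, 100),
    "C3": future_orientation(105, 100),
}
gdp = {"C1": 45.0, "C2": 12.0, "C3": 30.0, "C4": 8.0}  # C4 has no search data, so it is dropped
pairs = inner_join(fo, gdp)
slope, intercept, r2 = ols([p[0] for p in pairs], [p[1] for p in pairs])
```

The inner join is where most of the cleaning effort tends to go: countries missing from either source drop out, so it is worth checking how many remain before fitting.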
- Predicted the outcome of the 2017 UK general election live, as the results were declared.
- Retrieved the tweets announcing each result and extracted the announcement time for each region.
- Regressed the results of declared regions against polling data and predicted the results for the remaining regions.
- Used Monte Carlo simulation to simulate the winner of the election.
- Keywords (Python, Twitter API, Merging Data)
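The live-prediction step can be sketched in Python as follows: merge each region's announcement time (as extracted from the result tweets) with its declared result, treat every region announced before a cutoff as declared, regress the declared results on polling, and predict the rest. It assumes one vote share per region for a single party and a straight-line relationship between polls and results; the function names and toy inputs are hypothetical.

```python
from datetime import datetime


def fit_line(x, y):
    """Least-squares slope and intercept for y = intercept + slope * x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sum((a - mx) ** 2 for a in x)
    return slope, my - slope * mx


def predict_undeclared(polls, results, announced_at, cutoff):
    """Predict vote shares for regions not yet declared at `cutoff`.

    polls: region -> polled vote share; results: region -> declared vote share;
    announced_at: region -> datetime of the result announcement.
    """
    declared = [r for r, t in announced_at.items() if t <= cutoff and r in results]
    slope, intercept = fit_line([polls[r] for r in declared], [results[r] for r in declared])
    return {r: intercept + slope * p for r, p in polls.items() if r not in declared}


# Toy inputs: each declared result is 2 points above its poll.
polls = {"A": 0.40, "B": 0.50, "C": 0.60, "D": 0.55}
results = {"A": 0.42, "B": 0.52, "C": 0.62, "D": 0.57}
announced_at = {
    "A": datetime(2017, 6, 8, 23, 0),
    "B": datetime(2017, 6, 8, 23, 30),
    "C": datetime(2017, 6, 9, 0, 30),
    "D": datetime(2017, 6, 9, 2, 0),
}
predicted = predict_undeclared(polls, results, announced_at, cutoff=datetime(2017, 6, 9, 1, 0))
```

Replaying the night with successive cutoffs reproduces how the prediction would have evolved; the predictions, with an error estimate from the regression residuals, are what the Monte Carlo simulation then draws from.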