This is a selection of books and articles that I found useful. The topics span statistics, data science, business, programming, and analytical thinking in general. I will try to add a summary or brief background for each.
This list will be constantly updated and/or restructured as I find more reading materials.
- Thinking with Data. A very good book for getting started on how to analyze data properly, from framing the problem to applying statistical techniques.
- Storytelling with Data. Provides many basic principles of data visualization that can be applied broadly to your slide decks, analysis reports, or dashboards.
- Lean Analytics. I didn't read it all the way to the last chapter, but I found it useful for learning how to build good, relevant metrics.
- The Black Swan: The Impact of the Highly Improbable. An eye-opening book that reminds us to look beyond the data we see. I especially like the chicken analogy: a chicken is fed by a human every single day. After many days, it learns the pattern that whenever the human comes, food is served, not knowing that one day the chicken itself will be roasted and served as food. Lesson learned: what has happened in the past is not always a good predictor of the future.
- Moneyball. The true story of how analytics brought glory to a low-budget baseball team.
- Naked Statistics. This book is similar to The Signal and the Noise, but I like Naked Statistics better because it gives a complete overview of basic statistics, the intuition behind it, and the common pitfalls, all in light and brief explanations.
- Never start with a hypothesis. How many times have you met people who ask you to analyze data just for the sake of "knowing", without an actual decision to make? This article gives guidelines on how to properly build a hypothesis before spending your time on extensive data analysis. It argues that unless you have a default action (what you’d prefer to do if you stay ignorant), you don't need fancy statistics at all: statistical inference is needed only to convince you to switch to the alternative action.
- Simpson’s Paradox: How to Prove Opposite Arguments with the Same Data. If you haven't heard of Simpson's Paradox before, this is a must-read. Don't fall into the trap of drawing the wrong conclusion!
- Mistakes, we’ve drawn a few. Good real-world examples of visualization mistakes from The Economist, and how to improve them.
- On Average - 99% Invisible. Using the average as the standard measurement is not always a good idea. What if nobody or nothing fits the average?
- Geo Experiment. This deck gives a light introduction to measuring the incremental effect of a campaign with a geo experiment & the CausalImpact library in R.
- Gradient Boosting Decision Trees Algorithm Explained. I came across this article while reading a paper. I could not understand how gradient boosted trees work from the paper alone, but with this article I understood the logic in a few minutes.
- Bandit Test. You've heard a lot about A/B tests. Maybe you've also heard about Bayesian A/B tests. But have you tried a bandit test? This article explains when a bandit test is the best choice.
- Recommender System - Datacamp. For those who want to learn to build a recommendation system in Python. This is one of the tutorials I followed when I first tried to build a recommendation system for POIs & restaurants in London (I shared it in this repo).
- Jupyter Notebook Extension. There are lots of hidden gems! I personally use Table of Contents (for navigation whenever my notebooks get very long) & Hide Code Input (because sometimes it's just quicker to present the notebook, and you might want to show the output cells only!).
- Cohort Analysis. Cohort Analysis is very important when you want to measure retention & churn. This tutorial gives an example on how to do it in Python, along with the visualization.
- Analytics Training - Mode Analytics. It provides a set of cases to help you practice solving analytical problems, from thinking through the possible scenarios to actually checking the data with SQL queries & charts.
- Online Experimentation at Microsoft. Stories & lessons learned from past experiments at Microsoft.
- A Comparison of Approaches to Advertising Measurement. For those interested in running & evaluating campaigns on Facebook ads, this paper compares different approaches to measuring the impact of an experiment. It explains why the measurement is not straightforward and which factors need to be considered.
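To make the Simpson's Paradox entry concrete, here is a small worked example in Python. The counts are illustrative (successes, total) pairs for two hypothetical treatments A and B across two hypothetical subgroups, "small" and "large"; they are not taken from the article. A has the higher success rate within each subgroup (about 93% vs 87%, and 73% vs 69%), yet B looks better once the subgroups are pooled (about 83% vs 78%), because A was mostly applied to the harder "large" cases.

```python
# Illustrative (successes, total) counts; treatment and subgroup names are hypothetical.
DATA = {
    "A": {"small": (81, 87), "large": (192, 263)},
    "B": {"small": (234, 270), "large": (55, 80)},
}


def subgroup_rates(data):
    """Success rate of each treatment within each subgroup."""
    return {
        treatment: {group: s / n for group, (s, n) in groups.items()}
        for treatment, groups in data.items()
    }


def pooled_rates(data):
    """Success rate of each treatment when the subgroups are ignored."""
    pooled = {}
    for treatment, groups in data.items():
        successes = sum(s for s, _ in groups.values())
        total = sum(n for _, n in groups.values())
        pooled[treatment] = successes / total
    return pooled


if __name__ == "__main__":
    for treatment, rates in subgroup_rates(DATA).items():
        print(treatment, {g: round(r, 3) for g, r in rates.items()})
    print("pooled", {t: round(r, 3) for t, r in pooled_rates(DATA).items()})
```

The pooled numbers are a weighted average in which the weights (how many cases of each subgroup a treatment received) differ between A and B, which is exactly where the reversal comes from.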
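The core loop behind the Gradient Boosting Decision Trees entry can be sketched in a few lines. This is a minimal sketch, not the article's or any library's implementation: it assumes squared-error loss, a single numeric feature, and depth-1 trees (stumps), and the function names `fit_stump` and `gradient_boost` are hypothetical. Each round fits a stump to the current residuals (for squared error, the residuals are the negative gradient up to a constant) and adds a shrunken copy of it to the model.

```python
from statistics import mean


def fit_stump(xs, residuals):
    """Fit a depth-1 tree: one threshold on x, one constant on each side."""
    candidates = sorted(set(xs))[:-1]  # keep at least one point on the right side
    if not candidates:
        constant = mean(residuals)
        return lambda x: constant
    best = None
    for threshold in candidates:
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        left_value, right_value = mean(left), mean(right)
        # Squared error of predicting each side's mean residual.
        sse = sum((r - left_value) ** 2 for r in left) + sum(
            (r - right_value) ** 2 for r in right
        )
        if best is None or sse < best[0]:
            best = (sse, threshold, left_value, right_value)
    _, threshold, left_value, right_value = best
    return lambda x: left_value if x <= threshold else right_value


def gradient_boost(xs, ys, n_rounds=50, learning_rate=0.1):
    """Boost stumps on residuals under squared-error loss; returns a predict function."""
    base = mean(ys)  # start from the constant that minimizes squared error
    stumps = []

    def predict(x):
        return base + learning_rate * sum(stump(x) for stump in stumps)

    for _ in range(n_rounds):
        residuals = [y - predict(x) for x, y in zip(xs, ys)]
        stumps.append(fit_stump(xs, residuals))
    return predict


if __name__ == "__main__":
    xs = list(range(10))
    ys = [0.0] * 5 + [5.0] * 5  # a toy step function
    model = gradient_boost(xs, ys)
    print([round(model(x), 2) for x in xs])
```

The learning rate deliberately takes only part of each stump's correction, so the model approaches the target gradually over many rounds instead of in one step.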
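For the Bandit Test entry, below is a minimal sketch of one common bandit strategy, Thompson sampling for conversion rates. It assumes binary outcomes (convert or not), a uniform Beta(1, 1) prior per variant, and hypothetical true conversion rates that exist only to simulate traffic; the article may cover other strategies such as epsilon-greedy. Unlike a fixed 50/50 A/B split, traffic shifts toward whichever variant currently looks best.

```python
import random


def thompson_sampling(true_rates, n_rounds=5000, seed=0):
    """Simulate Thompson sampling; returns how many visitors each variant received."""
    rng = random.Random(seed)
    k = len(true_rates)
    successes = [0] * k
    failures = [0] * k
    pulls = [0] * k
    for _ in range(n_rounds):
        # Draw a plausible conversion rate for each variant from its Beta posterior.
        samples = [rng.betavariate(1 + successes[i], 1 + failures[i]) for i in range(k)]
        arm = max(range(k), key=samples.__getitem__)
        pulls[arm] += 1
        # Simulated visitor outcome using the hypothetical true rate.
        if rng.random() < true_rates[arm]:
            successes[arm] += 1
        else:
            failures[arm] += 1
    return pulls


if __name__ == "__main__":
    print(thompson_sampling([0.02, 0.05, 0.10]))
```

The trade-off: a bandit sends fewer visitors to losing variants while the test runs, at the cost of a less clean, less balanced comparison than a fixed-split A/B test.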
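The Cohort Analysis tutorial works in Python; the sketch below shows the same idea with only the standard library and is not the tutorial's code. It assumes an event log of hypothetical `(user_id, activity_date)` pairs, assigns each user to the month of their first activity, and reports what share of each cohort is active 0, 1, 2, ... months later. The function names are hypothetical.

```python
from collections import defaultdict
from datetime import date


def month_index(d):
    """Months since year 0, so month differences are simple subtraction."""
    return d.year * 12 + d.month - 1


def cohort_retention(events):
    """Return {"YYYY-MM": [share active at month offset 0, 1, 2, ...]}."""
    events = list(events)
    first_month = {}
    for user, d in events:
        m = month_index(d)
        if user not in first_month or m < first_month[user]:
            first_month[user] = m

    # (cohort month, months since joining) -> set of active users
    active = defaultdict(set)
    for user, d in events:
        cohort = first_month[user]
        active[(cohort, month_index(d) - cohort)].add(user)

    cohort_sizes = defaultdict(int)
    for cohort in first_month.values():
        cohort_sizes[cohort] += 1

    table = {}
    for cohort in sorted(cohort_sizes):
        size = cohort_sizes[cohort]
        max_offset = max(offset for (c, offset) in active if c == cohort)
        label = f"{cohort // 12}-{cohort % 12 + 1:02d}"
        table[label] = [
            len(active.get((cohort, offset), ())) / size for offset in range(max_offset + 1)
        ]
    return table


if __name__ == "__main__":
    events = [
        ("u1", date(2024, 1, 5)), ("u1", date(2024, 2, 10)), ("u1", date(2024, 3, 3)),
        ("u2", date(2024, 1, 20)), ("u2", date(2024, 3, 15)),
        ("u3", date(2024, 2, 2)), ("u3", date(2024, 3, 9)),
        ("u4", date(2024, 2, 14)),
    ]
    for cohort, shares in cohort_retention(events).items():
        print(cohort, shares)
```

Each row of the result is one cohort's retention curve; churn at a given offset is simply one minus that share.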