# Unbiased mean versus biased variance in plain English
One of the things I have learned during my statistics course is that the mean is an _unbiased_ estimator whereas the variance is a _biased_ estimator and, therefore, requires a correction^[Note that the `var()` function in R does not compute the variance of the sample itself but is an estimator of the population variance, so it applies the correction automatically. If you want the variance of the sample itself, you need to undo the correction or write your own function.]. Here I attempt to provide an intuition for why that is the case using as few formulas as possible.
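To make the footnote concrete, here is a minimal R sketch (with made-up example data) showing that `var()` divides by $N-1$ and how to undo that correction to get the plain variance of the sample itself.

```r
# Illustrative sample (arbitrary numbers, just for demonstration)
x <- c(2, 4, 4, 4, 5, 5, 7, 9)
n <- length(x)

var(x)                   # R's estimator: sum((x - mean(x))^2) / (n - 1)
var(x) * (n - 1) / n     # correction undone: variance of the sample itself
mean((x - mean(x))^2)    # same thing, computed directly
```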
We start by noting that a _sample_ mean (the mean of the data that you have) is (almost) always different from the _population_ "true" mean you are interested in. This is a trivial consequence of sampling variance: it would be pretty unlikely to hit _exactly_ the "true" population mean with your limited sample. This means that your sample mean is wrong, but it is wrong in a balanced way. It is equally likely to be larger or smaller than the "true" mean^[Assuming that the sampling distribution of the mean is approximately normal.]. Therefore, if you were to draw an infinite number of samples of the same size and compute their _sample means_, these random deviations to the left and to the right of the true mean would cancel each other out and, _on average_, your mean estimate would correspond to the true mean. In short, all _sample_ means are wrong individually but correct on average. They are not wrong in a systematic way; in other words, the mean is an unbiased estimator.
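A quick simulation sketch of this point (the population parameters and sample size below are arbitrary choices for illustration): each individual sample mean misses the true mean, but averaged over many samples the misses cancel out.

```r
set.seed(42)
true_mean <- 10

# Draw many samples of the same size and record each sample mean
sample_means <- replicate(10000, mean(rnorm(n = 20, mean = true_mean, sd = 3)))

head(sample_means, 3)  # each sample mean is "wrong" on its own
mean(sample_means)     # ...but correct on average (very close to 10)
```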
What about variance? Variance is just the average squared distance to the _true population_ mean $\mu$: $\frac{1}{N}\sum\limits^{N}_{i=1}{(x_i-\mu)^2}$. Unfortunately, you do not know that true population mean and, therefore, you compute the variance (a.k.a. the average squared / L2 distance) relative to the _sample_ mean $\bar{x}$: $\frac{1}{N}\sum\limits^{N}_{i=1}{(x_i-\bar{x})^2}$, and _that_ makes all the difference. Recall that if you use squared distance as your [loss function](#loss-functions), the sample mean is the point that has the minimal average squared distance to all points in the sample^[This is effectively a definition of the mean; take a look at the notes on loss functions to see why this is the case.]. To put it differently, the sample mean is the point that _minimizes_ the computed variance. If you pick _any_ point other than the _sample_ mean, the average squared distance / variance will necessarily be larger. However, we already established that the _true population_ mean is different from the _sample_ mean, so if we were to compute the _sample_ variance relative to the true mean, it would be larger (again, because it is always larger for any point that is not the sample mean). How much larger will depend on how wrong the sample mean is (something we cannot know), but it will _always_ be larger. Thus, variance computed relative to the sample mean is systematically smaller than the "correct" variance, i.e., it is a _biased_ estimator. Hence the $\frac{1}{N-1}$ instead of $\frac{1}{N}$, which attempts to correct this bias "on average". As with the mean, even the corrected variance of a sample is wrong (not equal to the true variance of the hidden distribution we are trying to measure) but, at least, it is not systematically wrong.
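Here is a minimal simulation sketch of that bias (again with arbitrary illustrative parameters): dividing the squared distances to the sample mean by $N$ systematically underestimates the true variance, while dividing by $N-1$ removes the systematic error, even though any single estimate remains off.

```r
set.seed(42)
true_sd <- 3   # so the true variance is 9
n <- 10

one_sample <- function() {
  x <- rnorm(n, mean = 10, sd = true_sd)
  c(biased    = sum((x - mean(x))^2) / n,        # divide by N
    corrected = sum((x - mean(x))^2) / (n - 1))  # divide by N - 1
}

estimates <- replicate(10000, one_sample())
rowMeans(estimates)  # biased is too small (~8.1), corrected is on target (~9)
```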