14 Unbiased mean versus biased variance in plain English
One of the things I learned during my statistics course is that the mean is an unbiased estimator, whereas the variance is a biased estimator and, therefore, requires a correction29. Here I attempt to provide an intuition for why that is the case, using as few formulas as possible.
We start by noting that a sample mean (the mean of the data that you actually have) is (almost) always different from the “true” population mean you are interested in. This is a trivial consequence of sampling variance: it would be pretty unlikely for your limited sample to hit the “true” population mean exactly. This means that your sample mean is wrong, but it is wrong in a balanced way: it is equally likely to be larger or smaller than the “true” mean30. Therefore, if you drew an infinite number of samples of the same size and computed their sample means, these random deviations to the left and to the right of the true mean would cancel each other out, and on average your mean estimate would correspond to the true mean. In short, all sample means are wrong individually but correct on average. They are not wrong in a systematic way; in other words, the mean is an unbiased estimator.
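A quick way to see this is to simulate it. The following sketch (with made-up population parameters: a normal distribution with mean 5.0 and standard deviation 2.0) draws many small samples and shows that although individual sample means miss the true mean, their average lands very close to it:

```python
import random

random.seed(0)  # for reproducibility

# Hypothetical population for the demo: normal with true mean 5.0, sd 2.0.
TRUE_MEAN, SD = 5.0, 2.0
SAMPLE_SIZE, N_SAMPLES = 10, 100_000

def draw_sample():
    """One random sample of SAMPLE_SIZE observations from the population."""
    return [random.gauss(TRUE_MEAN, SD) for _ in range(SAMPLE_SIZE)]

# Each individual sample mean is "wrong" (it misses 5.0)...
means = [sum(s) / len(s) for s in (draw_sample() for _ in range(N_SAMPLES))]

# ...but the misses are symmetric around the true mean, so averaging
# the sample means recovers it: the mean is an unbiased estimator.
print(sum(means) / len(means))  # close to 5.0
```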
What about variance? Variance is just the average squared distance to the true population mean \(\mu\): \(\frac{1}{N}\sum\limits^{N}_{i=1}{(x_i-\mu)^2}\). Unfortunately, you do not know the true population mean and, therefore, you compute variance (a.k.a. the average squared/L2 distance) relative to the sample mean \(\bar{x}\): \(\frac{1}{N}\sum\limits^{N}_{i=1}{(x_i-\bar{x})^2}\), and that makes all the difference. Recall that if you use squared distance as your loss function, the sample mean is the point that has the minimal average squared distance to all points in the sample31. To put it differently, the sample mean is the point that minimizes the computed variance. If you pick any point other than the sample mean, the average squared distance, and hence the variance, will necessarily be larger. However, we already established that the true population mean is different from the sample mean, so if we computed the sample variance relative to the true mean, it would be larger (again, because it is larger for any point that is not the sample mean). How much larger depends on how wrong the sample mean is (something we cannot know), but it will always be larger. Thus, variance computed relative to the sample mean is systematically smaller than the “correct” variance, i.e., it is a biased estimator. Hence the \(\frac{1}{N-1}\) instead of \(\frac{1}{N}\), which attempts to correct this bias “on average”. As with the mean, even the corrected sample variance is wrong (not equal to the true variance of the hidden distribution that we are trying to measure) but, at least, it is not systematically wrong.
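The whole argument can be checked numerically. The sketch below (again with a made-up population: normal with mean 5.0 and variance 4.0) verifies that the squared distances to the sample mean are never larger than those to the true mean, that the \(\frac{1}{N}\) estimate is systematically too small, and that dividing by \(N-1\) removes the bias on average:

```python
import random

random.seed(0)  # for reproducibility

# Hypothetical population for the demo: normal, variance 4.0 (sd 2.0).
TRUE_MEAN, TRUE_VAR = 5.0, 4.0
SD = TRUE_VAR ** 0.5
N, N_SAMPLES = 10, 100_000

biased, corrected, about_true_mean = [], [], []
for _ in range(N_SAMPLES):
    x = [random.gauss(TRUE_MEAN, SD) for _ in range(N)]
    m = sum(x) / N  # the sample mean minimizes the sum of squared distances
    ss_sample = sum((xi - m) ** 2 for xi in x)
    ss_true = sum((xi - TRUE_MEAN) ** 2 for xi in x)
    assert ss_sample <= ss_true  # true mean != sample mean, so never smaller
    biased.append(ss_sample / N)           # 1/N:     systematically too small
    corrected.append(ss_sample / (N - 1))  # 1/(N-1): unbiased "on average"
    about_true_mean.append(ss_true / N)    # fine if mu were actually known

avg = lambda v: sum(v) / len(v)
print(avg(biased))           # near 4.0 * (N-1)/N = 3.6, i.e. too small
print(avg(corrected))        # near 4.0
print(avg(about_true_mean))  # near 4.0
```

Note that the correction only fixes the bias, not the sampling noise: any single corrected variance still misses the true value, just not systematically in one direction.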