Some Motivations for Bayesian Statistics

If you’ve been following my Twitter stream, you have probably seen that I’m doing some reading and study on Bayesian statistics lately. For a variety of reasons, I find the Bayesian model of statistics quite compelling and am hoping to be able to use it in some of my research.

Traditional statistics, encompassing well-known methods such as t-tests and ANOVA, comes from the frequentist school of statistical thought. The basic idea of frequentist statistics is that the world is described by parameters that are fixed and unknown. These parameters can be all manner of things — the rotation rate of the earth, the average life span of a naked mole rat, or the average number of kittens in a litter of cats. It is rare that we have access to the entire population of interest (e.g. all mature female cats) and can measure a parameter directly, so we estimate parameters by taking random samples from the population, computing some statistic over each sample, and using that as our estimate of the population parameter. Since these parameters are unknown, we do not know their exact values. Since they are fixed, however, we cannot discuss them in probabilistic terms. Probabilistic reasoning applies only to random variables, and parameters are not random — we just don’t know what their values are. Probabilities, expected values, and the like are therefore only meaningful in the context of the outcomes of repeated random experiments drawn from the population.

The Bayesian says, “Who cares?”. Bayesian statistics applies probabilistic methods and reasoning directly to the parameters. This doesn’t necessarily mean that the Bayesian thinks the world is really random, though. It turns out that we can use probabilities not only to express the chance that something will occur, but we can also use them to express the extent to which we believe something and the math all still works. So we can use the algebra of probabilities to quantify and describe how much we believe various propositions, such as “the average number of kittens per litter is 47”.

One of the fundamental differences, therefore, is that the frequentist can only apply probabilities to the act of repeating an experiment, while the Bayesian can apply probabilities directly to their knowledge of the world. There are other important differences as well — frequentist statistics is primarily concerned with testing and falsifying hypotheses, while Bayesian statistics focuses more on determining which of several competing models or hypotheses is most likely to be true — but those differences play less of a role in what I find compelling about Bayesian statistics, and it is also possible to apply falsificationist principles in a Bayesian framework.

The core act of Bayesian inference is to compute posterior probabilities for various proposed models given some data. It is built on Bayes’ theorem, \(P(\theta|\gamma) \propto P(\gamma|\theta) P(\theta)\): the posterior probability that some explanation for the world, denoted by \(\theta\), is the correct explanation for some observed data \(\gamma\) is proportional to the probability of seeing \(\gamma\) if \(\theta\) is true, multiplied by the prior probability of \(\theta\) being true in the first place. So the Bayesian comes up with an a priori idea of which models may explain the observed data and how likely each is to be correct, combines it with how likely each of those models is to produce the observed data, and updates their belief in the truth of each of those models. This kind of belief update provides a direct mathematical encoding of the iterative aspects of the scientific method — we have ideas of how the world works, we perform experiments and collect data, and we use those data to update our ideas of what is true.
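To make the update concrete, here is a minimal sketch in Python of that computation for a discrete set of hypotheses. Everything in it is invented for illustration: the candidate values of \(\theta\) (mean kittens per litter), the uniform prior, the choice of a Poisson likelihood, and the observed litter sizes. The only point is the mechanical step of posterior \(\propto\) likelihood \(\times\) prior.

```python
import numpy as np
from scipy import stats

# Hypothetical candidate explanations: the mean number of kittens per litter
thetas = np.array([3.0, 4.0, 5.0, 6.0])
prior = np.array([0.25, 0.25, 0.25, 0.25])  # a priori belief in each theta

# Made-up observed litter sizes (the data, gamma)
data = np.array([4, 5, 4, 3, 5])

# P(gamma | theta): probability of the observed data under each candidate,
# assuming litter sizes are Poisson-distributed with mean theta
likelihood = np.array([stats.poisson.pmf(data, mu=t).prod() for t in thetas])

# Bayes' theorem: posterior is proportional to likelihood times prior
posterior = likelihood * prior
posterior /= posterior.sum()  # normalize so the beliefs sum to 1

for t, p in zip(thetas, posterior):
    print(f"P(theta = {t} | data) = {p:.3f}")
```

Running this shifts belief toward the candidates most consistent with the data, and feeding in more observations concentrates the posterior further — exactly the iterative belief-updating loop described above.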

So now to the primary purpose of this post — to outline what I find so compelling about Bayesian statistics.

Finally, in Bayesian Inference in Statistical Analysis (Wiley), Box and Tiao make another powerful observation about the merits of Bayesian statistics:

Data gathering is frequently expensive compared with data analysis. It is sensible then that hard-won data be inspected from many different viewpoints. In the selection of viewpoints, Bayesian methods allow greater emphasis to be given to scientific interest and less to mathematical convenience.

With traditional statistics, much of our experiment design and analysis is driven by the mathematical necessities of the statistical methods at our disposal. While we still need to take care when designing experiments, Bayesian methods hold the potential for much greater flexibility in analyzing the data as they sit and in pursuing questions on their scientific merits more than on their mathematical tractability. They also provide clean ways to model things such as non-independence that cause substantial problems for many traditional methods.
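As one hedged illustration of that last point, here is roughly what a small hierarchical model might look like using the PyMC library (an assumption on my part; the grouping of litters by mother cat and all the numbers are invented). Repeated litters from the same cat are not independent observations, and the partial pooling below models that structure directly rather than requiring it to be assumed away.

```python
import numpy as np
import pymc as pm

# Hypothetical data: litter sizes grouped by which mother cat produced them.
# Litters from the same cat are correlated, not independent draws.
cat_idx = np.array([0, 0, 1, 1, 1, 2, 2])
litter_sizes = np.array([4, 5, 3, 4, 3, 6, 5])

with pm.Model():
    # Population-level log litter rate and between-cat variation
    mu = pm.Normal("mu", 0.0, 1.0)
    sigma = pm.HalfNormal("sigma", 1.0)
    # Each cat gets her own rate, partially pooled toward the population mean
    log_rate = pm.Normal("log_rate", mu, sigma, shape=3)
    pm.Poisson("obs", mu=pm.math.exp(log_rate)[cat_idx], observed=litter_sizes)
    idata = pm.sample()  # draw posterior samples via MCMC
```

The same non-independence that would violate the assumptions of many classical tests is simply part of the model here.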

I’m still pretty early in this process — I’ve only gotten through the first couple of chapters of Gelman et al.’s Bayesian Data Analysis. But it seems to be a promising direction, and it’s a lot of fun to learn.