Some Motivations for Bayesian Statistics
If you’ve been following my Twitter stream, you have probably seen that I’m doing some reading and study on Bayesian statistics lately. For a variety of reasons, I find the Bayesian model of statistics quite compelling and am hoping to be able to use it in some of my research.
Traditional statistics, encompassing well-known methods such as t-tests and ANOVA, comes from the frequentist school of statistical thought. The basic idea of frequentist statistics is that the world is described by parameters that are fixed and unknown. These parameters can be all manner of things — the rotation rate of the earth, the average life span of a naked mole rat, or the average number of kittens in a litter of cats. It is rare that we have access to the entire population of interest (e.g. all mature female cats) to measure the parameter directly, so we estimate parameters by taking random samples from the population, computing some statistic over the sample, and using that as our estimate of the population parameter. These parameters are unknown — we do not know their exact values. Since they are fixed, however, we cannot discuss them in probabilistic terms. Probabilistic reasoning only applies to random variables, and parameters are not random — we just don’t know what their values are. Probabilities, expected values, etc. are only meaningful in the context of the outcomes of multiple repeated random experiments drawn from the population.
The Bayesian says, “Who cares?”. Bayesian statistics applies probabilistic methods and reasoning directly to the parameters. This doesn’t necessarily mean that the Bayesian thinks the world is really random, though. It turns out that we can use probabilities not only to express the chance that something will occur, but we can also use them to express the extent to which we believe something and the math all still works. So we can use the algebra of probabilities to quantify and describe how much we believe various propositions, such as “the average number of kittens per litter is 47”.
One of the fundamental differences, therefore, is that the frequentist can only apply probabilities to the act of repeating an experiment. The Bayesian can apply probabilities directly to their knowledge of the world. There are other important differences as well — frequentist statistics is primarily concerned with testing and falsifying hypotheses, while Bayesian statistics focuses more on determining which of several competing models or hypotheses is most likely to be true — but those differences play less of a role in what I find compelling about Bayesian statistics, and it is also possible to apply falsificationist principles in a Bayesian framework.
The core act of Bayesian inference is to compute posterior probabilities for various proposed models given some data. It is built on Bayes’ theorem, that \(P(\theta|\gamma) \propto P(\gamma|\theta) P(\theta)\) — the posterior probability of some explanation for the world, denoted by \(\theta\), being the correct explanation for some observed data \(\gamma\) is proportional to the probability of seeing \(\gamma\) if \(\theta\) is true, multiplied by the prior probability of \(\theta\) being true in the first place. So the Bayesian comes up with an a priori idea of what models may explain the observed data and how likely they are to be correct, combines that with how likely each of those models is to produce the observed data, and updates their belief in the truth of each of those models. This kind of belief update provides a direct mathematical encoding of the iterative aspects of the scientific method — we have ideas of how the world works, we perform experiments and collect data, and we use those data to update our ideas of what is true.
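To make that update concrete, here is a minimal sketch of Bayes’ theorem applied over a small grid of candidate parameter values, using the kittens-per-litter example from above. The Poisson likelihood, the flat prior, and the made-up litter sizes are all assumptions chosen purely for illustration, not recommendations.

```python
import numpy as np
from scipy.stats import poisson

# Candidate values for the average number of kittens per litter (theta),
# with a flat prior over the grid -- an illustrative choice.
theta = np.array([2, 3, 4, 5, 6])
prior = np.full(len(theta), 1.0 / len(theta))

# Hypothetical observed litter sizes.
litters = [4, 3, 5, 4, 4]

# Likelihood of the data under each candidate theta, modeling litter size
# as Poisson(theta) -- an assumption made only for this example.
likelihood = np.array([poisson.pmf(litters, mu=t).prod() for t in theta])

# Bayes' theorem: posterior is proportional to likelihood times prior.
posterior = likelihood * prior
posterior /= posterior.sum()

for t, p in zip(theta, posterior):
    print(f"P(theta = {t} | data) = {p:.3f}")
```

The prior encodes what we believed before seeing any litters, and the normalized product is exactly the updated belief described above.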
So now to the primary purpose of this post — to outline what I find so compelling about Bayesian statistics.
The results are understandable. The output of frequentist statistics is difficult to interpret. It is quite difficult to understand exactly what a confidence interval is and how it arises; a 95% confidence interval does not mean that there is a 95% chance that the true parameter lies in the interval. What it actually means is much more complicated and is defined in terms of multiple repetitions of the experiment. Bayesian posterior distributions and posterior intervals, on the other hand, do exactly what they say — they directly express our belief about the parameter’s value. They are easy to understand and easy to explain.
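As a small illustration of how direct a posterior interval is, here is a sketch that computes a central 95% posterior interval for a proportion under a Beta-Binomial model; the uniform prior and the counts are hypothetical, chosen only to show the mechanics.

```python
from scipy.stats import beta

# Uniform Beta(1, 1) prior on a proportion, updated with hypothetical data:
# 37 successes in 50 trials gives a Beta(1 + 37, 1 + 13) posterior.
posterior = beta(1 + 37, 1 + 13)

# The central 95% posterior interval: given this model and prior, we believe
# there is a 95% probability that the true proportion lies in this range.
lo, hi = posterior.ppf([0.025, 0.975])
print(f"95% posterior interval: ({lo:.3f}, {hi:.3f})")
```

The interval means exactly what it appears to mean, with no appeal to imagined repetitions of the experiment.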
Bayesian inference explicitly tracks and incorporates prior knowledge. We have to be careful about this, but if we have prior knowledge about the data and the potential mechanisms behind them, we can incorporate that knowledge into our inference. Yes, this means that prior beliefs can influence the outcome, but…
All parts of the model, including priors and other assumptions, are explicit and open to criticism. The only aspect of Bayesian inference taken for granted is Bayes’ theorem. Everything else, in particular the prior assumptions, is made explicit, documented, and can be critiqued in the peer review and post-publication discussion processes. Frequentist statistics makes assumptions about how the world works, but these assumptions are buried deep within the methods used and do not necessarily hold in the data to which they are applied. In Bayesian inference, everything is explicitly articulated either in the priors or in the structure of the model itself. Some assumptions, such as independence, are still quite subtle, but they can still be explicitly dealt with.
The model is scrutable. This is one of the biggest ones for me — since the entire model is explicit, it is necessary (and possible) for the scientist to understand it and work through the ramifications of various assumptions. One of my great frustrations with frequentist statistics is the difficulty of understanding the impact of the various assumptions made by a statistical method. Any tool makes assumptions about the data. Some of these assumptions, such as normal distribution of errors, can be checked; others, such as independence, are incredibly difficult to test. In both cases, though, it is difficult as a working scientist to find good answers on how much violating a particular assumption affects the validity of the result. Traditional statistical methods are largely black boxes; they are handed to us by the statisticians, but it is difficult for those of us without graduate degrees in statistics to peek inside, see how they work, and understand how they break. When we build the model from scratch, we have to understand how it works, and we have the opportunity to trace various changes, assumptions, etc. through it and see how they affect the result.
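As a sketch of what “building the model from scratch” can look like, here is an explicit, unnormalized log-posterior for a normal mean, written so that each assumption (the prior, the likelihood, and the independence of the observations) appears as a visible line of code. The numbers and distributional choices are illustrative assumptions only.

```python
import numpy as np
from scipy.stats import norm

# Hypothetical measurements of some quantity of interest.
data = np.array([4.1, 3.8, 4.4, 4.0, 3.9])

def log_posterior(mu, sigma=0.3):
    # Prior: mu ~ Normal(4, 1). Changing this line changes an explicit assumption.
    log_prior = norm.logpdf(mu, loc=4.0, scale=1.0)
    # Likelihood: observations modeled as independent Normal(mu, sigma);
    # the independence assumption is visible in the sum of log densities.
    log_lik = norm.logpdf(data, loc=mu, scale=sigma).sum()
    return log_prior + log_lik

# Evaluate the unnormalized posterior over a grid of candidate means and normalize.
grid = np.linspace(3.0, 5.0, 201)
unnorm = np.exp([log_posterior(m) for m in grid])
posterior = unnorm / unnorm.sum()
print(f"Posterior mode of mu: {grid[posterior.argmax()]:.2f}")
```

Swapping in a heavier-tailed likelihood or a different prior is a one-line change, which is exactly the kind of tracing-through of assumptions described above.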
I want to understand the math and methods I use. Bayesian statistics, so far, makes that much easier.
Finally, in Bayesian Inference in Statistical Analysis (Wiley), Box and Tiao make another powerful observation about the merits of Bayesian statistics:
Data gathering is frequently expensive compared with data analysis. It is sensible then that hard-won data be inspected from many different viewpoints. In the selection of viewpoints, Bayesian methods allow greater emphasis to be given to scientific interest and less to mathematical convenience.
With traditional statistics, much of our experiment design and analysis is driven by the mathematical necessities of the statistical methods at our disposal. While we still need to take care when designing experiments, Bayesian methods hold the potential to allow much greater flexibility in analyzing the data as they sit and in pursuing questions on their scientific merits more than on their mathematical tractability. They also provide clean ways to model things, such as non-independence, that cause substantial problems for many traditional methods.
I’m still pretty early in this process — I’ve only gotten through the first couple of chapters of Gelman et al.’s Bayesian Data Analysis. But it seems to be a promising direction, and it’s a lot of fun to learn.