In a 1935 paper, and in his book Theory of Probability, Jeffreys developed a methodology for quantifying the evidence in favor of a scientific theory. The centerpiece was a number, now called the Bayes factor, which is the posterior odds of the null hypothesis when the prior probability on the null is one-half. Although there has been much discussion of Bayesian hypothesis testing in the context of criticism of P-values, less attention has been given to the Bayes factor as a practical tool of applied statistics. In this paper we review and discuss the uses of Bayes factors in the context of five scientific applications.
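The link between the Bayes factor and posterior odds stated above can be made concrete with a short calculation. Writing B01 = p(D | H0) / p(D | H1), Bayes's theorem gives posterior odds = B01 × prior odds. When the prior probability of the null is one-half, the prior odds are 1, so the posterior odds of H0 equal B01. The sketch below is illustrative only; the function name `posterior_prob_null` is a hypothetical helper, not something defined in the paper.

```python
def posterior_prob_null(bayes_factor_01: float, prior_prob_null: float = 0.5) -> float:
    """Posterior probability of H0 given B01 = p(D | H0) / p(D | H1).

    Hypothetical helper for illustration. With the default prior of one-half,
    the posterior odds of H0 equal B01 exactly.
    """
    if bayes_factor_01 < 0:
        raise ValueError("a Bayes factor is a ratio of densities and cannot be negative")
    if not 0.0 < prior_prob_null < 1.0:
        raise ValueError("prior probability must lie strictly between 0 and 1")
    prior_odds = prior_prob_null / (1.0 - prior_prob_null)
    posterior_odds = bayes_factor_01 * prior_odds  # Bayes's theorem in odds form
    return posterior_odds / (1.0 + posterior_odds)  # convert odds back to a probability


# B01 = 3 with equal prior probabilities: posterior odds 3:1, probability 0.75.
print(posterior_prob_null(3.0))
```

Changing `prior_prob_null` shows how the same Bayes factor yields different posterior conclusions under different prior odds, which is why the Bayes factor itself is reported as the summary of evidence.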
The points we emphasize are:
- from Jeffreys's Bayesian point of view, the purpose of hypothesis testing is to evaluate the evidence in favor of a scientific theory;
- Bayes factors offer a way of evaluating evidence in favor of a null hypothesis;
- Bayes factors provide a way of incorporating external information into the evaluation of evidence about a hypothesis;
- Bayes factors are very general, and do not require alternative models to be nested;
- several techniques are available for computing Bayes factors, including asymptotic approximations that are easy to compute from the output of standard packages that maximize likelihoods;
- in "non-standard" statistical models that do not satisfy common regularity conditions, it can be technically simpler to calculate Bayes factors than to derive non-Bayesian significance tests;
- the Schwarz criterion (or BIC) gives a crude approximation to the logarithm of the Bayes factor, which is easy to use and does not require evaluation of prior distributions;
- when one is interested in estimation or prediction, Bayes factors may be converted to weights to be attached to various models so that a composite estimate or prediction may be obtained that takes account of structural or model uncertainty;
- algorithms have been proposed that allow model uncertainty to be taken into account when the class of models initially considered is very large;
- Bayes factors are useful for guiding an evolutionary model-building process;
- and, finally, it is important, and feasible, to assess the sensitivity of conclusions to the prior distributions used.
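The Schwarz criterion (BIC) approximation and the conversion of Bayes factors into model weights, both mentioned in the list above, can be sketched in a few lines. This is a minimal illustration under stated assumptions: BIC is taken as -2 × (maximized log-likelihood) + d × log(n), so that (BIC0 - BIC1) / 2 crudely approximates log B10 (evidence for M1 over M0); the weights assume equal prior probabilities on the candidate models; and the function names and log-likelihood values in the usage lines are hypothetical, not taken from the paper's applications.

```python
import math
from typing import Sequence


def bic(max_log_lik: float, n_params: int, n_obs: int) -> float:
    """Schwarz criterion in the -2 log L + d log n convention (lower is better)."""
    return -2.0 * max_log_lik + n_params * math.log(n_obs)


def approx_log_bayes_factor(bic_0: float, bic_1: float) -> float:
    """Crude approximation to log B10, the evidence for M1 over M0."""
    return 0.5 * (bic_0 - bic_1)


def bic_model_weights(bics: Sequence[float]) -> list[float]:
    """Approximate posterior model probabilities from BIC values.

    Assumes equal prior model probabilities. Subtracting the minimum BIC
    before exponentiating avoids overflow and underflow.
    """
    best = min(bics)
    unnormalized = [math.exp(-0.5 * (b - best)) for b in bics]
    total = sum(unnormalized)
    return [u / total for u in unnormalized]


def model_averaged_estimate(estimates: Sequence[float], weights: Sequence[float]) -> float:
    """Composite estimate: each model's estimate weighted by its model weight."""
    return sum(e * w for e, w in zip(estimates, weights))


# Hypothetical fits to n = 100 observations.
n = 100
bic_0 = bic(max_log_lik=-120.0, n_params=2, n_obs=n)  # simpler model M0
bic_1 = bic(max_log_lik=-115.0, n_params=3, n_obs=n)  # larger model M1
log_b10 = approx_log_bayes_factor(bic_0, bic_1)  # 5 - 0.5 * log(100), about 2.70
weights = bic_model_weights([bic_0, bic_1])  # roughly [0.063, 0.937]
print(log_b10, weights, model_averaged_estimate([1.0, 2.0], weights))
```

Because the approximation ignores the prior distributions on the parameters, its error does not vanish as n grows, which is why the list describes it as crude; it is most useful as a quick screen before a more careful calculation.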