Department of Statistics
Dietrich College of Humanities and Social Sciences

Bayesian Inference

Bayesian inference is a method of statistical inference in which Bayes' theorem is used to update the probability for a hypothesis as more evidence or information becomes available. Bayesian inference is an important technique in statistics, and especially in mathematical statistics. Bayesian updating is particularly important in the dynamic analysis of a sequence of data. Bayesian inference has found application in a wide range of activities, including science, engineering, philosophy, medicine, and law. In the philosophy of decision theory, Bayesian inference is closely related to subjective probability, often called "Bayesian probability".
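
In its simplest form, the update in question is Bayes' theorem: for a hypothesis \(H\) and evidence \(E\),

\[ P(H \mid E) = \frac{P(E \mid H)\, P(H)}{P(E)}, \]

so the prior probability \(P(H)\) is revised to the posterior probability \(P(H \mid E)\) once the evidence is observed.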


A Markov chain Monte Carlo approach to reconstructing ancestral genome arrangements

We describe a Bayesian approach to infer phylogeny and ancestral genome arrangements on the basis of genome arrangement data using a model in which gene inversion is the sole mechanism of change. A Bayesian approach provides a means to quantify the uncertainty in the phylogeny and in the ancestral genome arrangements. We describe a method of sampling phylogenies from the posterior distribution via Markov chain Monte Carlo (MCMC) that is computationally feasible for large data sets. We compare and contrast this MCMC approach with methods which reconstruct maximum parsimony phylogenies from genome arrangement data and demonstrate several advantages of a Bayesian approach to this problem. Furthermore, we have found that our sampler has discovered many genome rearrangement scenarios that require fewer gene inversions on a Campanulaceae cpDNA data set than other published reconstructions which were thought to be most parsimonious.
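
As a rough illustration of the sampling idea only (this is not the authors' sampler, model, or data), the following Python sketch runs a Metropolis-Hastings chain over signed gene orders, proposing a single random segment inversion at each step; the scoring function is a hypothetical stand-in for the log posterior.

    import math
    import random

    def propose_inversion(order):
        """Reverse and negate a random contiguous segment (one gene inversion)."""
        i, j = sorted(random.sample(range(len(order)), 2))
        return order[:i] + [-g for g in reversed(order[i:j + 1])] + order[j + 1:]

    def log_score(order, target):
        """Hypothetical stand-in for a log posterior: positions agreeing with a target order."""
        return float(sum(a == b for a, b in zip(order, target)))

    def sample_arrangements(start, target, n_iter=10000):
        current, current_lp = start, log_score(start, target)
        draws = []
        for _ in range(n_iter):
            candidate = propose_inversion(current)
            candidate_lp = log_score(candidate, target)
            # The inversion proposal is symmetric, so the Metropolis-Hastings
            # acceptance probability reduces to the posterior ratio.
            if random.random() < math.exp(min(0.0, candidate_lp - current_lp)):
                current, current_lp = candidate, candidate_lp
            draws.append(current)
        return draws

    target = list(range(1, 9))                            # "true" signed gene order
    start = propose_inversion(propose_inversion(target))  # scrambled starting arrangement
    print(sample_arrangements(start, target)[-1])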

Bayesian Empirical Likelihood

Empirical likelihood has been suggested as a data-based, nonparametric alternative to the usual likelihood function. Research has shown that empirical likelihood tests have many of the same asymptotic properties as those derived from parametric likelihoods. This leads naturally to the possibility of using empirical likelihood as the basis for Bayesian inference. Different ways in which this goal might be accomplished are considered. The validity of the resultant posterior inferences is examined, as are frequentist properties of the Bayesian empirical likelihood intervals.
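
One natural construction, sketched here only schematically, replaces the parametric likelihood in Bayes' theorem with the profile empirical likelihood: for estimating equations \(g(X_i, \theta)\),

\[ L_{EL}(\theta) = \max\Big\{ \prod_{i=1}^{n} w_i \;:\; w_i \ge 0,\ \sum_{i=1}^{n} w_i = 1,\ \sum_{i=1}^{n} w_i\, g(X_i, \theta) = 0 \Big\}, \qquad \pi(\theta \mid X_1, \dots, X_n) \propto \pi(\theta)\, L_{EL}(\theta). \]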

Bayesian Goodness of Fit Testing Using Infinite Dimensional Exponential Families

We develop a nonparametric Bayes factor for testing the fit of a parametric model. We begin with a nominal parametric family which we then embed into an infinite dimensional exponential family. The new model then has a parametric and nonparametric component. We give the log density of the nonparametric component a Gaussian process prior. An asymptotic consistency requirement puts a restriction on the form of the prior leaving us with a single hyperparameter for which we suggest a default value based on simulation experience. Then we construct a Bayes factor to test the nominal model versus the semiparametric alternative. Finally, we show that the Bayes factor is consistent. The proof of the consistency is based on approximating the model by a sequence of exponential families.
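
Schematically (the paper's exact parameterization may differ), the embedding perturbs the log density of the nominal family \(f_\theta\) by a function carrying a Gaussian process prior,

\[ p_{\theta, g}(x) \;\propto\; f_\theta(x)\, e^{g(x)}, \qquad g \sim \mathrm{GP}(0, \tau k), \]

and the Bayes factor compares the marginal likelihood of the data under the nominal model (\(g \equiv 0\)) with that under the semiparametric alternative.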

Consistency of Posterior Distributions for Neural Networks

In this paper we show that the posterior distribution for feedforward neural networks is asymptotically consistent. This paper extends earlier results on universal approximation properties of neural networks to the Bayesian setting. The proof of consistency embeds the problem in a density estimation problem, then uses bounds on the bracketing entropy to show that the posterior is consistent over Hellinger neighborhoods. It then relates this result back to the regression setting. We show consistency both in the setting where the number of hidden nodes grows with the sample size and in the case where the number of hidden nodes is treated as a parameter. Thus we provide a theoretical justification for using neural networks for nonparametric regression in a Bayesian framework.
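
The notion of consistency at work here can be stated as follows: writing \(f_0\) for the true density and \(H\) for Hellinger distance,

\[ H(f, f_0) = \Big( \int \big( \sqrt{f(x)} - \sqrt{f_0(x)} \big)^2 \, dx \Big)^{1/2}, \qquad \Pi\big( \{ f : H(f, f_0) > \varepsilon \} \mid X_1, \dots, X_n \big) \to 0 \ \text{a.s. for every } \varepsilon > 0. \]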

Dynamics of Bayesian Updating with Dependent Data and Misspecified Models

Recent work on the convergence of posterior distributions under Bayesian updating has established conditions under which the posterior will concentrate on the truth, if the latter has a perfect representation within the support of the prior, and under various dynamical assumptions, such as the data being independent and identically distributed or Markovian. Here I establish sufficient conditions for the convergence of the posterior distribution in non-parametric problems even when all of the hypotheses are wrong, and the data-generating process has a complicated dependence structure. The main dynamical assumption is the generalized asymptotic equipartition (or "Shannon-McMillan-Breiman") property of information theory. I derive a kind of large deviations principle for the posterior measure, and discuss the advantages of predicting using a combination of models known to be wrong. An appendix sketches connections between the present results and the "replicator dynamics" of evolutionary theory.
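
Very roughly, and under the paper's regularity conditions, the equipartition assumption asks that the per-observation log-likelihood ratio between the data-generating process and each hypothesis \(\theta\) converge almost surely to a relative-entropy rate \(d(\theta)\); the large-deviations-type conclusion is then that posterior mass decays exponentially on sets of hypotheses whose divergence rate exceeds the best attainable one, schematically

\[ \frac{1}{n} \log \pi(A \mid X_{1:n}) \;\to\; -\Big( \inf_{\theta \in A} d(\theta) - \inf_{\theta} d(\theta) \Big). \]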

Is Ignorance Bliss?

"...where ignorance is bliss, 'tis folly to be wise."
Thomas Gray

If ignorance were bliss, there would be information you would pay not to have. Hence the question is whether a rationally behaving agent would ever do such a thing. This paper demonstrates that

1. A Bayesian agent with a proper, countably additive prior never maximizes utility by paying not to see cost-free data.
2. The definition of "cost-free" is delicate, and requires explanation.
3. A Bayesian agent with a finitely additive prior, or an improper prior, however, might pay not to see cost-free data.
4. An agent following a gamma-minimax strategy might also do so.
5. An agent following the strategies of E-admissibility recommended by Levi, or of maximality recommended by Sen and Walley, might also do so.

A discussion follows of how damaging it might be, to a decision theory intended to be rational, to permit paying not to receive cost-free information.
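
A classical argument in this direction (not necessarily the proof given in the paper) runs as follows: with a proper, countably additive prior, utility \(U(a, \Theta)\), and prospective cost-free data \(X\),

\[ \max_a \mathbb{E}\big[ U(a, \Theta) \big] = \max_a \mathbb{E}_X \Big[ \mathbb{E}\big[ U(a, \Theta) \mid X \big] \Big] \;\le\; \mathbb{E}_X \Big[ \max_a \mathbb{E}\big[ U(a, \Theta) \mid X \big] \Big], \]

so the expected utility of deciding after seeing \(X\) is never smaller than that of deciding without it, and there is nothing to gain by paying to avoid the data.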

Nonparametric Inference in Astrophysics

We discuss nonparametric density estimation and regression for astrophysics problems. In particular, we show how to compute nonparametric confidence intervals for the location and size of peaks of a function. We illustrate these ideas with recent data on the Cosmic Microwave Background. We also briefly discuss nonparametric Bayesian inference.
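
As a purely hypothetical sketch of the kind of computation involved (not the paper's estimator or data), the following Python code smooths noisy measurements with a Nadaraya-Watson kernel regression and forms a bootstrap confidence interval for the location of the peak of the fitted curve.

    import numpy as np

    def kernel_smooth(x_grid, x, y, h):
        """Nadaraya-Watson regression estimate on x_grid, Gaussian kernel of bandwidth h."""
        w = np.exp(-0.5 * ((x_grid[:, None] - x[None, :]) / h) ** 2)
        return (w @ y) / w.sum(axis=1)

    def peak_location(x_grid, x, y, h):
        """Location of the maximum of the smoothed curve."""
        return x_grid[np.argmax(kernel_smooth(x_grid, x, y, h))]

    rng = np.random.default_rng(0)
    x = np.sort(rng.uniform(0, 1, 300))
    y = np.exp(-0.5 * ((x - 0.4) / 0.1) ** 2) + rng.normal(0, 0.2, x.size)  # true peak near 0.4
    grid = np.linspace(0, 1, 200)

    # Nonparametric bootstrap interval for the peak location.
    boot = []
    for _ in range(500):
        idx = rng.integers(0, x.size, x.size)
        boot.append(peak_location(grid, x[idx], y[idx], h=0.05))
    print(np.percentile(boot, [2.5, 97.5]))  # rough 95% interval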

Rates of Convergence of Posterior Distributions

We compute the rate at which the posterior distribution concentrates around the true parameter value. The spaces we work in are quite general and include infinite dimensional cases. The rates are driven by two quantities: the size of the space, as measured by metric entropy or bracketing entropy, and the degree to which the prior concentrates in a small ball around the true parameter. We apply the results to several examples. In some cases, natural priors give sub-optimal rates of convergence and better rates can be obtained by using sieve-based priors.

(Revised 08/98)
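
In the now-standard formulation of such results (the paper's exact conditions may differ in detail), the rate \(\varepsilon_n\) is determined by an entropy bound and a prior-mass bound of roughly the form

\[ \log N_{[\,]}(\varepsilon_n, \mathcal{F}) \;\le\; n \varepsilon_n^2, \qquad \Pi\big( \{ f : K(f_0, f) \le \varepsilon_n^2 \} \big) \;\ge\; e^{-c\, n \varepsilon_n^2}, \]

where \(N_{[\,]}\) denotes bracketing entropy and \(K\) the Kullback-Leibler divergence; under such conditions the posterior concentrates on Hellinger balls of radius a constant multiple of \(\varepsilon_n\) around the true density \(f_0\).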

Statistical Analysis of Temporal Evolution in Single-Neuron Firing Rates

A fundamental methodology in neurophysiology involves recording the electrical signals associated with individual neurons within brains of awake behaving animals. Traditional statistical analyses have relied mainly on mean firing rates over some epoch (often several hundred milliseconds) that are compared across experimental conditions by Analysis of Variance. Often, however, the time course of the neuronal firing patterns is of interest, and a more refined procedure can produce substantial additional information. In this paper we compare neuronal firing in the supplementary eye field of a macaque monkey across two experimental conditions. We take the electrical discharges, or "spikes," to be arrivals in an inhomogeneous Poisson process and then model the firing intensity function using both a simple parametric form and more flexible splines. Our main interest is in making inferences about certain characteristics of the intensity, including the timing of the maximal firing rate. We examine data from 84 neurons individually and also combine results into a hierarchical model. We use Bayesian estimation methods and frequentist significance tests based on a nonparametric bootstrap procedure. We are thereby able to conclude that a substantial fraction of the neurons exhibit important temporal differences in firing intensity across the two conditions, and we quantify the effect across the population of neurons.
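
For spike times \(t_1, \dots, t_m\) observed on an interval \([0, T]\), the inhomogeneous Poisson model gives the log-likelihood of an intensity function \(\lambda(t)\) as

\[ \ell(\lambda) = \sum_{i=1}^{m} \log \lambda(t_i) \;-\; \int_0^T \lambda(t)\, dt, \]

which applies whether \(\lambda\) is given a simple parametric form or a flexible spline representation.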

Statistical Methods for Eliciting Probability Distributions

Elicitation is a key task for subjectivist Bayesians. While skeptics hold that it cannot (or perhaps should not) be done, in practice it brings statisticians closer to their clients and subject-matter-expert colleagues. This paper reviews the state of the art, reflecting the experience of statisticians informed by the fruits of a long line of psychological research into how people represent uncertain information cognitively, and how they respond to questions about that information.

In a discussion of the elicitation process, the first issue to address is what it means for an elicitation to be successful, i.e. what criteria should be employed? Our answer is that a successful elicitation faithfully represents the opinion of the person being elicited. It is not necessarily "true" in some objectivistic sense, and cannot be judged that way. We see elicitation as simply part of the process of statistical modeling. Indeed, in a hierarchical model it is ambiguous at which point the likelihood ends and the prior begins. Thus the same kinds of judgment that inform statistical modeling in general also inform elicitation of prior distributions.

The psychological literature suggests that people are prone to certain heuristics and biases in how they respond to situations involving uncertainty. As a result, some of the ways of asking questions about uncertain quantities are preferable to others, and appear to be more reliable. However, data are lacking on exactly how well the various methods work, because it is unclear, other than by asking using an elicitation method, just what the person believes. Consequently one is reduced to indirect means of assessing elicitation methods.

The tool-chest of methods is growing. Historically the first methods involved choosing hyperparameters of conjugate prior families, at a time when these were the only families for which posterior distributions could be computed. Modern computational methods such as Markov chain Monte Carlo have freed elicitation from this constraint. As a result there are now both parametric and non-parametric methods available for low-dimensional problems. High dimensional problems are probably best thought of as lacking another hierarchical level, which has the effect of reducing the as-yet-unelicited parameter space.

Special considerations apply to the elicitation of group opinions. Informal methods, such as Delphi, encourage the participants to discuss the issue in the hope of reaching consensus. Formal methods, such as weighted averages or logarithmic opinion pools, each have mathematical characteristics that are uncomfortable. Finally, there is the question of what a group opinion even means, since it is not necessarily the opinion of any participant.

(Revised 01/05)
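
For concreteness, the two formal combination rules mentioned in the abstract above have the forms

\[ p_{\mathrm{lin}}(\theta) = \sum_i w_i\, p_i(\theta), \qquad p_{\mathrm{log}}(\theta) \;\propto\; \prod_i p_i(\theta)^{w_i}, \]

the linear and logarithmic opinion pools of the individual densities \(p_i\), with nonnegative weights \(w_i\) summing to one.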

The Consistency of Posterior Distributions in Nonparametric Problems

We give conditions that guarantee that the posterior probability of every Hellinger neighborhood of the true density tends to 1 almost surely. The conditions are (i) a smoothness condition on the prior and (ii) a requirement that the prior put positive mass in appropriate neighborhoods of the true density. The results are based on the idea of approximating the set of densities with a finite dimensional set of densities and then computing the Hellinger bracketing metric entropy of the approximating set. We apply the results to some examples.

Who Wrote Ronald Reagan's Radio Addresses?

In his campaign for the U.S. presidency from 1975 to 1979, Ronald Reagan delivered over 1000 radio broadcasts. For over 600 of these we have direct evidence of Reagan's authorship of the text of the speeches, in the form of yellow pads, with material written "in his own hand". The aim of this study was to determine the authorship of 314 of the broadcasts for which no direct evidence is available.
Peter Hannaford had been Reagan's main aide in drafting texts for the radio addresses during the years 1976-79, whereas the situation was less clear in 1975. We therefore learned both how to discriminate between the writing styles of Reagan and Hannaford and how to exploit stylistic differences between Reagan and the undifferentiated pool of his collaborators, in order to address the prediction problem properly for speeches delivered in the different periods. We explored a wide range of off-the-shelf classification methods as well as fully Bayesian Poisson and Negative-Binomial models for word counts. Simple majority voting reinforced the cross-validated accuracies of our predictions on speeches of known authorship, which settled above 90% in most cases. We produced separate sets of predictions, using the most accurate classification methods and the fully Bayesian models, for the 314 speeches whose author is uncertain. All the predictions agree on 135 of the "unknown" speeches, whereas the fully Bayesian models agree on 289 of them. We further approximated log-odds of authorship as a measure of the strength of our predictions.

Among the crucial issues we had to deal with were the stark difference in the number of "known" speeches available for each author, and the word-selection phase. In the original dataset there were 679 speeches drafted by Reagan "in his own hand" and only 39 drafted by a few close collaborators. With the help of Prof. Kiron Skinner and Prof. Annelise Anderson we looked into the Reagan files and found 30 newspaper columns originally drafted by Peter Hannaford, but published with Reagan's signature. We coded them to obtain a set of 69 texts drafted by Reagan's collaborators, on which we based our inferences. The process of word selection was critical in order to understand "the secrets" of Reagan's writing style. Word counts fit the Negative-Binomial profile very well, and we relied on this fact to compute p-values for a certain statistic (\(T_1\)) in order to capture structural elements of differential writing style. We considered other criteria to find words with discriminatory power, such as Information Gain scores computed for Multinomial and multivariate Bernoulli models, as well as a semantic decomposition of the speeches using Docu-Scope software. Following Frederick Mosteller and David Wallace in their analysis of "the Federalist Papers", we aimed, with possibly a few exceptions, for non-contextual words that occurred with high, medium, and low frequency. In making decisions about contextuality, a role was played by a prior idea of Reagan's style based on the text of the Reagan vs. Carter presidential debate and on several notes, comments, and books about Ronald Reagan. As an example consider the word Carter: our prior idea about Reagan's style suggested that Reagan would seldom talk about his opponent, Carter, his line of attack being more subtle; he would mostly address the government, Capitol Hill people, and similar figures instead. Thus, when the word Carter passed stringent testing to make sure that its differential use by Reagan and Hannaford was too marked to be the outcome of pure chance, and that it was likely to capture some element of Hannaford's writing style, we did not discard it as contextual. Some have argued that Reagan's writing style might be better captured by some idioms he used. We therefore extended our analysis to the study of successive words, and discovered that, for example, idioms like if we, in our, I'd like to or in America identify Reagan's writing style beyond reasonable doubt.
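
As a schematic version of the count model (the paper's exact parameterization, and the statistic \(T_1\), are not reproduced here), the Negative-Binomial distribution for the count \(W\) of a given word in a speech by a given author can be written

\[ P(W = w \mid \mu, \kappa) = \frac{\Gamma(w + \kappa)}{w!\, \Gamma(\kappa)} \Big( \frac{\kappa}{\kappa + \mu} \Big)^{\kappa} \Big( \frac{\mu}{\kappa + \mu} \Big)^{w}, \]

with author-specific mean \(\mu\) and dispersion \(\kappa\) for each selected word; summing the resulting log-likelihood ratios over words, together with the prior odds, gives the log-odds of authorship used to grade the strength of each prediction.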

We concluded that, in 1975, Ronald Reagan drafted 77 speeches and his collaborators drafted 71, whereas, over the years 1976-1979, Reagan drafted 90 speeches and Hannaford drafted 74. The cross-validated accuracy of our best fully Bayesian model, based on the Negative-Binomial distribution for word counts, was above 90% in all cases. Further, our inferences were not sensitive to "reasonable" variations in the sets of constants underlying the prior distributions, which we bracketed with a small study on 90 high-frequency function words. Our predictions for the speeches whose author is uncertain are accurate and reliable, and the agreement of several methods in predicting the author of the "unknown" speeches in most cases reinforced our confidence.
