Classical multi-level and Bayesian approaches to population size estimation using multiple lists


Stephen E. Fienberg

Matthew S. Johnson

Brian W. Junker

October 9, 1998

To be presented October 20, 1998 at the Royal Statistical Society Conference on Applications of Random Effects/Multilevel Models to Categorical Data in Social Sciences and Medicine.

One of the major objections to the standard multiple-recapture approach to population estimation is the assumption of homogeneity of individual "capture" probabilities. Modeling individual capture heterogeneity is complicated by the fact that it shows up as a restricted form of interaction between lists in the contingency table cross-classifying list memberships for all individuals. Traditional log-linear modeling approaches to capture-recapture problems are well-suited to modeling interactions among lists, but ignore the special dependence structure that individual heterogeneity induces. A random-effects approach, based on the Rasch (1960) model from educational testing and introduced in this context by Darroch et al. (1993) and Agresti (1994), provides one way to introduce the dependence resulting from heterogeneity into the log-linear model; however, previous efforts to combine the Rasch-like heterogeneity terms additively with the usual log-linear interaction terms suggest that a more flexible approach is required. In this paper we consider both classical multi-level approaches and fully Bayesian hierarchical approaches to modeling individual heterogeneity and list interactions. Our framework encompasses both the traditional log-linear approach and various elements from the full Rasch model. We compare these approaches on two examples, the first arising from an epidemiological study of a population of diabetics in Italy, and the second from a study intended to assess the "size" of the World Wide Web. We also explore extensions allowing for interactions between the Rasch and log-linear portions of the models in both the classical and Bayesian contexts.
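
To make the phrase "restricted form of interaction between lists" concrete, the following is a minimal sketch of a Rasch-type capture model and the marginal form it implies. The notation is assumed for illustration and is not taken from the abstract: X_{ij} indicates whether individual i appears on list j, \theta_i is an individual "catchability" effect with distribution F, and \beta_j is a list effect.

```latex
% Assumed notation (illustrative only): X_{ij} = 1 if individual i is on list j,
% \theta_i = individual random effect, \beta_j = list effect, J = number of lists.
\[
\log \frac{P(X_{ij}=1 \mid \theta_i)}{P(X_{ij}=0 \mid \theta_i)} = \theta_i + \beta_j,
\qquad j = 1,\dots,J,
\]
% with list memberships conditionally independent given \theta_i.
% Integrating over \theta \sim F gives the probability of a capture pattern x:
\[
P(X = x)
  = \int \prod_{j=1}^{J} \frac{\exp\{x_j(\theta+\beta_j)\}}{1+\exp(\theta+\beta_j)} \, dF(\theta)
  = \exp\Big(\sum_{j=1}^{J} x_j \beta_j\Big)\, \gamma(x_+),
\qquad x_+ = \sum_{j=1}^{J} x_j,
\]
% where the heterogeneity enters only through
\[
\gamma(s) = \int \frac{e^{\theta s}}{\prod_{j=1}^{J}\{1+\exp(\theta+\beta_j)\}} \, dF(\theta).
\]
```

Under these assumptions, a cell probability depends on the capture pattern only through the list main effects and the number of lists x_+ on which the individual appears, which is the quasi-symmetric structure named in the keywords. If \theta_i is constant across individuals, \gamma(s) is proportional to e^{\theta s} and the model reduces to independence between lists.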

Keywords: Log-linear models; Markov chain Monte Carlo methods; Multiple-recapture census; Quasi-symmetry; Rasch model.

by Brian Junker

Brian Junker
Department of Statistics
232 Baker Hall
Carnegie Mellon University
Pittsburgh PA 15213
Phone: (412) 268-8873
FAX: (412) CMU-STAT or (412) 268-7828