Department of Statistics
Dietrich College of Humanities and Social Sciences

Double Importance Sampling

Publication Date

April, 1999

Publication Type

Tech Report


Valérie Ventura


Assume that we want to estimate
\(
\gamma_i(\theta) = {\rm E}_{f_{\theta}} \{ c_i(X) \} = \int c_i(x) f_{\theta}(x) \, dx
\)
via simulation, for \(\theta \in \Theta\) and for several functions \(c_i\), \(i = 1, \ldots, I\), where X is a random variable with density \(f_{\theta}\). The importance sampling identity can be used to write
\(
\gamma_i(\theta) = \int c_i(x) \left[ f_{\theta}(x) / g(x) \right] g(x) \, dx
= \int t_i(x, \theta) \, g(x) \, dx,
\)
which can then be estimated by \(\hat \gamma_i(\theta) = Q^{-1} \sum_{q=1}^{Q} t_i(x_q, \theta)\), where \(x_1, \ldots, x_Q\) is a random sample from g.

For importance sampling to be efficient, however, sampling from g should be easy, g must provide adequate coverage of the sample space of possibly many densities \(f_{\theta}\), and ideally g should be chosen to minimize the variance of the resulting estimates of \(\gamma_i(\theta)\). This is a lot to achieve, particularly since these goals may conflict. Moreover, if several characteristics \(\gamma_i(\theta)\) must be estimated, the method is unavoidably limited as a variance reduction technique, because g can be optimal for only one particular \(\gamma_i(\theta)\); worse, it can be far from optimal for the other characteristics. Double importance sampling, on the other hand, allows one to achieve all of these goals: ease of sampling, and theoretically perfect estimation of an arbitrarily large number of quantities \(\gamma_i(\theta)\).

One example concerns estimation of a log likelihood function that can be written as
\(
\ell(\theta) = \sum_{j=1}^{n} \log \int f_{\tilde X \mid X}(\tilde x_j \mid x) \, f_{X}(x; \theta) \, dx,
\)
where \(\tilde x_1, \ldots, \tilde x_n\) are independent observed data and X is an unobserved variable. Estimating \(\ell(\theta)\) by direct simulation or by importance sampling is very inefficient because \(f_{\tilde X \mid X}\) is much more concentrated than \(f_X\); a simple application of the proposed method makes the simulation very efficient.
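The plain importance sampling estimator \(\hat \gamma_i(\theta)\) can be sketched in Python as follows. This is ordinary importance sampling, not double importance sampling. Purely for illustration, it assumes \(f_{\theta} = N(\theta, 1)\), a single importance density \(g = N(0, 2^2)\), and three characteristics \(c_1(x) = x\), \(c_2(x) = x^2\), \(c_3(x) = 1\{x > 2\}\); the names `G`, `CS` and `is_estimates` are hypothetical. What it shows is that one sample \(x_1, \ldots, x_Q\) from g is reused for every \(\theta\) and every \(c_i\), with only the weights \(f_{\theta}(x_q)/g(x_q)\) recomputed.

```python
import random
from statistics import NormalDist

# Illustrative assumptions (not from the report):
#   f_theta = N(theta, 1) and a single importance density g = N(0, 2^2).
G = NormalDist(mu=0.0, sigma=2.0)

# Several characteristics c_i, all estimated from the same sample.
CS = [
    lambda x: x,                        # c_1(x) = x
    lambda x: x * x,                    # c_2(x) = x^2
    lambda x: 1.0 if x > 2.0 else 0.0,  # c_3(x) = 1{x > 2}
]


def is_estimates(thetas, cs, Q=50_000, seed=0):
    """Return {theta: [gamma_hat_1(theta), ..., gamma_hat_I(theta)]}, where
    gamma_hat_i(theta) = Q^{-1} sum_q t_i(x_q, theta) and
    t_i(x, theta) = c_i(x) f_theta(x) / g(x), using ONE sample x_1..x_Q from g."""
    rng = random.Random(seed)
    xs = [rng.gauss(G.mean, G.stdev) for _ in range(Q)]
    g_dens = [G.pdf(x) for x in xs]
    out = {}
    for theta in thetas:
        f = NormalDist(mu=theta, sigma=1.0)
        # importance weights f_theta(x_q) / g(x_q); only these change with theta
        w = [f.pdf(x) / gx for x, gx in zip(xs, g_dens)]
        out[theta] = [sum(wq * c(x) for wq, x in zip(w, xs)) / Q for c in cs]
    return out


if __name__ == "__main__":
    for theta, est in is_estimates([0.0, 1.0], CS).items():
        exact = [theta, theta ** 2 + 1.0, 1.0 - NormalDist(theta, 1.0).cdf(2.0)]
        print(theta, [round(e, 3) for e in est], [round(e, 3) for e in exact])
```

Under these assumptions the exact values are \(\theta\), \(\theta^2 + 1\) and \(1 - \Phi(2 - \theta)\), which gives a quick check. Because the one g serves all \(\theta\) and all \(c_i\), it cannot be the variance-minimizing choice for each of them, which is the limitation described in the abstract.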
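Why direct simulation struggles in the log likelihood example can be seen in a small sketch. It assumes a hypothetical model, not taken from the report: \(X \sim N(\theta, 1)\) and \(\tilde X \mid X = x \sim N(x, \sigma^2)\), so the marginal of \(\tilde X\) is \(N(\theta, 1 + \sigma^2)\) and the exact \(\ell(\theta)\) is available for comparison. Each integral is estimated by averaging \(v_q = f_{\tilde X \mid X}(\tilde x_j \mid x_q)\) over draws \(x_q\) from \(f_X(\cdot; \theta)\), and the Kish effective sample size \((\sum_q v_q)^2 / \sum_q v_q^2\) measures how many of the Q draws actually contribute. The function names are illustrative, and the sketch shows the problem only; it does not implement double importance sampling.

```python
import math
import random
from statistics import NormalDist


def exact_loglik(theta, data, sigma):
    """Exact l(theta) under the assumed model: X~ ~ N(theta, 1 + sigma^2)."""
    marginal = NormalDist(theta, math.sqrt(1.0 + sigma ** 2))
    return sum(math.log(marginal.pdf(xt)) for xt in data)


def naive_loglik(theta, data, sigma, Q=2_000, seed=0):
    """Direct simulation of l(theta): for each observation xt_j, draw
    x_1..x_Q from f_X(.; theta) = N(theta, 1) and average f(xt_j | x_q).
    Returns (estimate, smallest effective sample size over j)."""
    rng = random.Random(seed)
    total = 0.0
    worst_ess = float(Q)
    for xt in data:
        # v_q = f_{X~|X}(xt | x_q), the N(x_q, sigma^2) density evaluated at xt
        vals = [NormalDist(rng.gauss(theta, 1.0), sigma).pdf(xt) for _ in range(Q)]
        vmax = max(vals)
        if vmax == 0.0:
            # every draw missed the sharp peak: the estimate is log(0)
            return -math.inf, 0.0
        total += math.log(sum(vals) / Q)
        # Kish effective sample size, computed on rescaled values to avoid underflow
        r = [v / vmax for v in vals]
        ess = sum(r) ** 2 / sum(w * w for w in r)
        worst_ess = min(worst_ess, ess)
    return total, worst_ess


if __name__ == "__main__":
    rng = random.Random(42)
    theta0, n = 0.5, 20
    for sigma in (1.0, 0.01):
        # simulate data from the assumed model at theta0
        data = [rng.gauss(rng.gauss(theta0, 1.0), sigma) for _ in range(n)]
        est, ess = naive_loglik(theta0, data, sigma)
        print(f"sigma={sigma}: exact={exact_loglik(theta0, data, sigma):.3f} "
              f"naive={est:.3f} worst ESS={ess:.1f}")
```

With \(\sigma = 1\) most draws contribute to each average; with \(\sigma = 0.01\) only the few \(x_q\) that happen to land very close to \(\tilde x_j\) carry any weight, so the effective sample size collapses even though Q is unchanged.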

(Revised 04/00)