April, 1999

Tech Report

Assume that we want to estimate \(

\gamma_i(\theta) = {\rm E}_{f_{\theta}} \{ c_i(X) \} = \int c_i(x) f_{\theta}(x) \, dx

\) via simulation, for \(\theta \in \Theta\) and for several functions \(c_i\), \(i=1,\ldots,I\), where \(X\) is a random variable with density \(f_{\theta}\). The importance sampling identity can be used to write \(

\gamma_i(\theta) = \int c_i(x) \left[ f_{\theta}(x ) / g(x)\right] \, g(x) \, dx

= \int t_i(x, \theta) \, g(x) \, dx,

\) which can then be estimated by \(\hat \gamma_i(\theta) = Q^{-1} \sum_{q=1}^{Q} t_i(x_q, \theta),

\) where \(t_i(x, \theta) = c_i(x) f_{\theta}(x) / g(x)\) and \(x_1,\ldots, x_Q\) is a random sample from \(g\). For importance sampling to be efficient, though, sampling from \(g\) should be easy, \(g\) must provide adequate coverage of the sample space of possibly many densities \(f_{\theta}\), and ideally \(g\) should be chosen to minimize the variance of the resulting estimates of \(\gamma_i(\theta)\). This is a lot to achieve, particularly since these goals may conflict. Moreover, if several characteristics \(\gamma_i(\theta)\) must be estimated, the method is unavoidably limited as a variance reduction technique, because \(g\) can be optimal for only one particular \(\gamma_i(\theta)\); worse, it can be far from optimal for the other characteristics. Double importance sampling, on the other hand, makes it possible to achieve all of these goals: ease of sampling, and theoretically perfect estimation of an arbitrarily large number of quantities \(\gamma_i(\theta)\). One example concerns estimation of a log likelihood function that can be written as \(\ell (\theta)=

\sum_{j} \log\int f_{\tilde X\mid X} (\tilde x_j\mid x) f_{X}(x; \theta)

\thinspace dx \), where \(\tilde x_1, \ldots, \tilde x_n\) are independent observed data and \(X\) is an unobserved variable. Estimation of \(\ell (\theta)\) via direct simulation or importance sampling is very inefficient because \(f_{\tilde X\mid X}\) is much more concentrated than \(f_X\); a simple use of the proposed method makes the simulation very efficient.
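
As an illustration of the ordinary importance sampling estimator \(\hat \gamma_i(\theta)\) described above (not of double importance sampling, which is not specified here), the following minimal Python sketch reuses a single sample from g to estimate several characteristics at several values of \(\theta\). Everything concrete in it is an assumption made for the example: \(f_{\theta}\) is taken to be the N(\(\theta\), 1) density, g the N(0, 4) density, \(c_1(x) = x\), \(c_2(x) = x^2\), and the helper names `normal_pdf` and `importance_estimates` are hypothetical.

```python
import numpy as np


def normal_pdf(x, mean, sd):
    """Density of N(mean, sd^2) evaluated at x."""
    z = (x - mean) / sd
    return np.exp(-0.5 * z**2) / (sd * np.sqrt(2.0 * np.pi))


def importance_estimates(x, g_pdf, f_pdf, thetas, c_funcs):
    """Return est[k, i], an estimate of gamma_i(theta_k).

    x       : sample x_1, ..., x_Q drawn from g
    g_pdf   : callable giving the density g(x)
    f_pdf   : callable giving f_theta(x) as f_pdf(x, theta)
    thetas  : sequence of parameter values
    c_funcs : list of the functions c_i
    """
    gx = g_pdf(x)
    est = np.empty((len(thetas), len(c_funcs)))
    for k, theta in enumerate(thetas):
        w = f_pdf(x, theta) / gx               # weights f_theta(x_q) / g(x_q)
        for i, c in enumerate(c_funcs):
            est[k, i] = np.mean(c(x) * w)      # Q^{-1} sum_q t_i(x_q, theta)
    return est


# Illustrative setup (all choices below are assumptions for this sketch).
rng = np.random.default_rng(0)
Q = 200_000
x = rng.normal(0.0, 2.0, size=Q)               # one sample from g = N(0, 2^2)

thetas = [-1.0, 0.0, 1.0]
c_funcs = [lambda v: v, lambda v: v**2]        # c_1(x) = x, c_2(x) = x^2

est = importance_estimates(
    x,
    g_pdf=lambda v: normal_pdf(v, 0.0, 2.0),
    f_pdf=lambda v, th: normal_pdf(v, th, 1.0),
    thetas=thetas,
    c_funcs=c_funcs,
)

# Exact values under N(theta, 1): E[X] = theta and E[X^2] = theta^2 + 1.
exact = np.array([[th, th**2 + 1.0] for th in thetas])

print(np.round(est, 3))
print(exact)
```

In this sketch g is deliberately wider than every \(f_{\theta}\) so that the weights stay bounded. A single g still cannot minimize the variance of all six estimates at once, which is the limitation noted above.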

(Revised 04/00)