Double Importance Sampling

Valérie Ventura

Revised 4/00


Assume that we want to estimate $\gamma_i(\theta) = {\rm E}_{f_{\theta}} \{ c_i(X) \} = \int c_i(x) f_{\theta}(x) \, dx$ via simulation, for $\theta \in \Theta$ and for several functions $c_i$, $i=1,\ldots,I$, where $X$ is a random variable with density $f_{\theta}$. The importance sampling identity can be used to write $\gamma_i(\theta) = \int c_i(x) \left[ f_{\theta}(x) / g(x) \right] \, g(x) \, dx = \int t_i(x, \theta) \, g(x) \, dx$, with $t_i(x, \theta) = c_i(x) f_{\theta}(x) / g(x)$, which can then be estimated by $\hat\gamma_i(\theta) = Q^{-1} \sum_{q=1}^{Q} t_i(x_q, \theta)$, where $x_1, \ldots, x_Q$ is a random sample from $g$. For importance sampling to be efficient, however, sampling from $g$ should be easy, $g$ must provide adequate coverage of the sample space of possibly many densities $f_{\theta}$, and ideally $g$ should be chosen to minimize the variance of the resulting estimates of $\gamma_i(\theta)$. This is a lot to achieve, particularly since these goals may conflict. Moreover, if several characteristics $\gamma_i(\theta)$ must be estimated, the method is unavoidably limited as a variance reduction technique, because $g$ can be optimal for only one particular $\gamma_i(\theta)$; worse, it can be far from optimal for the other characteristics. Double importance sampling, on the other hand, makes it possible to achieve all of these goals: ease of sampling and theoretically perfect estimation of an arbitrarily large number of quantities $\gamma_i(\theta)$. One example concerns estimation of a log likelihood function that can be written as $\ell(\theta) = \sum_{j} \log \int f_{\tilde X\mid X}(\tilde x_j \mid x) f_{X}(x; \theta) \, dx$, where $\tilde x_1, \ldots, \tilde x_n$ are independent observed data and $X$ is an unobserved variable. Estimation of $\ell(\theta)$ via direct simulation or importance sampling is very inefficient because $f_{\tilde X\mid X}$ is much more concentrated than $f_X$; simple use of the proposed method makes the simulation very efficient.
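
The ordinary importance sampling estimator $\hat\gamma_i(\theta)$ described above can be sketched in a few lines of Python. This is only an illustration of the baseline estimator, not of double importance sampling itself, whose construction is given in the report rather than in this abstract. Every concrete choice here is an assumption made for the example: $f_{\theta}$ is taken to be $N(\theta, 1)$, $g$ is taken to be $N(0, 2^2)$, the functions are $c_1(x) = x$ and $c_2(x) = x^2$, and the names `normal_pdf` and `importance_estimates` are hypothetical.

```python
import math
import random


def normal_pdf(x, mean, sd):
    """Density of N(mean, sd^2) evaluated at x."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def importance_estimates(thetas, c_funcs, Q=100_000, g_sd=2.0, seed=0):
    """Estimate gamma_i(theta) for every theta and every c_i from ONE sample from g.

    Assumed setup (illustrative only): f_theta = N(theta, 1), g = N(0, g_sd^2).
    """
    rng = random.Random(seed)
    # Draw x_1, ..., x_Q from g once and reuse them for all theta and all c_i.
    xs = [rng.gauss(0.0, g_sd) for _ in range(Q)]
    g_vals = [normal_pdf(x, 0.0, g_sd) for x in xs]

    estimates = {}
    for theta in thetas:
        # Importance weights f_theta(x_q) / g(x_q).
        weights = [normal_pdf(x, theta, 1.0) / gx for x, gx in zip(xs, g_vals)]
        # gamma_hat_i(theta) = Q^{-1} * sum_q c_i(x_q) * f_theta(x_q) / g(x_q)
        estimates[theta] = [
            sum(c(x) * w for x, w in zip(xs, weights)) / Q for c in c_funcs
        ]
    return estimates


c_funcs = [lambda x: x, lambda x: x * x]  # c_1(x) = x, c_2(x) = x^2
estimates = importance_estimates([-1.0, 0.0, 1.0], c_funcs)

for theta, (gamma1_hat, gamma2_hat) in estimates.items():
    # Exact values under the assumed f_theta: gamma_1 = theta, gamma_2 = theta^2 + 1.
    print(f"theta={theta:+.1f}  gamma1_hat={gamma1_hat:.3f}  gamma2_hat={gamma2_hat:.3f}")
```

A single sample from $g$ serves every $\theta$ and every $c_i$, which is the convenience the abstract describes. The same $g$ cannot be variance-optimal for both $c_1$ and $c_2$ at once, which is the limitation that motivates the proposed method.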

Keywords: control variate, importance sampling, importance sampling weight diagnostics, likelihood estimation, ratio estimate, regression estimate, surface estimation, variance reduction

Here is the full postscript text for this technical report. It is 609428 bytes long.