719

**Fast Algorithms and
Efficient Statistics: Density Estimation in Large Astronomical Datasets**

**R.C. Nichol, A.J. Connolly, A.W. Moore, J. Schneider,
C. Genovese, and L. Wasserman**

### Abstract:

We present initial results on the use of Mixture Models for density
estimation in large astronomical databases. We provide herein both the
theoretical and experimental background for using a mixture model of
Gaussians based on the Expectation Maximization (EM)
Algorithm. Applying these analyses to simulated data sets we show that
the EM algorithm, using both the AIC and BIC penalized likelihoods to
score the fit, can outperform the best kernel density estimate of
the distribution while requiring no "fine-tuning" of the input
algorithm parameters. We find that EM can accurately recover the
underlying density distribution from point processes, thus providing an
efficient adaptive smoothing method for astronomical source
catalogs. To demonstrate the general application of this statistic to
astrophysical problems we consider two cases of density estimation:
the clustering of galaxies in redshift space and the clustering of
stars in color space. From these data we show that EM provides an
adaptive smoothing of the distribution of galaxies in redshift space
(describing accurately both the small and large-scale features within
the data) and a means of identifying outliers in multi-dimensional
color-color space (e.g. for the identification of high redshift
QSOs). Automated tools such as those based on the EM algorithm will be
needed in the analysis of the next generation of astronomical catalogs
(2MASS, FIRST, PLANCK, SDSS) and ultimately in the development of the
National Virtual Observatory.
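
As a rough illustration of the approach the abstract describes, the sketch below fits a one-dimensional mixture of Gaussians with EM and scores each fit with AIC and BIC. It is not the authors' implementation: the function names (`fit_gmm_1d`, `aic_bic`, `log_likelihood`), the quantile-based initialisation, the fixed iteration count, and the variance floor are all assumptions made for this example, and the report itself works with multi-dimensional data.

```python
import math


def _component_densities(x, weights, means, variances):
    """Weighted Gaussian density of each mixture component at x."""
    return [w * math.exp(-0.5 * (x - m) ** 2 / v) / math.sqrt(2.0 * math.pi * v)
            for w, m, v in zip(weights, means, variances)]


def log_likelihood(data, weights, means, variances):
    """Total log-likelihood of the data under the mixture."""
    return sum(
        math.log(max(sum(_component_densities(x, weights, means, variances)), 1e-300))
        for x in data)


def fit_gmm_1d(data, k, n_iter=200):
    """Fit a k-component 1-D Gaussian mixture with EM (illustrative only)."""
    xs = sorted(data)
    n = len(xs)
    mean_all = sum(xs) / n
    var_all = sum((x - mean_all) ** 2 for x in xs) / n
    # Variance floor (an assumption) guards against collapsed components.
    floor = max(1e-6 * var_all, 1e-12)
    # Assumed initialisation: means at evenly spaced quantiles, shared variance.
    weights = [1.0 / k] * k
    means = [xs[int((j + 0.5) * n / k)] for j in range(k)]
    variances = [max(var_all / k ** 2, floor)] * k
    for _ in range(n_iter):
        # E-step: responsibility of each component for each point.
        resp = []
        for x in xs:
            dens = _component_densities(x, weights, means, variances)
            total = max(sum(dens), 1e-300)
            resp.append([d / total for d in dens])
        # M-step: re-estimate weights, means and variances.
        for j in range(k):
            nj = max(sum(r[j] for r in resp), 1e-12)
            weights[j] = nj / n
            means[j] = sum(r[j] * x for r, x in zip(resp, xs)) / nj
            variances[j] = max(
                sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, xs)) / nj,
                floor)
    return weights, means, variances


def aic_bic(data, k, params):
    """Return (AIC, BIC); a 1-D mixture of k Gaussians has 3k - 1 free parameters."""
    loglik = log_likelihood(data, *params)
    p = 3 * k - 1
    return -2.0 * loglik + 2.0 * p, -2.0 * loglik + p * math.log(len(data))
```

One way to use this is to call `fit_gmm_1d` for a small range of `k`, pass each result to `aic_bic`, and keep the `k` with the lowest score. The penalty term is what discourages adding components that only marginally improve the likelihood.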

*Heidi Sestrich*

*7/28/2000*