Local Two-Sample Testing: A New Tool for Analysing High-Dimensional Astronomical Data


Modern surveys have provided the astronomical community with a flood of high-dimensional data, but analyses of these data often occur only after their projection to lower-dimensional spaces. In this work, we introduce a local two-sample hypothesis test framework that an analyst may apply directly to data in their native space. In this framework, the analyst defines two classes based on a response variable of interest (e.g. higher mass galaxies versus lower mass galaxies) and determines, at arbitrary points in predictor space, whether the local proportions of objects belonging to the two classes differ significantly from the global proportions. Our framework has many potential uses throughout astronomy; here, we demonstrate its efficacy by applying it to a sample of 2487 i-band-selected galaxies observed by the HST-ACS in four of the CANDELS programme fields. For each galaxy, we have seven morphological summary statistics along with an estimated stellar mass and star formation rate (SFR). We perform two studies: one in which we determine regions of the seven-dimensional space of morphological statistics where high-mass galaxies are significantly more numerous than low-mass galaxies, and vice versa, and another in which we use SFR in place of mass. We find that we are able to identify such regions, and we show that high-mass/low-SFR regions are associated with concentrated and undisturbed galaxies, while galaxies in low-mass/high-SFR regions appear more extended and/or disturbed than their high-mass/low-SFR counterparts.
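
The core idea of comparing a local class proportion with the global one can be illustrated with a minimal sketch. This is not the paper's implementation: the abstract does not say how the local neighbourhood is defined or which test statistic is used, so the k-nearest-neighbour neighbourhood, the exact binomial test, the synthetic two-dimensional data, and all function names below are assumptions made for illustration only. A real analysis over many query points would also need a multiple-testing correction, which is omitted here.

```python
import math
import random


def binom_two_sided_p(successes, n, p0):
    """Exact two-sided binomial p-value: total probability of all outcomes
    that are no more likely than the observed one under proportion p0."""
    pmf = [math.comb(n, i) * p0**i * (1 - p0) ** (n - i) for i in range(n + 1)]
    observed = pmf[successes]
    # Small relative tolerance so ties are not lost to floating-point noise.
    return min(1.0, sum(q for q in pmf if q <= observed * (1 + 1e-9)))


def local_two_sample_test(points, labels, query, k=25):
    """Hypothetical helper: compare the class-1 proportion among the k nearest
    neighbours of `query` with the class-1 proportion in the whole sample."""
    p_global = sum(labels) / len(labels)
    # Rank all objects by Euclidean distance to the query point.
    order = sorted(range(len(points)), key=lambda i: math.dist(points[i], query))
    n_class1 = sum(labels[i] for i in order[:k])
    p_value = binom_two_sided_p(n_class1, k, p_global)
    return n_class1 / k, p_global, p_value


# Synthetic stand-in data (assumption): two predictors, with class 1 assigned
# to objects whose first predictor is positive.
random.seed(0)
points = [(random.gauss(0, 1), random.gauss(0, 1)) for _ in range(400)]
labels = [1 if x > 0 else 0 for x, _ in points]

# Test at a query point deep inside the class-1 region.
p_local, p_global, p_value = local_two_sample_test(points, labels, (1.5, 0.0))
print(f"local={p_local:.2f} global={p_global:.2f} p-value={p_value:.3g}")
```

In this sketch a small p-value at a query point indicates that the class mix in its neighbourhood differs from the class mix of the whole sample, which is the sense in which a region would be flagged as, for example, dominated by high-mass galaxies.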

Monthly Notices of the Royal Astronomical Society (MNRAS), 471(3): 3273–3282, 2017