My primary research area is statistical inference for Big Data, focusing on how to improve inference by exploiting the many kinds of sparsity seen in modern data (signal sparsity, graph sparsity, sparsity in the eigenvalues of matrices, etc.).

When data are collected with increasingly many features (i.e., increasingly large dimensions), the signals tend to become increasingly sparse, as the number of true features does not grow proportionally. At the same time, in many cases we cannot enroll enough subjects for experiments (such as a study of a rare disease), so the sample size does not grow proportionally with the number of features, and the signals end up being weak.

In this "Rare and Weak" situation, classical methods and most contemporary empirical methods are simply overwhelmed, and principled statistical approaches are badly needed.
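To make the setting concrete, the sketch below simulates a Gaussian version of the Rare/Weak signal model (the parameter names and the calibration by exponents beta and r are illustrative, not tied to any specific paper's notation), and shows why naive thresholding struggles:

```python
import numpy as np

# Illustrative sketch of a "Rare and Weak" signal model (assumed
# calibration: rarity exponent beta, strength exponent r).
rng = np.random.default_rng(0)

p = 10_000                         # number of features
beta, r = 0.6, 0.3                 # rarity and strength exponents
eps = p ** (-beta)                 # fraction of true signals: rare
tau = np.sqrt(2 * r * np.log(p))   # signal amplitude: weak

is_signal = rng.random(p) < eps
x = rng.standard_normal(p) + np.where(is_signal, tau, 0.0)

# A Bonferroni-style threshold at sqrt(2 log p) misses most signals,
# because tau < sqrt(2 log p) whenever r < 1.
thresh = np.sqrt(2 * np.log(p))
found = np.sum((x > thresh) & is_signal)
print(f"true signals: {is_signal.sum()}, recovered by thresholding: {found}")
```

Because the signal amplitude sits below the maximum of p independent noise terms, no per-feature test can reliably separate signal from noise here; this is what motivates the global and aggregation-based methods discussed below.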

- Large-Scale Multiple Hypothesis Testing.
- Cancer Classification.
- Variable selection.
- Spectral clustering and Principal Component Analysis (PCA).
- Random Matrix Theory (RMT).
- Network Analysis.
- Graph theory and precision matrices.

- Genomics: gene microarray, SNP, protein mass spectroscopy.
- Complex graphs and networks.
- Astronomy and Cosmology: non-Gaussian signature detection in the Cosmic Microwave Background (CMB).
- Computer Security.
- Link to Astrostatistics group at Purdue
- Link to International Group of Computational Astrostatistics

- Higher Criticism (with variants for signal detection, classification, and spectral clustering).
- Graphlet Screening for variable selection (with two other variants: Univariate Penalized Screening (UPS) and Covariance Assisted Screening and Estimation (CASE)).
- Fourier-transform-based procedures for estimating the null parameters and the proportion of non-null effects in large-scale multiple testing.
- SCORE for network community detection.
- A new method for estimating the precision matrix (in progress).
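As a concrete illustration of the first item, here is a minimal sketch of one common form of the Higher Criticism statistic applied to a vector of p-values (conventions differ slightly across papers; the restriction to the smaller half of the p-values is one common choice):

```python
import numpy as np

def higher_criticism(pvals):
    """A common form of the Higher Criticism statistic
    (exact conventions vary across the literature)."""
    n = len(pvals)
    p_sorted = np.sort(np.asarray(pvals))
    i = np.arange(1, n + 1)
    hc = np.sqrt(n) * (i / n - p_sorted) / np.sqrt(p_sorted * (1 - p_sorted))
    # Maximize over the smaller p-values only, a common convention
    # that avoids instability for p-values near 1.
    return np.max(hc[: n // 2])

# Under the global null, p-values are uniform and HC stays moderate;
# a handful of unusually small p-values pushes HC up sharply.
rng = np.random.default_rng(1)
null_p = rng.uniform(size=1000)
print(higher_criticism(null_p))
```

The appeal of HC in the Rare/Weak setting is that it aggregates mild evidence across many marginal tests rather than demanding that any single p-value be extreme.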

Just as water has three phases (vapor, liquid, and ice), many statistical problems (variable selection, classification, multiple testing, spectral clustering) have three phases. The phase diagram is a two-dimensional parameter space, where the x-axis calibrates signal rarity and the y-axis calibrates signal strength. For a particular problem, say variable selection, the phase space usually partitions into three sub-regions (hence the name phase diagram), Phases I-III.

- In Phase I, the problem under consideration is relatively trivial, since the signals are sufficiently strong.
- In Phase II, the problem is nontrivial, but a reliable solution is still possible, as the signals are moderately strong.
- In Phase III, no reliable solution is possible, simply because the signals are too rare and weak.
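For the Gaussian signal-detection problem, the curve separating Phase II from Phase III has a known closed form in the Higher Criticism literature; a sketch of that detection boundary, under the standard calibration by a rarity exponent beta and a strength exponent r (parameter names assumed here), is:

```python
import numpy as np

def detection_boundary(beta):
    """Detection boundary rho*(beta) from the rare/weak signal-detection
    literature: signal strengths r above the curve are detectable
    (Phase II), those below it are not (Phase III).
    Valid for rarity exponent beta in (1/2, 1)."""
    if 0.5 < beta <= 0.75:
        return beta - 0.5
    if 0.75 < beta < 1:
        return (1.0 - np.sqrt(1.0 - beta)) ** 2
    raise ValueError("beta must lie in (1/2, 1)")
```

Note the two pieces meet continuously at beta = 3/4 (both give 1/4), and the boundary rises toward 1 as signals become rarer, matching the intuition that rarer signals must be stronger to remain detectable.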

- Detecting rare and weak signals.
- Variable selection.
- Classification.
- Clustering.
- Estimating the proportion of signals.
- Low-rank matrix recovery.