Ann B Lee

Professor; Co-Director of the PhD Program in Statistics

Department of Statistics & Data Science / Machine Learning Department, Carnegie Mellon University

Welcome!

I am a professor in the Department of Statistics & Data Science at Carnegie Mellon University, with a joint appointment in the Machine Learning Department. Before joining CMU in 2005, I was the J.W. Gibbs Assistant Professor in the Department of Mathematics at Yale University. Before that, I spent a year as a visiting research associate in the Department of Applied Mathematics at Brown University.

My research focuses on developing statistical methodology for the types of complex data and problems often encountered in the physical sciences. I am particularly interested in statistical methods that adapt to nonlinear sparse structure in high-dimensional data, and in nonparametric approaches that can handle heterogeneous data from different scientific probes. My recent work includes uncertainty quantification via conditional density estimation, likelihood-free inference, and validation of emulator models. Applications include astronomy, using massive astronomical surveys, and hurricane intensity guidance, using satellite imagery.

In 2018, I started the STAtistical Methods for Physical Sciences (STAMPS) research group together with Mikael Kuusela. STAMPS holds weekly research group meetings for students and faculty at CMU and the University of Pittsburgh. It also hosts public colloquium-style webinars open to all members of the scientific community. I am also one of the key personnel of the recently founded NSF AI Planning Institute for Data-Driven Discovery in Physics.

Interests

  • Statistical Machine Learning
  • High-Dimensional Inference
  • Nonlinear Dimension Reduction
  • Statistics for the Physical Sciences

Education

  • PhD in Physics

    Brown University

  • MSc/BSc in Engineering Physics

    Chalmers University of Technology, Sweden

Recent Publications

(2020). Wildfire Smoke and Air Quality: How Machine Learning Can Guide Forest Management. Tackling Climate Change with Machine Learning workshop at NeurIPS 2020 (Spotlight talk).

(2020). Confidence Sets and Hypothesis Testing in a Likelihood-Free Inference Setting. Proceedings of the Thirty-Seventh International Conference on Machine Learning (ICML 2020), PMLR 119: 2323–2334, 2020.

(2019). Global and Local Two-Sample Tests via Regression. Electronic Journal of Statistics, 13(2): 5253–5305, 2019.

(2017). Local Two-Sample Testing: A New Tool for Analysing High-Dimensional Astronomical Data. Monthly Notices of the Royal Astronomical Society (MNRAS), 471(3): 3273–3282, 2017.

Group

I coordinate the STAtistical Methods for the Physical Sciences (STAMPS) Research Group at CMU together with Mikael Kuusela.

I am fortunate to advise the following amazing students:

Current PhD Students

  • Nic Dalmasso (PhD May 2021)
  • Trey McNeely (thesis)
  • David Zhao (thesis)
  • Lorenzo Tomaselli (project 2020)
  • Luca Masserano (project 2021)
  • Galen Vincent (project 2021)

Previous PhD Students

  • Taylor Pospisil
    – PhD May 2019, Department of Statistics & Data Science, CMU
    – Thesis title: Conditional Density Estimation for Regression and Likelihood-Free Inference

  • Rafael Izbicki
    – PhD April 2014, Department of Statistics, CMU
    – Thesis title: A Spectral Series Approach to High-Dimensional Nonparametric Inference
    – 2014 Best Thesis Award, Department of Statistics, CMU

  • Di Liu
    – PhD July 2012, Department of Statistics, CMU
    – Thesis title: Comparing Data Sources in High Dimensions

  • Andrew Crossett
    – co-advised with Kathryn Roeder
    – PhD May 2012, Department of Statistics, CMU
    – Thesis title: Using Dimension Reduction Techniques to Model Genetic Relationships for Association Studies

  • Susan Buchman
    – co-advised with Chad Schafer
    – PhD March 2011, Department of Statistics, CMU
    – Thesis title: High-Dimensional Adaptive Basis Density Estimation

  • Joseph W. Richards
    – co-advised with Chad Schafer
    – PhD July 2010, Department of Statistics, CMU
    – Thesis title: Fast and Accurate Estimation for Astrophysical Problems in Large Databases
    – 2010 ASA Student of the Year, Pittsburgh Chapter

  • Diana Luca
    – co-advised with Kathryn Roeder
    – PhD September 2008, Department of Statistics, CMU
    – Thesis title: Genetic Matching by Ancestry in Genome-Wide Association Studies

Teaching

  • Modern Ideas in Statistics and AI for Climate and Environmental Sciences (STAT 36-722); Spring 2021.
  • Advanced Methods for Data Analysis (STAT 36-402/608); Spring 2017, 2018, 2019, 2020, 2021.
  • Modern Regression (STAT 36-401/607); Fall 2018.
  • Advanced Data Analysis II (STAT 36-758); Fall 2015, 2016, 2017.
  • Mathematical Statistics Honors (STAT 36-326); Spring 2014, 2015, 2016.
  • Probability and Statistics I (STAT 36-625); Fall 2005, 2006, 2007, 2013, 2014.
  • Statistical Practice (STAT 36-726); Spring 2012, 2016.
  • Engineering Statistics and Quality Control (STAT 36-220); Fall 2010, 2011.
  • Machine Learning Journal Club (ML 10-915), Machine Learning Department, CMU; Fall 2009, 2010.
  • Probability and Statistics II (STAT 36-626); Spring 2006, 2007, 2008, 2010.
  • Probability and Statistics for Business Applications (STAT 36-207); Fall 2009.
  • Applied Mathematics and Engineering I (AMTH 251), Yale University; Fall 2003, 2004.
  • Introduction to Calculus in Several Variables (MATH 118), Yale University; Spring 2004.
  • Pattern Theory and its Applications (STAT 2), 12th Jyväskylä Ph.D. Summer School, Aug 2002, Finland.

Contact