Ann B Lee

Professor, Co-Director of PhD Program in Statistics

Department of Statistics & Data Science / Machine Learning Department, Carnegie Mellon University


I am a professor in the Department of Statistics & Data Science at Carnegie Mellon University, with a joint appointment in the Machine Learning Department. Prior to joining CMU in 2005, I was the J.W. Gibbs Assistant Professor in the department of mathematics at Yale University, and before that I served a year as a visiting research associate at the department of applied mathematics at Brown University.

My research interests are in developing statistical methodology for the type of complex data and problems often encountered in the physical sciences. I am particularly interested in statistical methods that adapt to nonlinear sparse structure in high-dimensional data, and nonparametric approaches that can handle heterogeneous data from different scientific probes. My recent work includes uncertainty quantification via conditional density estimation, likelihood-free inference, validation of emulator models, and applications to hurricane intensity guidance and astronomy that involve satellite imagery and massive astronomical surveys.

In 2018, I started the STAtistical Methods for the Physical Sciences (STAMPS) research group together with Mikael Kuusela. STAMPS hosts public colloquium-style webinars open to all members of the scientific community, in addition to weekly research group meetings for students and faculty at CMU and UPitt. I am also key personnel in the recently founded NSF AI Planning Institute for Data-Driven Discovery in Physics.

🚩 Upcoming! Our NSF AI Planning Institute at CMU is organizing the virtual workshop “From Quarks to Cosmos with AI”. The workshop will take place during the week of July 12-16, 2021, and is organized around hackathon-type data challenges relevant to high-energy physics, astrophysics, and AI/ML methodology in physics. The workshop will also include daily talks by a stellar lineup of speakers. Attendance is free and open to anyone with relevant research interests. For details and a link to the registration page, go to our conference website. Make sure to register before the July 5 deadline! (Organizers: Tiziana Di Matteo, Mikael Kuusela, Ann Lee, Rachel Mandelbaum, Manfred Paulini)


Interests

  • Statistical Machine Learning
  • High-Dimensional Inference
  • Nonlinear Dimension Reduction
  • Statistics for the Physical Sciences


Education

  • PhD in Physics

    Brown University

  • MSc/BSc in Engineering Physics

    Chalmers University of Technology, Sweden

Recent Publications

(2021). Validating Conditional Density Models and Bayesian Inference Algorithms. Accepted to the 37th Conference on Uncertainty in Artificial Intelligence (UAI 2021), July 27-30, 2021.

(2020). Wildfire Smoke and Air Quality: How Machine Learning Can Guide Forest Management. Tackling Climate Change with Machine Learning workshop at NeurIPS 2020 (Spotlight talk).

(2020). Confidence Sets and Hypothesis Testing in a Likelihood-Free Inference Setting. Proceedings of the Thirty-Seventh International Conference on Machine Learning (ICML 2020), PMLR 119:2323-2334, 2020.

(2020). Evaluation of probabilistic photometric redshift estimation approaches for LSST. Monthly Notices of the Royal Astronomical Society, Volume 499, Issue 2, December 2020, Pages 1587–1606.

(2019). Global and Local Two-Sample Tests via Regression. Electronic Journal of Statistics, 13(2): 5253–5305, 2019.

(2017). Local Two-Sample Testing: A New Tool for Analysing High-Dimensional Astronomical Data. Monthly Notices of the Royal Astronomical Society (MNRAS), 471(3): 3273–3282, 2017.


I coordinate the STAtistical Methods for the Physical Sciences (STAMPS) Research Group at CMU together with Mikael Kuusela.

I am fortunate to advise the following amazing students:

Current Graduate Students

  • Nic Dalmasso (PhD May 2021)
  • Trey McNeely (thesis)
  • David Zhao (thesis)
  • Galen Vincent (project 2021)
  • Luca Masserano (project 2021)
  • Pavel Khokhlov (MS study '21)

Previous PhD Students

  • Taylor Pospisil
    – PhD May 2019, Department of Statistics & Data Science, CMU
    – Thesis title: Conditional Density Estimation for Regression and Likelihood-Free Inference

  • Rafael Izbicki
    – PhD April 2014, Department of Statistics, CMU
    – Thesis title: A Spectral Series Approach to High-Dimensional Nonparametric Inference
    – 2014 Best Thesis Award, Department of Statistics, CMU

  • Di Liu
    – PhD July 2012, Department of Statistics, CMU
    – Thesis title: Comparing Data Sources in High Dimensions

  • Andrew Crossett
    – co-advised with Kathryn Roeder
    – PhD May 2012, Department of Statistics, CMU
    – Thesis title: Using Dimension Reduction Techniques to Model Genetic Relationships for Association Studies

  • Susan Buchman
    – co-advised with Chad Schafer
    – PhD March 2011, Department of Statistics, CMU
    – Thesis title: High-Dimensional Adaptive Basis Density Estimation

  • Joseph W. Richards
    – co-advised with Chad Schafer
    – PhD July 2010, Department of Statistics, CMU
    – Thesis title: Fast and Accurate Estimation for Astrophysical Problems in Large Databases
    – 2010 ASA Student of the Year, Pittsburgh Chapter

  • Diana Luca
    – co-advised with Kathryn Roeder
    – PhD Sept 2008, Department of Statistics, CMU
    – Thesis title: Genetic Matching by Ancestry in Genome-Wide Association Studies

News & Events

  • 🚩 Upcoming! July 12-16: Our NSF AI Planning Institute for Data-Driven Discovery in Physics is organizing the virtual conference “From Quarks to Cosmos with AI” at Carnegie Mellon University. Make sure to register before July 5! For details and a link to the registration page, go to our conference website.
  • May 14: Giving a virtual talk on “Likelihood-Free Frequentist Inference” at MIT Stochastics and Statistics Seminar.
  • May 12: Giving a virtual talk on “Calibration and Validation of Approximate Likelihood Models” at IMSI workshop “Verification, Validation, and Uncertainty Quantification Across Disciplines”.
  • May 2021: Our paper on “Validating Conditional Density Models and Bayesian Inference Algorithms” has been accepted to UAI 2021. Go to LSST ISSC videos to see David Zhao’s April 29th talk-tutorial to the LSST ISSC and the LSST DESC pz groups.
  • Summer 2021: Area Chair for NeurIPS 2021.
  • April 2021: Received NSF DMS award #2053804 for “Statistical Procedures and Performance Measures for Simulator-Based Frequentist Inference” (PI: Lee; co-PIs: Kuusela and Ramdas).
  • April 2021: I’m very proud that my student Nic Dalmasso has successfully defended his thesis “Uncertainty Quantification in Simulation-Based Inference”. Congratulations Dr Dalmasso!
  • April 2021: Huge congratulations to Nic Dalmasso for receiving the 2021 ASA Pittsburgh Chapter Student of the Year Award!




Teaching

  • Modern Ideas in Statistics and AI for Climate and Environmental Sciences (STAT 36-722); Spring 2021.
  • Advanced Methods for Data Analysis (STAT 36-402/608); Spring 2017, 2018, 2019, 2020, 2021.
  • Modern Regression (STAT 36-401/607); Fall 2018.
  • Advanced Data Analysis II (STAT 36-758); Fall 2015, 2016, 2017.
  • Mathematical Statistics Honors (STAT 36-326); Spring 2014, 2015, 2016.
  • Probability and Statistics I (STAT 36-625); Fall 2005, 2006, 2007, 2013, 2014.
  • Statistical Practice (STAT 36-726); Spring 2012, 2016.
  • Engineering Statistics and Quality Control (STAT 36-220); Fall 2010, 2011.
  • Machine Learning Journal Club (ML 10-915), Machine Learning Department, CMU; Fall 2009, 2010.
  • Probability and Statistics II (STAT 36-626); Spring 2006, 2007, 2008, 2010.
  • Probability and Statistics for Business Applications (STAT 36-207); Fall 2009.
  • Applied Mathematics and Engineering I (AMTH 251), Yale University; Fall 2003, 2004.
  • Introduction to Calculus in Several Variables (MATH 118), Yale University; Spring 2004.
  • Pattern Theory and its Applications (STAT 2), 12th Jyväskylä Ph.D. Summer School, Aug 2002, Finland.