Department of Statistics
Dietrich College of Humanities and Social Sciences

Thesis Topics

The following is a list of students who recently received their Ph.D. degrees from the department, with their thesis titles and, where known, their first employment.


Learning Social Networks from Text Data Using Covariate Information, Xiaoyi Yang

Advances in Nonasymptotic and Nonparametric Inference, Robin Dunn

Statistical Guarantees for Spectral Methods on Neighborhood Graphs, Alden Green

Clustering, Data Integration and Transfer Learning in Single-cell RNA Sequencing Data, Minshi Peng

Accounting for Changes in the Distribution of Data, Ciaran Evans

Auditing and Achieving Counterfactual Fairness, Alan Mishler

Advances in Interactive Hypothesis Testing, Boyan Duan

Counting Cycles in Networks, Shengming Luo

Statistical Guarantees for Three Unconventional Estimators, Yufei Yi

Uncertainty Quantification in Simulation-based Inference, Niccolo Dalmasso


Linkage of Early 1900s Irish Census Records: Exploring the impact of household structure and crowdsourced labels, Kayla Frisoli

Matrix-Variate Graphical Models for High-Dimensional Neural Recordings, Zongge Liu

Non-parametric causal discovery for discrete and continuous data, Octavio Mesner, post-doctoral fellow, University of Michigan

Statistical theory and methods for comparing distributions, Ilmun Kim, postdoctoral research associate, Statistical Laboratory at the University of Cambridge

Inference for clustering and anomaly detection, Purvasha Chakravarti, Chapman Fellow, Department of Mathematics, Imperial College, London

A qualitative and quantitative analysis of the bias caused by adaptivity in multi-armed bandits, Jaehyeok Shin

High-dimensional statistical methods to model heterogeneity in genomic data, Kevin Lin

Methods for the estimation of large scale Bayesian models for record linkage under one-to-one matching, Brendan McVeigh, data scientist, Waymo

Causal inference with complex data structures and non-standard effects, Kwangho Kim, postdoctoral researcher, Harvard Medical School

Statistical astrophysics: From extrasolar planets to the large-scale structure of the universe, Collin Politsch, postdoctoral fellow, Machine Learning Dept., CMU

Networks, point processes, and networks of point processes, Neil Spencer, postdoctoral fellow, Harvard University


Topics in prediction, Yotam Hechtlinger

Graphical network modeling of phase coupling in brain activity, Josue Orellana

Information flow in networks based on nonstationary multivariate neural recordings, Natalie Klein, Scientist, Statistical Sciences Group, Los Alamos National Laboratory

Low rank modeling for human intracranial electrophysiological data, Peter Elliott, Data Scientist, Google

Statistical methods for studying correlated neural data, Spencer Koerner, postdoctoral fellow, University of Pittsburgh

Estimating Probability Distributions and their Properties, Shashank Singh

Extension of cross validation with confidence to determining number of communities in stochastic block models, Jining Qin, quantitative researcher, Two Sigma Investments

Accounting for Individual Differences among Decision-Makers with Applications in Forensic Evidence Evaluation, Amanda Luby

Catalyst: Agents of change. Integration of compartment and agent-based models for use in infectious disease epidemiology, Shannon Gallaher

Matching Problems in Forensics, Xiao Hui Tai

Building the Data Science Laboratory: A Web-Based Framework for Modern Statistics, Philipp Burckhardt

Dynamic Networks and the Developing Brain: Smoothing and Change Point Detection, Fuchen Liu

The sliding window discrete Fourier Transform, Lee F. Richardson, Data Scientist, Google, San Bruno, CA

Conditional Density Estimation for Regression and Likelihood-Free Inference, Taylor Pospisil

Model selection for inverse estimation in linear regression, with application to neuroscience, Francesca Matano


Statistical inference for geometric data, Jisu Kim, Post-doctoral student at INRIA, Paris, France

Bootstrapping and Sample Splitting Under Weak Dependence, Robert Lunde, Post-doctoral student at UT Austin, TX

Post-selection inference for changepoint-type problems, Sangwon (Justin) Hyun

Topics in high dimensional statistics, Daren Wang, Postdoctoral Researcher, University of Chicago

Nonparametric Estimation of the Effects of Policies to Reduce Recidivism, Jacqueline Mauro, Post-doctoral student at UC Berkeley, CA

Point process modeling with spatiotemporal covariates for predicting crime, Alex Reinhart, Assistant teaching professor, CMU, Pittsburgh, PA

Learning from High-dimensional and Noisy Transcriptome Data, Lingxue Zhu, Two Sigma, NYC, NY

Local Structure and Inference in Large Random Graphs, Nick Kim, LinkedIn, San Francisco, CA

Limit order book models and applications, Federico Gonzalez, Theorem LP, San Francisco, CA

Model Selection and Stopping Rules for High-Dimensional Forward Selection, Jerzy Wieczorek, Assistant professor, Colby College, Waterville, ME


Causal Reasoning and Data Analysis in the Legal Context: A Nonparametric Estimator for the Probability of Causation, Maria Cuellar, Assistant professor at UPenn, Philadelphia, PA

Statistical Inference about Functional Connectivity from Multi-Neuron Data, Giuseppe Vinci; Post-doctoral position at Rice University


Characteristics of cross-validation methods for model selection in the stochastic block networks, Beau Dabbs; Survata

Constructing Approximately Sufficient ABC Summary Statistics, Michael Vespe; Squarespace, NYC, NY

Methodological Innovations in the Collection and Analysis of Human Rights Violations Data, Jana Asher; American Association for Blood Banks

Longitudinal Conditionally Independent Dyad Models for Analyzing Networks over Time, Samrachana Adhikari; Post-doctoral student, Department of Health Care Policy and Department of Biostatistics at Harvard Medical School

Statistical Inference using Geometric Features, Yen-Chi Chen; Assistant professor, University of Washington

Clustering Strategies for DNA Genotyping, Gaia Bellone; Keybank, Washington, DC

High Dimensional Sparse Precision Matrix Estimation, Shiqiong Huang; Citibank

A Bayesian Partitioning Approach to Duplicate Detection and Record Linkage, Mauricio Sadinle; Post-doctoral student, Duke University and the National Institute of Statistical Sciences

Large-Scale Classification and Clustering Methods with Applications in Record Linkage, Sam Ventura; Pittsburgh Penguins Hockey

A Method to Exploit the Structure of Genetic Ancestry Spaces to Enhance Case-Control Studies, Corneliu Bodea, Merck Pharmaceutical

Network Comparisons Using Sample Splitting, Lawrence Wang; Stripe

Nonparametric Techniques for Functional Data Analysis, Mattia Ciollaro; Research Scientist, Alexa Speech, Amazon and Adjunct Assistant Professor, Duke University

Understanding the Genetic Basis of Schizophrenia and other Mental Disorders by using RNA-Sequencing Data, Cong Lu; eBay-StubHub


Duration Models, Mingyu Tang; Arxis Capital

Statistical Inference for Topological Data Analysis, Fabrizio Lecci; New York Life Insurance

Geometric Approaches to Inference: Non-Euclidean Data and Networks, Dena Asta (joint Statistics and Engineering and Public Policy), Assistant professor, The Ohio State University

A Statistical Contribution to Historical Linguistics, Rafael Stern, Assistant professor, Federal University of Sao Carlos, Brazil

Computational and Statistical Advances in Testing and Learning, Aaditya Ramdas - 2015 (joint Statistics and Machine Learning), Assistant professor, CMU

Statistical Methods in Diffusion Connectomics, Patrick Foley; Stitch Fix

Social Network Modeling and the Evaluation of Structural Similarity for Community Detection, Xiaolin Yang; Amazon, Princeton, NJ

Scalable Privacy-Preserving Data Sharing Methodologies for Genome-Wide Association Studies, Fei Yu; Bell Labs


Classification Via Auxiliary Information, Beatriz Estefania Etchegaray; IBM Research Postdoc

A Spectral Series Approach to High-Dimensional Nonparametric Inference, Rafael Izbicki; Assistant Professor, Dept of Statistics, Federal University of São Carlos, Brazil

Level Set Trees for Applied Statistics, Brian Kent; Dato

Local Log-Linear Models for Capture-Recapture, Zachary Kurtz; Marketing Data Scientist

Statistical Multi-coil MRI Reconstruction, Jionglin Liu; PNC Quantitative Analyst

The Efficacy of the Hedges Correction for Unmodeled Clustering, and Its Generalizations in Practical Settings, Nathan VanHoudnos; Postdoc, Northwestern University

Toward a Processing Pipeline for Two-Photon Calcium Imaging of Neural Populations, Bronwyn Woods; Software Engineering Institute, CMU

A New Parametric Model for the Point Spread Function (PSF) and Its Application to Hubble Space Telescope Data, Lubov Zeifman; University of Alaska

Statistically and computationally efficient inference from multi-neuron spike trains, Sonia Todorova; Google, New York

Spectral-HCT Approach for Clustering Problems in High Dimensional Data, Wanjie Wang; Post Doc, University of Pennsylvania

High Dimensional Statistical Analysis to Reveal the Genetic Basis of Autism, Li Liu


Mixed Membership Distributions with Applications to Modeling Multiple Strategy Usage, April Galyardt; Assistant Professor, University of Georgia

Learning Spatio-Temporal Dynamics: Nonparametric Methods for Optimal Forecasting and Automated Pattern Discovery, Georg Matthias Goerg; Google, NY

New Statistical Applications for Differential Privacy, Robert Hall; Etsy

Comparing Data Sources in High Dimensions, Di Liu; Google, NY

Incorporating Learning Over Time into the Cognitive Assessment Framework, Cassandra Studer; WeddingWire

Statistical Network Models for Replications and Experimental Interventions, Tracy Morrison Sweet; Assistant Professor; University of Maryland


High-Dimensional Adaptive Basis Density Estimation, Susan Buchman; Alcoa

Creation and Analysis of Differentially-Private Synthetic Datasets, Anne-Sophie Charest; Université Laval, Quebec, Canada

Using Dimension Reduction Techniques to Model Genetic Relatedness for Association Studies, Andrew Crossett; West Chester University

Clustering Trajectories in the Presence of Informative Monotone Missingness, Gabrielle Flynt; Assistant Professor, Bucknell University

Sequential Estimation and Detection in Statistical Inverse Problems, Darren Homrighausen; Assistant Professor, Colorado State University

Techniques for the Estimation and Prediction of Surface Segregation Occurring in Alloys, Gary Klein; SAS Institute

Generalization Error Bounds for State-Space Models, Daniel McDonald; Assistant Professor, Indiana University, Bloomington

Structured Sparsity, Daniel Percival; Google, New York

Behavioral Modeling of Botnet Populations Viewed Through Internet Protocol Address Space, Rhiannon Weaver; Software Engineering Institute, Carnegie Mellon University


Longitudinal Mixed Membership Models with Applications to Disability Survey Data, Daniel Manrique; Post-doc, Duke University

Fast and Accurate Estimation for Astrophysical Problems in Large Databases, Joey Richards; Center for Time Domain Informatics, University of California, Berkeley

A Model of Limit-order Book Dynamics and a Consistent Estimation Procedure, Linqiao Zhao; PNC Bank

Detection of Bursts in Neuronal Spike Trains Using Hidden Semi-Markov Point Process Models, Judy Xi; PNC Bank

The Short Time Fourier Transform and Local Signals, Shuhei Okamura


Nonparametric Learning in High Dimensions, Han Liu; Assistant Professor, Princeton University

Predicting Performance and Scaling Up Estimates of Student Skill Knowledge, Elizabeth Ayers; Post-doc, University of California, Berkeley

Adaptive Source Detection, David Friedenberg; Battelle Institute

Cues and Heuristics on Capitol Hill: Relational Models of Decision-Making in the U.S. Senate, Justin Gross; Assistant Professor, University of North Carolina, Chapel Hill

System-Oriented Characterization of the Human Visual System, Eric Huang; Biometric Research Branch, National Cancer Institute

Hyper Markov Non-Parametric Processes for Mixture Modeling and Model Selection, Daniel Heinz; CNA Insurance

Power Prediction in Large Scale Multiple Testing: A Fourier Approach, Avranil Sarkar; LinkedIn