I started my PhD in the Department of Statistics & Data Science at Carnegie Mellon University in 2017. I previously received my B.S. in Statistics at Carnegie Mellon in December 2015, worked as a baseball operations data and analytics intern for the Pittsburgh Pirates during the 2014 season, and worked as a quantitative analyst in the financial services industry.
Currently, I am working with Rebecca Nugent on the reproducibility of data science workflows, aka the science of data science. For my Advanced Data Analysis project I am analyzing the text and actions of students in an introductory stats & data science course to understand how students learn statistics. Additionally, I’m working on variable selection for consistent clustering and clustering/classification work on ensemble prediction.
I became fascinated with statistics because of baseball and fantasy football. Sam Ventura, Max Horowitz, and I are developing new, reproducible advanced metrics for the NFL. As an undergrad, I was a founding member of the Carnegie Mellon Sports Analytics club.
I’m also a member of our department’s Teaching Statistics research group.
You can check out my CV here.
BS in Statistics, 2015
Carnegie Mellon University
Sat, Jun 23, 2018, Classification Society Annual Meeting 2018
Thu, May 17, 2018, Symposium on Data Science and Statistics
Sat, Sep 23, 2017, New England Symposium on Statistics in Sports
Thu, Jul 13, 2017, Great Lakes Analytics in Sports Conference
An R package to scrape soccer commentary and statistics from ESPN.
An R package to compute WAR for offensive players using nflscrapR.
An R package, developed with Max Horowitz and Sam Ventura, that allows R users to access and analyze data from the National Football League (NFL) API. The functions in this package allow users to perform analysis at the play and game levels on single games and entire seasons. With open-source data, reproducible advanced NFL metrics can be developed more rapidly, helping to grow the football analytics community.
Hosted by the CMU Statistics & Data Science Department and Carnegie Mellon Sports Analytics Club, the Carnegie Mellon University Baseball Analytics Workshop is an interactive workshop focusing on data exploration skills with baseball data!
The starter code script for the first half of the workshop, using ggplot2 to explore historical baseball data from the Lahman package, is available here.
The starter code script for the second half of the workshop, dedicated to using PITCHf/x and Statcast data for creating a game plan for the Pirates against the Reds, is available here.
Additional resources that will be useful for working with this data include:
The repository for all of the workshop’s material is located here.
All PITCHf/x and Statcast data, made available by MLBAM, was accessed using the