Katy McKeough is a fifth-year Ph.D. student in the Harvard University Statistics Department. Her research involves using advanced statistical models in applied settings including sports analytics and astrostatistics. She graduated from Carnegie Mellon University in 2015 with a degree in physics and a secondary major in statistics. She is a member of both the CHASC: Astrostatistics Group and the Sports Analytics Lab at Harvard.
About The Conference
Now in its third year, the Carnegie Mellon Sports Analytics Conference is dedicated to highlighting the latest sports research from the statistics and data science community.
Interested in presenting your research at CMSAC? Submit an abstract using the form below! And if you are using publicly available data then consider entering our second annual Reproducible Research Competition!
ENTER THE COMPETITION
In an effort to foster reproducible research in the sports analytics community, we are hosting the second annual CMSAC Reproducible Research Competition!
- August 8th: Abstract submission deadline.
- August 15th: Selected abstracts will be notified and invited to submit papers (max of 10 pages).
- September 15th: Paper submissions deadline for selected abstracts.
- November 2nd: The top papers will present at CMSAC, in addition to being awarded cash prizes!
There will be separate prizes for the student and non-student tracks. All other selected abstracts will be invited to present posters at the conference. Paper reviews and presentation status will be sent out in early October. Your submission must follow the rules:
1.) Research must be based entirely on data that is freely available to the public (no paywall).
2.) All code and analysis steps must be available for anyone to view. Using GitHub and completing all analysis in RMarkdown, Jupyter Notebooks, or some similar service is ideal, but not required.
3.) One co-author must be willing/able to present work at CMSAC!
Papers are judged not just on the research/contributions, but also on the reproducibility of your analysis and code.
CALL FOR ABSTRACTS
In an effort to foster intellectual growth and discovery among the statistics and data science community, we gladly welcome research submissions from the public.
Submit your research project using the form by August 31st, indicating whether or not you want your submission considered for a contributed talk and/or poster. Note that there are limited spaces available, and abstracts for talks and posters will be accepted on a rolling basis until slots are filled. Final acceptance notifications will be sent out by mid-September.
Here's a recap of important dates and requirements to remember:
- Aug 31st: Abstract submission deadline.
- Abstracts will be selected on a rolling basis, with final notification by mid-September 2019.
NOTE: Submissions made through this research submission form are not entered into the Reproducible Research Competition. They do not require publicly available data or shared code, and they are not eligible for cash prizes.
More information coming soon!
Football Analytics Workshop
Led by Ron Yurko, the CMSAC football analytics workshop is a three-hour event (5 to 8 PM). The first hour introduces attendees to reading, wrangling, and visualizing publicly available NFL data with the R statistical programming language, specifically using the tidyverse. The middle hour features our keynote speaker, Michael Lopez, who will discuss his work as the NFL's Director of Football Data and Analytics. The third hour covers the basics of using R to generate Elo ratings for NFL teams, a popular rating system featured on websites such as FiveThirtyEight. No prior programming experience is required. More information will be available soon.
Into the tidyverse with NFL data
By Ron Yurko
Keynote Speaker: Michael Lopez
Director of Football Data and Analytics, NFL
Introduction to NFL Elo ratings
By Ron Yurko
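For readers unfamiliar with the Elo ratings mentioned in the workshop description, the following is a minimal sketch of the standard Elo update rule. It is written in Python purely for illustration (the workshop itself uses R), and it is not taken from the workshop materials. The function names, the K-factor of 20, and the starting rating of 1500 are all assumptions chosen for the example.

```python
def expected_score(rating_a: float, rating_b: float) -> float:
    """Win probability for team A against team B under the Elo logistic curve."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def update_elo(rating_a: float, rating_b: float, score_a: float, k: float = 20.0):
    """Return the updated (rating_a, rating_b) after one game.

    score_a is 1.0 if team A won, 0.5 for a tie, and 0.0 if team A lost.
    k (assumed to be 20 here) controls how quickly ratings react to results.
    """
    delta = k * (score_a - expected_score(rating_a, rating_b))
    # Whatever one team gains, the other loses, so total rating is conserved.
    return rating_a + delta, rating_b - delta


# Hypothetical game: both teams start at an assumed rating of 1500 and team A wins.
team_a, team_b = update_elo(1500.0, 1500.0, score_a=1.0)
print(team_a, team_b)  # prints 1510.0 1490.0
```

A win over an evenly rated opponent moves each rating by half of K. A win over a much weaker opponent moves the ratings less, because the result was already expected.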
Conference Keynote Speaker
Cade Massey is a Practice Professor in the Wharton School’s Operations, Information and Decisions Department. He received his PhD from the University of Chicago and taught at Duke University and Yale University before moving to Penn in 2012. Massey’s research focuses on judgment under uncertainty – how well people predict what will happen in the future – and especially processes that blend experts and algorithms. His work draws on experimental and “real world” data such as employee stock options, 401k savings, the National Football League draft, and graduate school admissions. His research has led to long-time collaborations with Google, Merck and multiple professional sports franchises. Massey is faculty co-director of Wharton People Analytics, co-host of “Wharton Moneyball” on SiriusXM Business Radio, and co-creator of the Massey-Peabody NFL Power Rankings for the Wall Street Journal and Washington Post.
Workshop Keynote Speaker
Mike Lopez is the Director of Football Data and Analytics with the NFL, and an adjunct professor and research associate at Skidmore College. He received his PhD in Biostatistics from Brown University in 2010. His research spans causal inference – with a specific focus on causal inference methods for multiple exposures or multiple exposure doses – and the application of statistics to sports.
Harvard University Statistics Department
Growth Curves for Predicting Athlete Ratings
It is often the goal of sports analysts, coaches and fans to predict athlete performance over time. Methods such as Elo, Glicko and Plackett-Luce based ratings measure athlete skill based on results of competitions over time but have limited predictive strength on their own. Growth curves are often applied in the context of sports to predict future ability, but these curves are too simple to account for complex career trajectories. We propose a mixture of non-linear, mixed-effects growth curves to model the ratings as a function of athlete age and time. The mixture of growth curves allows for flexibility in the estimated shape of the career trajectories between athletes as well as between sports. We use the fitted growth curves to make predictions about the future career trajectory of an athlete. We apply this method to men's slalom results, but it can be generalized to other sports.
Sameer is a post-doctoral researcher at MIT's Computer Science and Artificial Intelligence Laboratory. He received his Ph.D. in Statistics from the Wharton School at the University of Pennsylvania in 2018. His methodological research primarily focuses on Bayesian model selection, clustering, and model averaging. In addition to studying the effects of playing football, he has previously worked on estimating how NBA players help their teams win games and on quantifying the uncertainty about the value of pitch framing in baseball. Outside of statistics, he is an avid sports fan, with particular affection for Dallas-based teams.
Expected Hypothetical Completion Probability: An Analysis from the 2019 NFL Big Data Bowl
Using high-resolution player tracking data made available by the National Football League (NFL) for their 2019 Big Data Bowl competition, we introduce the Expected Hypothetical Completion Probability (EHCP), an objective framework for evaluating plays. At the heart of EHCP is the question 'on a given passing play, did the quarterback throw the pass to the receiver who was most likely to catch it?' To answer this question, we first built a Bayesian non-parametric catch probability model that automatically accounts for complex interactions between inputs like the receiver's speed and distances to the ball and nearest defender. While building such a model is, in principle, straightforward, using it to reason about a hypothetical pass is challenging because many of the model inputs corresponding to a hypothetical are necessarily unobserved. To wit, it is impossible to observe how close an un-targeted receiver would be to his nearest defender had the pass been thrown to him instead of the receiver who was actually targeted. To overcome this fundamental difficulty, we propose imputing the unobservable inputs and averaging our model predictions across these imputations to derive EHCP. In this way, EHCP can track how the completion probability evolves for each receiver over the course of a play in a way that accounts for the uncertainty about the missing inputs.
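The impute-then-average idea in the abstract above can be illustrated with a short sketch. This is not the authors' model: the logistic `catch_probability` function, its coefficients, and the normal distribution used to draw the unobserved defender separation are all hypothetical stand-ins for the Bayesian non-parametric model and imputation scheme the abstract describes.

```python
import math
import random


def catch_probability(separation_yards: float, receiver_speed: float) -> float:
    """Hypothetical stand-in for a fitted catch probability model.

    The coefficients are made up for illustration only.
    """
    z = -1.0 + 0.6 * separation_yards - 0.1 * receiver_speed
    return 1.0 / (1.0 + math.exp(-z))


def hypothetical_completion_probability(receiver_speed, imputed_separations):
    """Average model predictions over imputed values of the unobserved input."""
    predictions = [catch_probability(s, receiver_speed) for s in imputed_separations]
    return sum(predictions) / len(predictions)


# Assumed imputation: draw plausible defender separations (in yards) for an
# un-targeted receiver, since the true value cannot be observed.
rng = random.Random(0)
draws = [max(0.0, rng.gauss(3.0, 1.0)) for _ in range(1000)]

ehcp = hypothetical_completion_probability(receiver_speed=6.0, imputed_separations=draws)
print(round(ehcp, 3))
```

Averaging over many imputed values, rather than plugging in a single guess, is what lets the final number reflect uncertainty about the input that was never observed.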
Christopher J. Phillips is Associate Professor of History at Carnegie Mellon University. He received his PhD in History of Science from Harvard University and also taught at New York University before coming to CMU in 2015. Phillips’s research focuses on the history of statistics, and in particular, the supposed benefits of introducing numbers and analytics into new fields. He is the author of Scouting and Scoring: How We Know What We Know About Baseball (Princeton University Press) and also serves as an Associate Editor for the Harvard Data Science Review.
Brian is currently the Director of Sports Analytics in the Stats & Information Group at ESPN. He was previously the Director of Hockey Analytics with the Florida Panthers Hockey Club, an Associate Professor in the Department of Mathematical Sciences at West Point, an Adjunct Professor in the Department of Management Science at the University of Miami, and an Adjunct Professor in Sports Analytics in the College of Business at Florida Atlantic University. He received a Bachelor of Science in Electrical Engineering from Lafayette College, Easton, PA, and a Master of Arts and a Ph.D. in Mathematics from Johns Hopkins University, Baltimore, MD.
As editor-in-chief of Hockey-Graphs, Asmae manages the day-to-day operations and oversees the editorial and creative process. In her role, she spearheaded a mentorship program pairing NHL data scientists and executives with underrepresented persons. She has hosted multiple workshops in data viz and modelling at MIT Sloan Sports Analytics Conferences. She also works for the Massachusetts General Hospital Institute of Technological Assessment as a Data Analyst.
FROM GRAPES AND PRUNES TO APPLES AND APPLES: USING MATCHED METHODS TO ESTIMATE OPTIMAL ZONE ENTRY DECISION-MAKING IN THE NATIONAL HOCKEY LEAGUE
Previous research in the National Hockey League has suggested that gaining the offensive zone with puck possession ("carry-ins") is preferable to dumping the puck in and chasing after it ("dump-ins"). However, standard comparisons of zone entry strategy are confounded by factors such as offensive and defensive talent, location on the ice, and shift time, each of which impacts player choice. Indeed, contrasting carry-ins to dump-ins isn’t exactly an apples-to-apples comparison; instead, it is more like studying grapes versus prunes. Using two matching methods – propensity score matching and Bayesian additive regression trees – we leverage player-tracking data to estimate the causal benefits due to zone-entry decisions. Both approaches better account for the variables that affect entry choice. We also highlight the wide-ranging potential of the causal inference framework with player tracking data in sports while emphasizing the challenges of using standard statistical methods to inform decision-making in the presence of substantial confounding.
Meredith J. Wills, Ph.D., is a Sports Data Product Specialist for SportsMEDIA Technology (SMT). She has a B.A. in Astronomy & Astrophysics from Harvard University, and an M.S. and Ph.D. in Physics from Montana State University—Bozeman. Dr. Wills joined SMT in 2018, where she works primarily with FIELDf/x, a baseball ball- and player-tracking system used by minor league teams, international leagues, and NCAA. She also writes for The Athletic, and her best-known independent research involves MLB baseball construction and its effect on the game.
The 2019 Home Run Surge: A Whole New Ballgame (Again)
In 2017, Major League Baseball saw an unprecedented increase in home runs. It was determined that this Home Run Surge was caused by a physical change to the ball. By disassembling a sample of baseballs and studying their construction, I found that the introduction of thicker laces ultimately produced a more aerodynamic ball. This past season, MLB’s home run rate soared even farther, and was once again related to changes in baseball construction. Using similar methods, I disassembled a sample of 2019 baseballs and compared their properties to those of earlier populations. This time, my findings showed that multiple aspects of the ball had changed, and that these differences could account for lower drag and a higher home run rate. Evidence suggests that the changes were due to manufacturing process modifications and better quality control, and that the extent of the ball’s aerodynamic improvement—while perhaps not unwelcome—was likely unexpected.
Carnegie Mellon University
Baker Hall (A51, Giant Eagle Auditorium)
4909 Frew St, Pittsburgh, PA 15213
From PIT Airport
1. Head northeast on Airport Blvd
2. Keep left to stay on Airport Blvd - 0.6 mi
3. Keep left to stay on Airport Blvd - 0.7 mi
4. Continue straight to stay on Airport Blvd - 0.2 mi
5. Keep left at the fork, follow signs for I-376 E/I-79 E/Pittsburgh/Pennsylvania Turnpike E and merge onto I-376 E - 0.6 mi
6. Merge onto I-376 E - 16.4 mi
7. Keep right to stay on I-376 E - 2.1 mi
8. Take exit 72A to merge onto Forbes Ave toward Oakland - 0.3 mi
9. Merge onto Forbes Ave - 1.0 mi
10. Turn right onto Schenley Drive Extension - 449 ft
11. Turn left onto Schenley Drive - 0.2 mi
12. Turn left onto Frew St - 0.2 mi
13. Destination will be on the left
Early Bird Registration (until Oct 15th)
- High School students – FREE (with school ID)
- Undergrad/Grad students Conference: $20 (with school ID)
- Undergrad/Grad students Workshop: $10 (with school ID)
- Undergrad/Grad students Conference + Workshop: $25 (with school ID)
- Non-students Conference: $50
- Non-students Workshop: $20
- Non-students Conference + Workshop: $60
Regular Registration (Oct 16th - Nov 1st)
- High School students – FREE (with school ID)
- Undergrad/Grad students Conference: $25 (with school ID)
- Undergrad/Grad students Workshop: $10 (with school ID)
- Undergrad/Grad students Conference + Workshop: $30 (with school ID)
- Non-students Conference: $75
- Non-students Workshop: $20
- Non-students Conference + Workshop: $85
Registering indicates agreement to abide by the Code of Conduct.
The Carnegie Mellon Sports Analytics Conference is proudly hosted by the Department of Statistics & Data Science and the Carnegie Mellon Sports Analytics club.
Questions can be directed to email@example.com.
CMSAC Activities Conduct Policy
(modeled on the ASA Activities Conduct Policy approved November 30, 2018 by American Statistical Association Board of Directors)
The Carnegie Mellon Sports Analytics Conference (CMSAC) is committed to providing an atmosphere in which personal respect and intellectual growth are valued and the free expression and exchange of ideas are encouraged. Consistent with this commitment, it is CMSAC policy that all participants in CMSAC activities enjoy a welcoming environment free from unlawful discrimination, harassment, and retaliation. We strive to be a community that welcomes and supports people of all backgrounds and identities. This includes, but is not limited to, members of any race, ethnicity, culture, national origin, color, immigration status, social and economic class, educational level, sex, sexual orientation, gender identity and expression, age, size, family status, political belief, religion, and mental and physical ability.
All CMSAC participants (including, but not limited to, attendees, statisticians, data scientists, sports analysts, students, registered guests, staff, contractors, sponsors, exhibitors, and volunteers) in the conference or any other related activity, whether official or unofficial, agree to comply with all rules and conditions of the activities. Your registration for or attendance at the 2019 Carnegie Mellon Sports Analytics Conference indicates your agreement to abide by this policy and its terms.
Expected Behavior
- Model and support the norms of professional respect necessary to promote the conditions for healthy exchange of scientific ideas.
- Speak and conduct yourself professionally; do not insult or disparage other participants.
- Be conscious of hierarchical structures in the sports analytics and/or broader statistics/data science community, specifically the existence of stark power differentials among students, junior analysts/statisticians, and senior analysts/statisticians—noting that fear of retaliation from those in senior-level positions can make it difficult for students or those in junior level positions to express discomfort, rebuff unwelcome advances, and report violations of the conduct policy.
- Be sensitive to body language and other non-verbal signals and respond respectfully.
Unacceptable Behavior
- Violent threats or language directed against another person
- Discriminatory jokes and language
- Inclusion of unnecessary sexually explicit, violent, or otherwise sensitive materials in presentations
- Posting (or threatening to post), without permission, other people’s personally identifying information online, including on social networking sites
- Personal insults including, but not limited to, those using racist, sexist, homophobic, or xenophobic terms
- Unwelcome solicitation of emotional or physical intimacy such as sexual advances; propositions; sexual flirtations; sexually-related touching; and graphic gestures or comments about sex or another person’s dress, body, or sexual activities
- Advocating for, encouraging, or dismissing the severity of any of the above behaviors.
Consequences of Unacceptable Behavior
At the sole discretion of the CMSAC Program Committee, unacceptable behavior may result in removal from or denial of access to meeting facilities or activities, without refund of any applicable registration fees or costs. In addition, the CMSAC reserves the right to report violations to an individual’s employer or institution or to a law-enforcement agency. Those engaging in unacceptable behavior may also be banned from future CMSAC activities or face additional penalties.
What to Do if You Witness or Are Subject to Unacceptable Behavior
If you are being harassed, notice that someone else is being harassed, or have any other concerns relating to harassment, please contact a member of the CMSAC program committee either in person or at firstname.lastname@example.org. If you witness potential harm to a conference participant, be proactive in helping to mitigate or avoid that harm; if you see or hear something that concerns you, please say something.
Process for Adjudicating Reports of Misconduct
The CMSAC will contract with an independent entity to manage and adjudicate reported violations of the conduct policy.
Note: This Code of Conduct may be revised at any time by the Carnegie Mellon Sports Analytics Conference. Questions, concerns, or comments should be directed to email@example.com.