Deviation from these requirements will only be allowed in exceptional cases, and with the approval of the Director of Graduate Studies. With a few exceptions, students in joint degree programs are also required to satisfy all of these requirements.

The core Ph.D. course requirements are as follows:

**36-699: Immigration to Statistics**

Students are introduced to the faculty and their interests, the field of statistics, and the facilities at Carnegie Mellon. Each faculty member gives at least one elementary lecture on some topic of his or her choice. In the past, topics have included: the field of statistics and its history, large-scale sample surveys, survival analysis, subjective probability, time series, robustness, multivariate analysis, psychiatric statistics, experimental design, consulting, decision-making, probability models, statistics and the law, and comparative inference.

**36-705: Intermediate Statistics**

This course covers the fundamentals of theoretical statistics. Topics include: probability inequalities, point and interval estimation, minimax theory, hypothesis testing, data reduction, convergence concepts, Bayesian inference, nonparametric statistics, bootstrap resampling, VC dimension, prediction and model selection.

**36-707: Regression Analysis**

This is a course in data analysis using multiple linear regression. Topics covered include simple linear regression, ordinary least squares and weighted least squares, the geometry of least squares, quadratic forms, F tests and ANOVA tables, residuals, outlier detection, identification of influential observations, variable selection methods, and modern regression techniques. Essential background in linear algebra is reviewed where necessary. When time permits, other topics such as nonlinear regression and robust estimation will be discussed. Practice in data analysis is obtained through course projects.

**36-708: Statistical Machine Learning**

This course is an introduction to modern methods in machine learning from a statistical perspective. Topics include: nonparametric regression, classification (linear classifiers, nearest neighbors, random forests, boosting, support vector machines), graphical models, clustering, EM, PCA, causality, deep learning, optimal transport.

**36-709: Advanced Statistics I**

This course is a one-semester overview of topics in Probability Theory. After a brief introduction to measure and integration theory, the focus will be on issues of immediate use to statisticians, such as modes of convergence, limit theorems, laws of large numbers, martingales, and other topics as time allows.

**36-750: Statistical Computing**

A detailed introduction to elements of computing relating to statistical modeling, targeted to advanced undergraduates, master's students, and doctoral students in Statistics. Topics include important data structures and algorithms; numerical methods; databases; parallelism and concurrency; and coding practices, program design, and testing. Multiple programming languages will be supported (e.g., C, R, Python). Those with no previous programming experience are welcome but will be required to learn the basics of at least one language via self-study.

**36-757 and 36-795: Advanced Data Analysis**

Advanced Data Analysis (ADA) is a Ph.D. level seminar on advanced methods in statistics, including computationally intensive smoothing, classification, variable selection and simulation techniques. During 36-757, you work with the seminar instructor to identify an ADA project for yourself. The ADA project is an extended project in applied statistics, done in collaboration with an investigator from outside the Department, under the guidance of a faculty committee, culminating in a publishable paper that is presented orally and in writing. While 36-757 is a standard course with weekly class meetings, 36-795 is a research project, and only entails regular meetings with the ADA advisor(s). NB: While the 36-795 course requirements are satisfied by a successful oral presentation and written paper, the ADA project may not necessarily be completed at the end of the semester. The completion of the ADA project as a whole is at the discretion of the ADA advisor, and may extend up to one additional semester following 36-795.

Students are also strongly encouraged to take courses in machine learning: 10-701 or 10-715, and 10-716.

All Ph.D. students are also expected to successfully complete the following:

**The Advanced Data Analysis (ADA) Project**

During the first half of their first semester in the program, Ph.D. students learn from faculty about **Advanced Data Analysis (ADA)** projects that they could pursue, starting in the second half of their first semester. The ADA project is done in collaboration with an investigator from outside the Department, under the guidance of a faculty committee.

All projects are subject to the following guidelines:

- The project is required to have an “outside advisor” with expertise in the application area and the particular question being addressed by the project. In most situations this individual provides the data and the question to be addressed. The outside advisor should not be an expert in statistics. Part of the objective of the ADA project is to give the student the opportunity to work in collaboration with someone who does not possess prior knowledge of the statistical methods to be employed. Students should develop the ability to explain and justify their chosen approaches to data analysis.
- The student should maintain regular (at least monthly) contact with the outside advisor to discuss progress and to ensure that relevant work is undertaken. Some of these interactions can take place over email, but personal interactions are also needed.
- The project must use real data, not data simulated by a computer model.
- The project must have a faculty advisor from inside the Department of Statistics and Data Science. The student should meet regularly with her or his Statistics advisor(s), usually once per week.

The culmination of the project is a written document describing the work, along with a presentation to the Department. The project advisors are responsible for determining when the written document is sufficient for passing. The presentation should be 25 minutes in length, with additional time allotted for questions and answers.

**The Data Analysis Exam**

At the conclusion of each Spring Semester the Department administers the Data Analysis Exam, which is designed to test students’ ability to apply statistical methods to address a substantive, real problem. Students are given eight hours to complete the exam, during which time they analyze the data and write a report to present their analysis and conclusions. The faculty are realistic as to what can be accomplished during the eight-hour period. In grading the exam, the faculty are looking for clear presentation of an appropriate analysis of the data. Emphasis is not placed on technical or mathematical sophistication. The exam is largely built on the content of 36-707, and hence should be taken in the Spring following the completion of that course.

**The Area of Strength Requirement**

While students are required to meet a minimum standard of performance in all of their coursework, successfully completing a dissertation in Statistics requires that a student possess some relevant dimension in which their skills far exceed this minimum. Therefore, before a student can begin the process leading to the dissertation proposal, the student needs to demonstrate an **area of strength**. Examples of areas of strength include Theoretical Statistics, Applied Statistics, and Computational Statistics. There are multiple ways that a student can satisfy this requirement, including strong performance in coursework or on a research or data analysis project. Students who have not demonstrated an area of strength will not be permitted to propose. Failure to establish an area of strength by the end of the fourth semester in the program may result in the student being considered not in good standing.

The faculty exhibit flexibility and fairness in the application of this policy. The motivation is to ensure, to the extent possible, that the student will successfully complete her or his dissertation. The policy also recognizes the range of strengths and interests of our students, and that the discipline of Statistics needs researchers from across this spectrum. The area of strength is determined by the Statistics faculty and will be communicated to the student via one of the progress update letters that are sent at the end of each semester.

**A relevant Dissertation, preceded by a Thesis Proposal**

The thesis proposal in our Department is a critical opportunity for the faculty to provide constructive feedback to guide and shape the dissertation research. The proposal process succeeds when it leads the student to a sound and detailed plan for the dissertation. The faculty should provide the student with constructive criticism on proposed methods and approaches, force the student to question assumptions, and challenge the student’s perspective on the problem.

**Note:** The Department does not have qualifying or preliminary exams.

The following template shows the typical schedule by which a student completes the coursework requirements and commences thesis research in under two years.

**Year One**

**Fall Semester**

- 36-699: Immigration to Statistics
- 36-705: Intermediate Statistics
- 36-707: Regression Analysis
- 36-750: Statistical Computing

**Spring Semester**

- 36-709: Advanced Statistics I
- 36-757: Advanced Data Analysis I
- 36-708: Statistical Machine Learning or Methods Minis
- Begin Work on ADA Project

**Year Two**

**Fall Semester**

- 36-795: Advanced Data Analysis II (Interdisciplinary Applied Research)
- Complete oral and written presentations of ADA Project
- Electives, e.g. 36-710: Advanced Statistical Theory / Probability Theory

**Spring Semester**

- Finalize Thesis Advisor and Topic
- Begin Elective Coursework*

**Year Three** is spent preparing and delivering the thesis proposal.

**Years Four and Beyond** are dedicated to dissertation research.

***Note:** After the first three semesters, our students often take elective courses on advanced statistical, machine learning, or domain-specific topics. A variety of half-semester courses (“minis”) are offered each semester that cover exciting topics in the field.