Deviation from these requirements will be allowed only in exceptional cases and with the approval of the Director of Graduate Studies. With a few exceptions, students in joint degree programs are also required to satisfy all of these requirements.
The core Ph.D. course requirements are as follows:
Students are introduced to the faculty and their interests, the field of statistics, and the facilities at Carnegie Mellon. Each faculty member gives at least one elementary lecture on some topic of his or her choice. In the past, topics have included: the field of statistics and its history, large-scale sample surveys, survival analysis, subjective probability, time series, robustness, multivariate analysis, psychiatric statistics, experimental design, consulting, decision-making, probability models, statistics and the law, and comparative inference.
This course covers the fundamentals of theoretical statistics. Topics include: probability inequalities, point and interval estimation, minimax theory, hypothesis testing, data reduction, convergence concepts, Bayesian inference, nonparametric statistics, bootstrap resampling, VC dimension, prediction and model selection.
This is a course in data analysis using multiple linear regression. Topics covered include simple linear regression, ordinary least squares and weighted least squares, the geometry of least squares, quadratic forms, F tests and ANOVA tables, residuals, outlier detection, identification of influential observations, variable selection methods, and modern regression techniques. Essential background in linear algebra is reviewed where necessary. When time permits, other topics such as nonlinear regression and robust estimation will be discussed. Practice in data analysis is obtained through course projects.
This course is an introduction to modern methods in machine learning from a statistical perspective. Topics include: nonparametric regression, classification (linear classifiers, nearest neighbors, random forests, boosting, support vector machines), graphical models, clustering, EM, PCA, causality, deep learning, optimal transport.
This course is a one-semester overview of topics in Probability Theory. After a brief introduction to measure and integration theory, the focus will be on issues of immediate use to statisticians, such as modes of convergence, limit theorems, laws of large numbers, martingales, and other topics as time allows.
This course will cover a selection of modern topics in mathematical statistics, with a focus on high-dimensional and non-parametric statistical models. One of the main goals of this course is to provide the students with some theoretical background and mathematical tools to read and understand the current statistical literature on high-dimensional models. Among the topics covered: concentration inequalities, covariance estimation, PCA, penalized linear regression, network models, maximal inequalities, local Rademacher complexities.
A detailed introduction to elements of computing relating to statistical modeling, targeted to advanced undergraduates, masters students, and doctoral students in Statistics. Topics include important data structures and algorithms; numerical methods; databases; parallelism and concurrency; and coding practices, program design, and testing. Multiple programming languages will be supported (e.g., C, R, and Python). Those with no previous programming experience are welcome but will be required to learn the basics of at least one language via self-study.
Advanced Data Analysis (ADA) is a Ph.D. level seminar on advanced methods in statistics, including computationally intensive smoothing, classification, variable selection and simulation techniques. During 36-757, you work with the seminar instructor to identify an ADA project for yourself. The ADA project is an extended project in applied statistics, done in collaboration with an investigator from outside the Department, under the guidance of a faculty committee, culminating in a publishable paper that is presented orally and in writing in 36-758. While 36-757 is a standard course with weekly class meetings, 36-758 is a research project, and only entails regular meetings with the ADA advisor(s). NB: While the 36-758 course requirements are satisfied by a successful oral presentation and written paper, the ADA project may not necessarily be completed at the end of the semester. The completion of the ADA project as a whole is at the discretion of the ADA advisor, and may extend up to one additional semester following 36-758.
Students are also strongly encouraged to take courses in machine learning, 10-701 and/or 10/36-702.
All Ph.D. students are also expected to successfully complete the following:
During their first semester in the program, Ph.D. students learn from faculty about Advanced Data Analysis (ADA) projects that they could pursue, starting in the second semester. The ADA project is done in collaboration with an investigator from outside the Department, under the guidance of a faculty committee.
ADA projects come from a wide range of applied disciplines, and originate both from within and outside CMU. All projects are subject to the following guidelines:
The culmination of the project is a written document describing the work, along with a presentation to the Department. The project advisors, along with the instructor for ADA II, are responsible for determining when the written document is sufficient for passing. The presentation should be 25 minutes in length, with additional time allotted for questions and answers.
At the conclusion of each Spring Semester the Department administers the Data Analysis Exam, which is designed to test students’ ability to apply statistical methods to a substantive, real problem. Students are given eight hours to complete the exam, during which time they analyze the data and write a report presenting their analysis and conclusions. The faculty are realistic as to what can be accomplished during the eight-hour period. In grading the exam, the faculty look for a clear presentation of an appropriate analysis of the data. Emphasis is not placed on technical or mathematical sophistication. The exam is largely built on the content of 36-707, and hence should be taken in the Spring following the completion of that course.
While students are required to meet a minimum standard of performance in all of their coursework, successfully completing a dissertation in Statistics requires that a student possess some relevant dimension in which their skills far exceed this minimum. Therefore, before a student can begin the process leading to the dissertation proposal, the student needs to demonstrate an area of strength. Examples of areas of strength include Theoretical Statistics, Applied Statistics, and Computational Statistics. There are multiple ways that a student can satisfy this requirement, including strong performance in coursework or on a research or data analysis project. Students who have not demonstrated an area of strength will not be permitted to propose. Failure to establish an area of strength by the end of the fourth semester in the program may result in the student being considered not in good standing.
The faculty exhibit flexibility and fairness in the application of this policy. The motivation is to ensure, to the extent possible, that the student will successfully complete her or his dissertation. The policy also recognizes the range of strengths and interests of our students, and that the discipline of Statistics needs researchers from across this spectrum. The area of strength is determined by the Statistics faculty and will be communicated to the student via one of the progress update letters that are sent at the end of each semester.
The thesis proposal in our Department is a critical opportunity for the faculty to provide constructive feedback to guide and shape the dissertation research. The proposal process succeeds when it leads the student to a sound and detailed plan for the dissertation. The faculty should provide the student with constructive criticism on proposed methods and approaches, force the student to question assumptions, and challenge the student’s perspective on the problem.
Note: The Department does not have qualifying or preliminary exams.
The following template shows the typical schedule by which a student completes the coursework requirements and commences thesis research in under two years.
Year Three is spent preparing and delivering the thesis proposal.
Years Four and beyond are dedicated to dissertation research.
*Note: After the first three semesters, our students often take elective courses on advanced statistical, machine learning, or domain-specific topics. A variety of half-semester courses (“minis”) are offered each semester that cover exciting topics in the field.