Advanced Data Analysis from an Elementary Point of View

by Cosma Rohilla Shalizi

This is a draft textbook on data analysis methods, intended for a one-semester course for advanced undergraduate students who have already taken classes in probability, mathematical statistics, and linear regression. It began as the lecture notes for 36-402 at Carnegie Mellon University.

By making this draft generally available, I am not promising to provide any assistance or even clarification whatsoever. Comments are, however, generally welcome.

The book is under contract to Cambridge University Press; it should be turned over to the press by the end of 2018, inshallah (previous targets: the end of 2013 or beginning of 2014, then 2015). A copy of the next-to-final version will remain freely accessible here permanently.

Complete draft in PDF

Table of contents:

I. Regression and Its Generalizations
  1. Regression Basics
  2. The Truth about Linear Regression
  3. Model Evaluation
  4. Smoothing in Regression
  5. Simulation
  6. The Bootstrap
  7. Splines
  8. Additive Models
  9. Testing Regression Specifications
  10. Weighting and Variance
  11. Logistic Regression
  12. Generalized Linear Models and Generalized Additive Models
  13. Classification and Regression Trees
II. Distributions and Latent Structure
  14. Density Estimation
  15. Relative Distributions and Smooth Tests of Goodness-of-Fit
  16. Principal Components Analysis
  17. Factor Models
  18. Nonlinear Dimensionality Reduction
  19. Mixture Models
  20. Graphical Models
III. Causal Inference
  21. Graphical Causal Models
  22. Identifying Causal Effects
  23. Causal Inference from Experiments
  24. Estimating Causal Effects
  25. Discovering Causal Structure
IV. Dependent Data
  26. Time Series
  27. Simulation-Based Inference

Planned changes:

(Text last updated 9 September 2018; this page last updated 10 September 2018)