# Advanced Data Analysis from an Elementary Point of View

## by Cosma Rohilla Shalizi

This is a draft textbook on data analysis methods, intended for a one-semester course for advanced undergraduate students who have already taken classes in probability, mathematical statistics, and linear regression. It began as the lecture notes for 36-402 at Carnegie Mellon University.

By making this draft generally available, I am not promising to provide any assistance or even clarification whatsoever. Comments are, however, generally welcome.

The book is under contract to Cambridge University Press; it should be turned over to the press ~~at the end of 2013 or beginning of 2014~~ ~~in early~~ ~~before the end of 2015~~ by the end of ~~2018~~ 2019, inshallah. A copy of the next-to-final version will remain freely accessible here permanently.

#### What you're probably looking for

I. Regression and Its Generalizations
1. Regression Basics
2. The Truth about Linear Regression
3. Model Evaluation
4. Smoothing in Regression
5. Simulation
6. The Bootstrap
7. Splines
9. Testing Regression Specifications
10. Weighting and Variance
11. Logistic Regression
12. Generalized Linear Models and Generalized Additive Models
13. Classification and Regression Trees
II. Distributions and Latent Structure
14. Density Estimation
15. Relative Distributions and Smooth Tests of Goodness-of-Fit
16. Principal Components Analysis
17. Factor Models
18. Nonlinear Dimensionality Reduction
19. Mixture Models
20. Graphical Models
III. Causal Inference
21. Graphical Causal Models
22. Identifying Causal Effects
23. Estimating Causal Effects
24. Discovering Causal Structure
IV. Dependent Data
25. Time Series
26. Simulation-Based Inference
Appendices
• Data-Analysis Problem Sets
• Reminders from Linear Algebra
• Big O and Little o Notation
• Taylor Expansions
• Multivariate Distributions
• Algebra with Expectations and Variances
• Propagation of Error, and Standard Errors for Derived Quantities
• Optimization
• Chi-Squared and the Likelihood Ratio Test
• Rudimentary Graph Theory
• Writing R Functions
• Random Variable Generation

#### Planned changes

• Expand treatment of partial identification for causal inference, including partial identification of effects by looking at all data-compatible DAGs (Part III)
• Figure out how to cut at least 50 pages
• Make sure notation is consistent throughout: insist that vectors are always matrices, or use more geometric notation?
• Move some appendices online (i.e., after references and problem sets)

(Text last updated 1 April 2019; this page last updated 21 March 2019)