Advanced Data Analysis from an Elementary Point of View
This is a draft textbook on data analysis methods, intended for a
one-semester course for advanced undergraduate students who have already taken
classes in probability, mathematical statistics, and linear regression. It
began as the lecture notes for 36-402 at Carnegie Mellon
University.
By making this draft generally available, I am not promising to provide any
assistance or even clarification whatsoever. Comments are, however, welcome.
The book is under contract to Cambridge
University Press; it should be turned over to the press at the end of 2013
or beginning of 2014. A copy of the next-to-final version will remain freely
accessible here permanently.
Complete draft in PDF
Table of contents:
I. Regression and Its Generalizations
- Regression Basics
- The Truth about Linear Regression
- Model Evaluation
- Smoothing in Regression
- Simulation
- The Bootstrap
- Weighting and Variance
- Splines
- Additive Models
- Testing Regression Specifications
- More about Hypothesis Testing
- Logistic Regression
- Generalized Linear Models and Generalized Additive Models
II. Multivariate Data, Distribution Estimates, and Latent Structure
- Multivariate Distributions
- Density Estimation
- Relative Distributions and Smooth Tests
- Principal Components Analysis
- Factor Analysis
- Mixture Models
- Graphical Models
III. Causal Inference
- Graphical Causal Models
- Identifying Causal Effects
- Estimating Causal Effects
- Discovering Causal Structure
IV. Dependent Data
- Time Series
- Time Series with Latent Variables
- Longitudinal, Spatial and Network Data
Appendices
- A. Writing R Functions
- B. Big O and Little o Notation
- C. chi-squared and the Likelihood Ratio Test
- D. Proof of the Gauss-Markov Theorem
- E. Constrained and Penalized Optimization
- F. Rudimentary Graph Theory
- G. Pseudo-code for the SGS Algorithm
(Text last updated 15 February 2013; webpage last updated 7 January 2013)