# Advanced Data Analysis from an Elementary Point of View

This is a draft textbook on data analysis methods, intended for a
one-semester course for advanced undergraduate students who have already taken
classes in probability, mathematical statistics, and linear regression. It
began as the lecture notes for 36-402 at Carnegie Mellon
University.

By making this draft generally available, I am not promising to provide any
assistance or even clarification whatsoever. Comments are,
however, *generally* welcome.

The book is under contract to Cambridge
University Press; it should be turned over to the press ~~at the end
of 2013 or beginning of 2014~~ ~~in early~~ ~~before
the end of 2015~~ by the end of ~~2018~~ 2019, inshallah.
A copy of the next-to-final version will remain freely accessible here
permanently.

#### What you're probably looking for

- Complete draft in PDF
- Directory of chapter-by-chapter R files for examples
- Directory of data sets used in examples

#### Table of contents

I. Regression and Its Generalizations
- Regression Basics
- The Truth about Linear Regression
- Model Evaluation
- Smoothing in Regression
- Simulation
- The Bootstrap
- Splines
- Additive Models
- Testing Regression Specifications
- Weighting and Variance
- Logistic Regression
- Generalized Linear Models and Generalized Additive Models
- Classification and Regression Trees

II. Distributions and Latent Structure
- Density Estimation
- Principal Components Analysis
- Factor Models
- Mixture Models
- Graphical Models

III. Causal Inference
- Graphical Causal Models
- Identifying Causal Effects
- Estimating Causal Effects
- Discovering Causal Structure

IV. Dependent Data
- Time Series
- Simulation-Based Inference

Online-only Appendices
- Big O and Little o Notation
- Taylor Expansions
- Propagation of Error, and Standard Errors for Derived Quantities
- Optimization
- Relative Distributions and Smooth Tests of Goodness of Fit
- Nonlinear Dimensionality Reduction
- Rudimentary Graph Theory
- Missing Data
- Writing R Functions

Data-Analysis Assignments

#### Planned changes

- Remove redundant versions of the data-analysis assignments; provide solutions as a separate document through the publisher
- Unified treatment of information theory as an appendix
- Improved treatment of nonparametric instrumental variables
- Trim time-series chapter so it's less of a catalog of everything that might be useful
- Break out stuff on heuristic essential asymptotics as a separate appendix
- Make sure notation is consistent throughout: insist that vectors are
always matrices, or use more geometric notation?
- Figure out how to cut at least 50 pages

(Text last updated 8 September 2019; this page last updated 9 September 2019)