Bootstrap

36-402, Section A

5 February 2019

The Big Picture

\[ \newcommand{\Expect}[1]{\mathbf{E}\left[ #1 \right]} \newcommand{\Var}[1]{\mathrm{Var}\left[ #1 \right]} \newcommand{\Prob}[1]{\mathrm{Pr}\left( #1 \right)} \newcommand{\Probwrt}[2]{\mathrm{Pr}_{#2}\left( #1 \right)} \]

  1. Knowing the sampling distribution of a statistic tells us about statistical uncertainty (standard errors, biases, confidence sets)
  2. The bootstrap principle: approximate the sampling distribution by simulating from a good model of the data, and treating the simulated data just like the real data
  3. Sometimes we simulate from the model we’re estimating (model-based or “parametric” bootstrap)
  4. Sometimes we simulate by re-sampling the original data (resampling or “nonparametric” bootstrap); see the sketch after this list
  5. Stronger assumptions \(\Rightarrow\) less uncertainty if we’re right
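
As a concrete illustration of item 4, here is a minimal sketch of the resampling bootstrap in Python; the data `x` and the choice of the median as the statistic are hypothetical stand-ins, not from the lecture:

```python
import numpy as np

rng = np.random.default_rng(402)
x = rng.standard_normal(100)  # stand-in for the observed sample

def resample_bootstrap(x, statistic, B=1000):
    """Recompute `statistic` on B samples drawn from x with
    replacement; their spread approximates the sampling
    distribution of statistic(x)."""
    n = len(x)
    return np.array([statistic(rng.choice(x, size=n, replace=True))
                     for _ in range(B)])

tilde_T = resample_bootstrap(x, np.median)
se_hat = tilde_T.std(ddof=1)  # bootstrap standard error of the median
```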

Statistical Uncertainty

Measures of Uncertainty

The Sampling Distribution Is the Source of All Knowledge

The Difficulties

The Solutions

The Monte Carlo Principle

The Bootstrap Principle

  1. Find a good estimate \(\hat{P}\) for \(P_{X}\)
  2. Generate many simulations \(\tilde{X}\) from \(\hat{P}\), setting \(\tilde{T} = \tau(\tilde{X})\) for each
  3. Use the simulated distribution of the \(\tilde{T}\) to approximate \(P_{T}\) (a generic sketch in code follows this list)
    • “Pull yourself up by your bootstraps”: use \(\hat{P}\) to get at uncertainty in itself
    • Invented by Bradley Efron in the 1970s
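
Read as code, the three steps amount to a generic recipe: plug in any simulator for \(\hat{P}\) and any statistic \(\tau\), and collect the simulated \(\tilde{T}\) values. A hedged sketch, where the function name and interface are mine, not the lecture's:

```python
import numpy as np

def bootstrap_distribution(simulator, statistic, B=1000):
    """Steps 2-3 of the bootstrap principle: draw B simulated
    datasets X~ from the estimated distribution P-hat (via
    `simulator`), compute T~ = statistic(X~) for each, and return
    the T~ values; their empirical distribution approximates P_T."""
    return np.array([statistic(simulator()) for _ in range(B)])
```

The model-based and resampling bootstraps then differ only in which `simulator` gets plugged in.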

Model-based Bootstrap

If we are using a model, our best guess at \(P_{X}\) is \(P_{X,\hat{\theta}}\), where \(\hat{\theta}\) is our best estimate of the parameters
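
For instance, if the model says the data are i.i.d. Gaussian, a minimal model-based bootstrap for the mean might look like this; the Gaussian model, the data, and the statistic are assumptions for illustration, not the lecture's example:

```python
import numpy as np

rng = np.random.default_rng(402)
x = rng.standard_normal(100)   # stand-in for the observed data

# Step 1: estimate theta-hat = (mu-hat, sigma-hat) under the model
mu_hat, sigma_hat = x.mean(), x.std(ddof=1)

# Steps 2-3: simulate from P_{X, theta-hat}, recompute the statistic
B = 1000
tilde_T = np.array([rng.normal(mu_hat, sigma_hat, size=len(x)).mean()
                    for _ in range(B)])
se_hat = tilde_T.std(ddof=1)   # model-based bootstrap standard error
```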

The Model-based Bootstrap

Example: Is Karakedi overweight?