36-402, Section A

5 February 2019

\[ \newcommand{\Expect}[1]{\mathbf{E}\left[ #1 \right]} \newcommand{\Var}[1]{\mathrm{Var}\left[ #1 \right]} \newcommand{\Prob}[1]{\mathrm{Pr}\left( #1 \right)} \newcommand{\Probwrt}[2]{\mathrm{Pr}_{#2}\left( #1 \right)} \]

- Knowing the sampling distribution of a statistic tells us about statistical uncertainty (standard errors, biases, confidence sets)
- The bootstrap principle: *approximate* the sampling distribution by *simulating* from a good model of the data, and treating the simulated data just like the real data
- Sometimes we simulate from the model we’re estimating (model-based or “parametric” bootstrap)
- Sometimes we simulate by re-sampling the original data (resampling or “nonparametric” bootstrap)
- Stronger assumptions \(\Rightarrow\) less uncertainty, *if we’re right*
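The resampling idea can be sketched in a few lines. This is a minimal illustration, not the course's code: the data, sample size, and choice of the median as the statistic are all made up for the example.

```python
import numpy as np

rng = np.random.default_rng(42)

# Made-up data: 100 draws from some distribution.  The resampling
# bootstrap never uses knowledge of what that distribution is.
x = rng.gamma(shape=2.0, scale=3.0, size=100)

def statistic(sample):
    """The statistic whose uncertainty we want: here, the sample median."""
    return np.median(sample)

# Resampling ("nonparametric") bootstrap: re-sample the original data
# with replacement, and treat each resample just like a fresh data set.
b = 1000
boot_stats = np.array([
    statistic(rng.choice(x, size=len(x), replace=True))
    for _ in range(b)
])

se = boot_stats.std(ddof=1)  # bootstrap standard error of the median
print(f"median = {statistic(x):.3f}, bootstrap SE = {se:.3f}")
```

Because resampling assumes only that the data are representative of the underlying distribution, it makes weaker assumptions than a parametric model, and so typically reports more uncertainty.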

- Re-run the experiment (survey, census, …) and get different data
- \(\therefore\) everything we calculate from data (estimates, test statistics, \(p\)-values, policy recommendations, …) will change from run to run
- This variability is (the source of) **statistical uncertainty**
- Quantifying this = honesty about what we actually know

- Standard error = standard deviation of an estimator
- (could equally well use median absolute deviation, etc.)

- \(p\)-value = Probability we’d see a signal this big if there were just noise
- Confidence region = All the parameter values we can’t reject at low error rates
- **Either** the true parameter is in the confidence region **or** we are very unlucky **or** our model is wrong

- etc., etc.

- Data \(X \sim P_X\) for some unknown true distribution \(P_X\)
- We calculate a statistic \(T = \tau(X)\) so it has distribution \(P_{T}\)
- If we knew \(P_{T}\), we could calculate
- \(\Var{T}\) (and so standard error)
- \(\Expect{T}\) (and so bias)
- quantiles (and so confidence intervals or \(p\)-values), etc.
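To make the list above concrete, here is a toy case where \(P_T\) is known in closed form, so everything can be computed directly: \(X\) is \(n\) iid draws from \(N(\mu, \sigma^2)\) and \(T\) is the sample mean, so \(P_T = N(\mu, \sigma^2/n)\) exactly. The particular values of \(\mu\), \(\sigma\), and \(n\) are made up.

```python
import math

# Assumed toy model: X = (X_1, ..., X_n) iid N(mu, sigma^2), T = sample mean.
# Then P_T = N(mu, sigma^2 / n), so the slide's quantities are closed-form.
mu, sigma, n = 5.0, 2.0, 100

se = sigma / math.sqrt(n)        # sd of the estimator = standard error
bias = mu - mu                   # E[T] - mu = 0: the sample mean is unbiased
z = 1.96                         # standard normal 97.5% quantile
interval = (mu - z * se, mu + z * se)  # central 95% range of T under P_T

print(f"SE = {se}, bias = {bias}, central 95% range of T = {interval}")
```

The point of the next slides is that outside toy cases like this one, no such closed form exists.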

- Difficulty 1: Most of the time, \(P_{X}\) is very complicated
- Difficulty 2: Most of the time, \(\tau\) is a very complicated function
- \(\therefore\) We couldn’t solve for \(P_T\)
- Difficulty 3: Actually, we don’t know \(P_X\)
- Upshot: We *really* don’t know \(P_{T}\) and can’t use it to calculate anything

Classically (\(\approx 1900\)–\(\approx 1975\)): Restrict the model and the statistic until you can calculate the sampling distribution, at least for very large \(n\)

Modern (\(\approx 1975\)–): Use complex models and statistics, but simulate calculating the statistic on the model

- Generate a simulation \(\tilde{X}\) from \(P_X\)
- Set \(\tilde{T} = \tau(\tilde{X})\)
- Repeat many times
- Use the simulated distribution of the \(\tilde{T}\) to approximate \(P_{T}\)
- (As a general method, invented by Enrico Fermi in the 1930s, spread through the Manhattan Project)
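The steps above can be sketched as follows, under the (for now) unrealistic assumption that we know \(P_X\) exactly. The particular choices here are illustrative: \(P_X\) is taken to be Exponential, \(n = 50\), and the statistic \(\tau\) is the interquartile range, which has no convenient closed-form sampling distribution.

```python
import numpy as np

rng = np.random.default_rng(0)

# Monte Carlo sketch, ASSUMING we know P_X (here: Exponential, scale 1)
# and want the sampling distribution of a statistic tau with no nice
# closed form -- say the interquartile range of n = 50 draws.
n, reps = 50, 5000

def tau(sample):
    q75, q25 = np.percentile(sample, [75, 25])
    return q75 - q25

# Repeat many times: simulate X-tilde from P_X, set T-tilde = tau(X-tilde).
sim_T = np.array([tau(rng.exponential(scale=1.0, size=n))
                  for _ in range(reps)])

# The simulated draws of T-tilde stand in for P_T:
print("SE of the IQR:", sim_T.std(ddof=1))
print("central 95% range:", np.percentile(sim_T, [2.5, 97.5]))
```

With enough repetitions, the empirical distribution of the simulated \(\tilde{T}\) can approximate \(P_T\) to any desired accuracy.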

- Still needs \(P_X\)
- Works in HW 3 because we’re testing a fixed model

- Find a good estimate \(\hat{P}\) for \(P_{X}\)
- Generate a simulation \(\tilde{X}\) from \(\hat{P}\), set \(\tilde{T} = \tau(\tilde{X})\)
- Use the simulated distribution of the \(\tilde{T}\) to approximate \(P_{T}\)
- “Pull yourself up by your bootstraps”: use \(\hat{P}\) to get at uncertainty in itself
- Invented by Bradley Efron in the 1970s

- First step: find a good estimate \(\hat{P}\) for \(P_{X}\)

If we are using a model, our best guess at \(P_{X}\) is \(P_{X,\hat{\theta}}\), with our best estimate \(\hat{\theta}\) of the parameters

- Get data \(X\), estimate \(\hat{\theta}\) from \(X\)
- Repeat \(b\) times:
- Simulate \(\tilde{X}\) from \(P_{X,\hat{\theta}}\) (simulate data of same size/“shape” as real data)
- Calculate \(\tilde{T} = \tau(\tilde{X})\) (treat simulated data the same as real data)

- Use the empirical distribution of the \(\tilde{T}\) as an approximation to \(P_{T}\)
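The parametric bootstrap recipe above can be sketched end to end. The model, data, and statistic here are all assumptions for the example: the model is Exponential with unknown scale \(\theta\) (so \(\hat{\theta}\) is the sample mean), and \(\tau\) is the sample median.

```python
import numpy as np

rng = np.random.default_rng(1)

# Parametric ("model-based") bootstrap sketch.  Assumed model:
# X_i iid Exponential with unknown scale theta.  Data are made up.
x = rng.exponential(scale=2.0, size=80)   # pretend this is the real data

def tau(sample):
    return np.median(sample)              # the statistic of interest

theta_hat = x.mean()                      # MLE of the Exponential scale
T = tau(x)

# Repeat b times: simulate data of the same size/"shape" from the
# fitted model, and treat the simulated data the same as real data.
b = 2000
boot_T = np.empty(b)
for i in range(b):
    x_tilde = rng.exponential(scale=theta_hat, size=len(x))
    boot_T[i] = tau(x_tilde)

se = boot_T.std(ddof=1)                   # bootstrap standard error
bias = boot_T.mean() - T                  # bootstrap estimate of bias
print(f"T = {T:.3f}, SE = {se:.3f}, estimated bias = {bias:.3f}")
```

Note the key substitution: because \(P_X\) is unknown, we simulate from \(P_{X,\hat{\theta}}\) instead, trusting the model and the estimate \(\hat{\theta}\). That trust is what the "stronger assumptions \(\Rightarrow\) less uncertainty, if we're right" warning is about.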