---
title: Simulation
author: 36-467/36-667
date: 1 November 2018
output: slidy_presentation
bibliography: locusts.bib
---
\[
\newcommand{\Expect}[1]{\mathbb{E}\left[ #1 \right]}
\newcommand{\Prob}[1]{\mathbb{P}\left[ #1 \right]}
\newcommand{\Var}[1]{\mathrm{Var}\left[ #1 \right]}
\newcommand{\Cov}[1]{\mathrm{Cov}\left[ #1 \right]}
\]
```{r, include=FALSE}
# General set-up options
library(knitr)
opts_chunk$set(size="small", background="white",
cache=TRUE, autodep=TRUE,
tidy=FALSE, warning=FALSE, message=FALSE,
echo=TRUE)
```
## Previously
- Simulated an AR(1)
- Talked about how to simulate a spatial AR (Gibbs sampler)
- Haven't talked about how or why, in general
## Agenda
- What simulation is
- The general strategy
- Why we do it:
+ Understanding how the model behaves
+ Calculations ("the Monte Carlo method")
+ What-ifs?
## Simulation
- Originally: to _simulate_ is to make up something _similar_ to an original
- A model is a story or account of how the data could have been generated
- $\Rightarrow$ We should be able to step through that model and get something like the data
- Now: "simulate" means running the model to produce a **realization** or **trajectory**
## The general strategy
- Some variables are set _outside_ of the model; these are **exogenous**
+ Some are fixed (like the parameters)
+ Some are random (think $X_0$ in an AR(1))
- All others are **endogenous**
+ First generation: only depend on exogenous variables
+ Second generation: depend on exogenous and first-generation endogenous
+ Generation $k$: depend on up to generation $k-1$
- For variables in generation $k$
+ Write down the distribution in terms of earlier variables
+ Draw from this conditional distribution
+ Substitute in these random draws for all later variables
## The general strategy applied to AR(1)
\[
X(t) = \alpha + \beta X(t-1) + \epsilon(t)
\]
- Parameters $\alpha$ and $\beta$ are fixed
- $X(0)$ is exogenous
- All $\epsilon$'s are exogenous
- $X(1)$ depends only on $X(0)$ and $\epsilon(1)$ (and parameters)
- $X(2)$ depends only on $X(1)$ and $\epsilon(2)$ (and parameters)
## It's easy to turn this into code {.smaller}
```{r}
# Simulate a time series of length n from a Gaussian AR(1) model with given
# coefficients, noise level and initial condition
# From solutions to HW 7, in turn based on Lecture 12 code
# Inputs: length of output time series (n)
# intercept of AR(1) (a)
# coefficient of AR(1) (b)
# standard deviation of the innovations (sd.innovation)
# initial value of time series (x.start)
# Output: vector of length n
# Presumes: innovations should be Gaussian white noise
# a, b, sd.innovation, x.start are all length-1 numerical values
ar1.sim <- function(n, a, b, sd.innovation, x.start) {
  x <- vector(length=n)
  x[1] <- x.start # R indexes from 1, so the initial value goes in x[1]
  for (t in 1:(n-1)) {
    x[t+1] <- a + b*x[t] + rnorm(1, sd=sd.innovation)
  }
  return(x)
}
```
## Exercise
Write out the code for a Gaussian AR(2)

- Make sure you provide all the needed inputs
## Exercise: Solution {.smaller}
```{r}
# Simulate a time series of length n from a Gaussian AR(2) model with given
# coefficients, noise level and initial condition
# From solutions to HW 7, in turn based on Lecture 12 code
# Inputs: length of output time series (n)
# intercept of AR(2) (a)
# vector of coefficients of AR(2) (b)
# standard deviation of the innovations (sd.innovation)
# initial two values of time series (x.start)
# Output: vector of length n
# Presumes: innovations should be Gaussian white noise
# a and sd.innovation are length-1 numerical values
# b and x.start are length-2 numerical vectors
# b has coefficient on X(t-1) first
# x.start has first value first
ar2.sim <- function(n, a, b, sd.innovation, x.start) {
  x <- vector(length=n)
  x[1] <- x.start[1] # R indexes from 1, so the initial values go in x[1], x[2]
  x[2] <- x.start[2]
  for (t in 1:(n-2)) {
    x[t+2] <- a + b[1]*x[t+1] + b[2]*x[t] + rnorm(1, sd=sd.innovation)
  }
  return(x)
}
```
For you to think through offline: How would you make this work for an AR$(p)$?
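If you want to check your own answer later, here is one possible sketch, assuming (as in `ar2.sim`) that `b` is the length-$p$ coefficient vector with the coefficient on $X(t-1)$ first, and that `x.start` gives the first $p$ values in order:
```{r}
# One possible AR(p) simulator (a sketch, not the only way to do it)
# Inputs as in ar2.sim, but b and x.start have length p
arp.sim <- function(n, a, b, sd.innovation, x.start) {
  p <- length(b)
  x <- vector(length=n)
  x[1:p] <- x.start
  for (t in 1:(n-p)) {
    # x[(t+p-1):t] runs backwards, so b[1] multiplies X(t-1), etc.
    x[t+p] <- a + sum(b * x[(t+p-1):t]) + rnorm(1, sd=sd.innovation)
  }
  return(x)
}
```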
## "Math is hard; let's go simulate"
Three big reasons to simulate a model:

1. We can't easily see what it's going to do;
2. We can't easily calculate its exact behavior;
3. We can't easily see what would happen if something changed
## See what it's going to do
- You saw this in homework 7
- Sometimes we can work out aspects of a model's behavior by analyzing it mathematically
+ E.g., using eigenvalues and eigenvectors to understand AR and VAR models
- But sometimes that math is difficult, and/or we don't know how to do it properly
+ All the eigen-stuff only works for linear dynamics
- Then we can simulate and _see_
## Replacing calculations with simulations
- We want to know $\Expect{f(Y)}$ for some complicated function $f$ of the whole trajectory $Y$
+ If we can find $\Expect{f(Y)}$, we can find $\Prob{Y \in A} = \Expect{\mathbf{1}(Y \in A)}$ or $\Var{f(Y)} = \Expect{f^2(Y)} - \Expect{f(Y)}^2$, etc., etc.
- This is just an integral, but a _hard_ integral
+ Usually no analytical form
+ Numerical integration of $\int{f(y) p(y) dy}$ needs an exponential number of values of $y$ as $\mathrm{dim}(y)$ grows
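To see the exponential growth concretely, here is a back-of-the-envelope sketch: a grid with $k$ points per axis needs $k^{\mathrm{dim}(y)}$ evaluations (taking $k = 10$ purely as an illustrative assumption):
```{r}
# Grid-based integration needs k^d function evaluations in d dimensions
k <- 10
d <- c(1, 5, 10, 20)
k^d # 10, 1e5, 1e10, 1e20 evaluations
</antml```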
## "The Monte Carlo method"
- Re-run the simulation $m$ times to get $Y_1, Y_2, \ldots Y_m$
- Calculate $f(Y_1), f(Y_2), \ldots f(Y_m)$
+ Means, variances, indicators for events, prediction errors, ...
- Then $m^{-1} \sum_{i=1}^{m}{f(Y_i)} \rightarrow \Expect{f(Y)}$
+ by law of large numbers over independent simulation runs
- In fact, with _independent_ simulation runs, central limit theorem says
\[
m^{-1} \sum_{i=1}^{m}{f(Y_i)} \rightsquigarrow \mathcal{N}(\Expect{f(Y)}, \Var{f(Y)}/m)
\]
+ $\mathrm{dim}(y)$ doesn't matter (directly) for accuracy, just $\Var{f(Y)}$
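A minimal sketch of the recipe, where `f.of.run` is a hypothetical $f$ chosen purely for illustration (the maximum of a Gaussian random walk of length 50):
```{r}
# Monte Carlo estimate of E[f(Y)], with a CLT standard error
f.of.run <- function() { max(cumsum(rnorm(50))) }
m <- 1000
sims <- replicate(m, f.of.run())
mean(sims)       # Monte Carlo estimate of E[f(Y)]
sd(sims)/sqrt(m) # estimated standard error, sqrt(Var[f(Y)]/m)
```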
## The power of Monte Carlo
- Take two independent AR(1) processes, say $X(t)$ and $Z(t)$
- Calculate the sample correlation $\hat{\rho}$ between $X(1), \ldots X(n)$ and $Z(1), \ldots Z(n)$
- $\Expect{\hat{\rho}} = 0$ (Why?)
- What is $\Var{\hat{\rho}}$?
+ Answer: much, much bigger than you'd expect from $n$ independent random variables [@Yule-nonsense]
+ You can get an _exact_ value from heroic probability calculations and ninety years of dead ends [@Ernst-Shepp-Wyner-nonsense-correlation]
+ ... or get a very good approximate value in seconds with a few lines of R
##
What's the sample correlation between two _independent_ random walks of length 100?
```{r}
x <- ar1.sim(n=100, a=0, b=1, x.start=0, sd.innovation=1)
z <- ar1.sim(n=100, a=0, b=1, x.start=0, sd.innovation=1)
cor(x, z)
plot(x, ylab="X or Z", xlab="t", ylim=range(c(x,z)), pch=1)
points(1:length(z), z, pch=2)
legend("topleft", legend=c("X","Z"), pch=1:2)
```
##
We can do the simulation over (and get a different answer):
```{r}
cor(ar1.sim(n=100, a=0, b=1, x.start=0, sd.innovation=1),
ar1.sim(n=100, a=0, b=1, x.start=0, sd.innovation=1))
```
##
We can **replicate** the simulation many times, and look at the variance
of the answers:
```{r}
var(replicate(1000, cor(ar1.sim(n=100, a=0, b=1, x.start=0, sd.innovation=1),
ar1.sim(n=100, a=0, b=1, x.start=0, sd.innovation=1))))
```
We can change the length of each simulation:
```{r}
var(replicate(1000, cor(ar1.sim(n=1e4, a=0, b=1, x.start=0, sd.innovation=1),
ar1.sim(n=1e4, a=0, b=1, x.start=0, sd.innovation=1))))
```
## Why is this happening?
- The random walk is no more likely to rise than to fall
- but $\Var{X(t)} \propto t$, so $X(1), \ldots X(n)$ has either been _mostly_ rising or _mostly_ falling with high probability
- and the same with $Z(t)$
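A quick simulation check of the $\Var{X(t)} \propto t$ claim, for a standard Gaussian random walk (each column of `runs` is one trajectory):
```{r}
# Variance across 2000 replicate trajectories, at two different times
runs <- replicate(2000, cumsum(rnorm(100)))
var(runs[25, ])  # should be near 25
var(runs[100, ]) # should be near 100
```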
## What good is the math then?
- Simulation can _suggest_ that the variance is constant in $n$, but not _prove_ it
+ Often a good idea to simulate first, to see what math you should do!
- Short-cut: once we have a direct way to solve the problem, we don't need to re-run the simulation
+ The same math can apply to many situations, where the simulation approach would need to be re-run
+ But if we change assumptions, we need to re-do the math (hard), not just run new simulations (easy)
## What if?
- Change starting values
+ Or impose arbitrary changes later, after the start
- Change parameters
- Change assumptions
+ Say $t$-distributed noise rather than Gaussian
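For instance, a hypothetical variant of `ar1.sim` with $t$-distributed innovations takes only a one-line change; `df` is the degrees of freedom, and the other arguments are as in `ar1.sim` above:
```{r}
# ar1.sim with t-distributed innovations instead of Gaussian ones
ar1.sim.t <- function(n, a, b, scale.innovation, df, x.start) {
  x <- vector(length=n)
  x[1] <- x.start
  for (t in 1:(n-1)) {
    x[t+1] <- a + b*x[t] + scale.innovation*rt(1, df=df)
  }
  return(x)
}
```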
## Summary
- A model's specification determines its behavior, but we're too dumb to just read it off
- Simulation lets us _see_ what the model does
- Monte Carlo lets us _calculate_ what the model does from simulation
## Backup: The origin of Monte Carlo
- Using simulations to approximate integrals goes back at least to Buffon in the 18th century, but that was a one-off
- Scattered 19th century efforts
- Modern Monte Carlo goes back to Los Alamos and the bombs:
+ 1935: Enrico Fermi starts simulating neutron-nuclei interactions with pencil, paper, and random number tables [@Schwartz-on-Fermi, p. 124]
+ 1940s: Needing to calculate behavior of chain reactions in atom bombs, Fermi designs a [mechanical simulator to trace over 2D blueprints](https://www.lanl.gov/museum/discover/_docs/fermiac.pdf)
+ 1946: Stanislaw Ulam realizes that you could do simulations on the new digital "electronic computing machines", develops the idea with John von Neumann under the code name of "the Monte Carlo method" (suggested by Nicholas Metropolis)
+ 1946--1953: Rapid development by Los Alamos scientists, culminating in @Metropolis-et-al-Monte-Carlo
- Thereafter spreads from Los Alamos, RAND, etc.
+ Starts showing up in statistics in the 1960s
+ Not widespread until the 1980s
## Backup: Stanislaw Ulam holding the Fermiac {.smaller}
![](https://upload.wikimedia.org/wikipedia/commons/8/8a/STAN_ULAM_HOLDING_THE_FERMIAC.jpg)
[via](https://commons.wikimedia.org/wiki/File:STAN_ULAM_HOLDING_THE_FERMIAC.jpg)
## References