36-467/36-667

1 November 2018

\[ \newcommand{\Expect}[1]{\mathbb{E}\left[ #1 \right]} \newcommand{\Prob}[1]{\mathbb{P}\left[ #1 \right]} \newcommand{\Var}[1]{\mathrm{Var}\left[ #1 \right]} \newcommand{\Cov}[1]{\mathrm{Cov}\left[ #1 \right]} \]

- Simulated an AR(1)
- Talked about how to simulate a spatial AR (Gibbs sampler)
- Haven’t talked about how or why, in general

- What simulation is
- The general strategy
- Why we do it:
  - Understanding how the model behaves
  - Calculations (“the Monte Carlo method”)
  - What-ifs?

- Originally: to *simulate* is to make up something *similar* to an original
- A model is a story or account of how the data could have been generated
- \(\Rightarrow\) We should be able to step through that model and get something like the data
- Now: “simulate” means running the model to produce a **realization** or **trajectory**

- Some variables are set *outside* of the model; these are **exogenous**
  - Some are fixed (like the parameters)
  - Some are random (think \(X_0\) in an AR(1))

- All others are **endogenous**
  - First generation: only depend on exogenous variables
  - Second generation: depend on exogenous and first-generation endogenous
  - Generation \(k\): depend on up to generation \(k-1\)

- For variables in generation \(k\)
- Write down the distribution in terms of earlier variables
- Draw from this conditional distribution
- Substitute these random draws in when generating later variables

\[ X(t) = \alpha + \beta X(t-1) + \epsilon(t) \]

- Parameters \(\alpha\) and \(\beta\) are fixed
- \(X(0)\) is exogenous
- All \(\epsilon\)’s are exogenous
- \(X(1)\) depends only on \(X(0)\) and \(\epsilon(1)\) (and parameters)
- \(X(2)\) depends only on \(X(1)\) and \(\epsilon(2)\) (and parameters)

```
# Simulate a time series of length n from a Gaussian AR(1) model with given
# coefficients, noise level and initial condition
# From solutions to HW 7, in turn based on Lecture 12 code
# Inputs: length of output time series (n)
#   intercept of AR(1) (a)
#   coefficient of AR(1) (b)
#   standard deviation of the innovations (sd.innovation)
#   initial value of time series (x.start)
# Output: vector of length n
# Presumes: innovations should be Gaussian white noise
#   a, b, sd.innovation, x.start are all length-1 numerical values
ar1.sim <- function(n, a, b, sd.innovation, x.start) {
  x <- vector(length=n)
  x[1] <- x.start # Because R enumerates from 1
  for (t in 1:(n-1)) {
    x[t+1] <- a + b*x[t] + rnorm(1, sd=sd.innovation)
  }
  return(x)
}
```

Write out the code for a Gaussian AR(2)

- Make sure you provide all the needed inputs

```
# Simulate a time series of length n from a Gaussian AR(2) model with given
# coefficients, noise level and initial condition
# From solutions to HW 7, in turn based on Lecture 12 code
# Inputs: length of output time series (n)
#   intercept of AR(2) (a)
#   vector of coefficients of AR(2) (b)
#   initial two values of time series (x.start)
#   standard deviation of the innovations (sd.innovation)
# Output: vector of length n
# Presumes: innovations should be Gaussian white noise
#   a and sd.innovation are length-1 numerical values
#   b and x.start are length-2 numerical vectors
#   b has coefficient on X(t-1) first
#   x.start has first value first
ar2.sim <- function(n, a, b, sd.innovation, x.start) {
  x <- vector(length=n)
  x[1] <- x.start[1] # Because R enumerates from 1
  x[2] <- x.start[2]
  for (t in 1:(n-2)) {
    x[t+2] <- a + b[1]*x[t+1] + b[2]*x[t] + rnorm(1, sd=sd.innovation)
  }
  return(x)
}
```

For you to think through offline: How would you make this work for an AR\((p)\)?
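One way the generalization could go (a sketch, not the official solution; `ar.p.sim` and its argument conventions are made up here, following `ar2.sim` above): infer the order \(p\) from the length of `b`, and require `x.start` to supply the first \(p\) values.

```
# Sketch of an AR(p) simulator, following the conventions of ar2.sim
# (illustrative; ar.p.sim is a made-up name, not from the lecture)
# b has the coefficient on X(t-1) first; x.start has the first p values in order
ar.p.sim <- function(n, a, b, sd.innovation, x.start) {
  p <- length(b)                 # order of the model, inferred from b
  stopifnot(length(x.start) == p)
  x <- vector(length=n)
  x[1:p] <- x.start
  for (t in 1:(n-p)) {
    # x[(t+p-1):t] runs backwards, so it lines up with b = (b_1, ..., b_p)
    x[t+p] <- a + sum(b * x[(t+p-1):t]) + rnorm(1, sd=sd.innovation)
  }
  return(x)
}
```

With `length(b) == 1` this reduces to `ar1.sim`, and with `length(b) == 2` to `ar2.sim`.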

Three big reasons to simulate a model:

- We can’t easily see what it’s going to do;
- We can’t easily calculate its exact behavior;
- We can’t easily see what would happen if something changed

- You saw this in homework 7
- Sometimes we can work out aspects of a model’s behavior by analyzing it mathematically
- E.g., using eigenvalues and eigenvectors to understand AR and VAR models

- But sometimes that math is difficult, and/or we don’t know how to do it properly
- All the eigen-stuff only works for linear dynamics

- Then we can simulate and *see*

- We want to know \(\Expect{f(Y)}\) for some complicated function \(f\) of the whole trajectory \(Y\)
- If we can find \(\Expect{f(Y)}\), we can find \(\Prob{Y \in A} = \Expect{\mathbf{1}(Y \in A)}\) or \(\Var{f(Y)} = \Expect{f^2(Y)} - \Expect{f(Y)}^2\), etc., etc.

- This is just an integral, but a *hard* integral
  - Usually no analytical form
  - Numerical integration of \(\int{f(y) p(y) dy}\) needs an exponential number of values of \(y\) as \(\mathrm{dim}(y)\) grows

- Re-run the simulation \(m\) times to get \(Y_1, Y_2, \ldots Y_m\)
- Calculate \(f(Y_1), f(Y_2), \ldots f(Y_m)\)
- Means, variances, indicators for events, prediction errors, …

- Then \(m^{-1} \sum_{i=1}^{m}{f(Y_i)} \rightarrow \Expect{f(Y)}\)
  - by the law of large numbers over independent simulation runs

- In fact, with *independent* simulation runs, the central limit theorem says \[ m^{-1} \sum_{i=1}^{m}{f(Y_i)} \rightsquigarrow \mathcal{N}(\Expect{f(Y)}, \Var{f(Y)}/m) \]
- \(\mathrm{dim}(y)\) doesn’t matter (directly) for accuracy, just \(\Var{f(Y)}\)
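As a toy illustration (made up here, not from the lecture): the expected maximum of a 100-step Gaussian random walk has no simple closed form, but a few lines of R give the Monte Carlo estimate, its CLT-based standard error, and an indicator-expectation probability.

```
# Toy Monte Carlo: f(Y) = max of a 100-step Gaussian random walk
set.seed(42)                 # fix the seed so re-runs give the same estimates
m <- 1000                    # number of independent simulation runs
f.vals <- replicate(m, max(cumsum(rnorm(100))))
mean(f.vals)                 # approx. E[f(Y)], by the law of large numbers
sd(f.vals)/sqrt(m)           # approx. sqrt(Var[f(Y)]/m), the Monte Carlo standard error
mean(f.vals > 10)            # approx. Prob[f(Y) > 10], an indicator expectation
```

Note that only `m` and \(\Var{f(Y)}\) control the standard error; each \(Y_i\) here is 100-dimensional, but that does not enter directly.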

- Take two independent AR(1) processes, say \(X(t)\) and \(Z(t)\)
- Calculate the sample correlation \(\hat{\rho}\) between \(X(1), \ldots X(n)\) and \(Z(1), \ldots Z(n)\)
- \(\Expect{\hat{\rho}} = 0\) (Why?)
- What is \(\Var{\hat{\rho}}\)?
- Answer: much, much bigger than you’d expect from \(n\) independent random variables (Yule 1926)
- You can get an *exact* value from heroic probability calculations and ninety years of dead ends (Ernst, Shepp, and Wyner 2017)
- … or get a very good approximate value in seconds with a few lines of R

What’s the sample correlation between two *independent* random walks of length 100?

```
x <- ar1.sim(n=100, a=0, b=1, x.start=0, sd.innovation=1)
z <- ar1.sim(n=100, a=0, b=1, x.start=0, sd.innovation=1)
cor(x, z)
```

`## [1] 0.7201643`

We can do the simulation over (and get a different answer):

```
cor(ar1.sim(n=100, a=0, b=1, x.start=0, sd.innovation=1),
    ar1.sim(n=100, a=0, b=1, x.start=0, sd.innovation=1))
```

`## [1] 0.1754642`

We can **replicate** the simulation many times, and look at the variance of the answers:

```
var(replicate(1000, cor(ar1.sim(n=100, a=0, b=1, x.start=0, sd.innovation=1),
                        ar1.sim(n=100, a=0, b=1, x.start=0, sd.innovation=1))))
```

`## [1] 0.2404867`

We can change the length of each simulation:

```
var(replicate(1000, cor(ar1.sim(n=1e4, a=0, b=1, x.start=0, sd.innovation=1),
                        ar1.sim(n=1e4, a=0, b=1, x.start=0, sd.innovation=1))))
```

`## [1] 0.2423016`

- The random walk is no more likely to rise than to fall
- but \(\Var{X(t)} \propto t\), so \(X(1), \ldots X(n)\) has either been *mostly* rising or *mostly* falling with high probability
- and the same with \(Z(t)\)

- Simulation can *suggest* that the variance is constant in \(n\), but not *prove* it
- Often a good idea to simulate first, to see what math you should do!

- Short-cut: once we have a direct way to solve the problem, we don’t need to re-run the simulation
- The same math can apply to many situations, where the simulation approach would need to be re-run
- But if we change assumptions, we need to re-do the math (hard), not just run new simulations (easy)

- Change starting values
- Or impose arbitrary changes later, after the start

- Change parameters
- Change assumptions
  - Say \(t\)-distributed noise rather than Gaussian
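For instance (a sketch, not lecture code; `ar1.sim.noise` is a made-up variant of `ar1.sim` above), passing the innovation-generating function as an argument makes the Gaussian assumption easy to swap for a heavy-tailed \(t\):

```
# Variant of ar1.sim where the innovation distribution is an argument
# (made-up illustration; rnoise takes a count and returns that many draws)
ar1.sim.noise <- function(n, a, b, x.start, rnoise) {
  x <- vector(length=n)
  x[1] <- x.start
  for (t in 1:(n-1)) {
    x[t+1] <- a + b*x[t] + rnoise(1)
  }
  return(x)
}
# Gaussian innovations vs. heavy-tailed t innovations with 3 degrees of freedom
x.gauss <- ar1.sim.noise(100, a=0, b=0.5, x.start=0, rnoise=function(k) rnorm(k, sd=1))
x.t     <- ar1.sim.noise(100, a=0, b=0.5, x.start=0, rnoise=function(k) rt(k, df=3))
```

The simulation re-runs in seconds under the new assumption; the corresponding math would have to be redone from scratch.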

- A model’s specification determines its behavior, but we’re too dumb to just read it off
- Simulation lets us *see* what the model does
- Monte Carlo lets us *calculate* what the model does from simulation

- Using simulations to approximate integrals goes back at least to Buffon in the 18th century, but that was a one-off
- Scattered 19th century efforts
- Modern Monte Carlo goes back to Los Alamos and the bombs:
- 1935: Enrico Fermi starts simulating neutron-nuclei interactions with pencil, paper, and random number tables (Schwartz 2017, 124)
- 1940s: Needing to calculate behavior of chain reactions in atom bombs, Fermi designs a mechanical simulator to trace over 2D blueprints
- 1946: Stanislaw Ulam realizes that you could do simulations on the new digital “electronic computing machines”, develops the idea with John von Neumann under the code name of “the Monte Carlo method” (suggested by Nicholas Metropolis)
- 1946–1953: Rapid development by Los Alamos scientists, culminating in Metropolis et al. (1953)

- Thereafter spreads from Los Alamos, RAND, etc.
- Starts showing up in statistics in the 1960s
- Not widespread until the 1980s

Ernst, Philip A., Larry A. Shepp, and Abraham J. Wyner. 2017. “Yule’s ‘Nonsense Correlation’ Solved!” *Annals of Statistics* 45:1789–1809. https://doi.org/10.1214/16-AOS1509.

Metropolis, N., A. W. Rosenbluth, M. N. Rosenbluth, A. H. Teller, and E. Teller. 1953. “Equations of State Calculations by Fast Computing Machines.” *Journal of Chemical Physics* 21:1087–92. https://doi.org/10.1063/1.1699114.

Schwartz, David N. 2017. *The Last Man Who Knew Everything: The Life and Times of Enrico Fermi, Father of the Nuclear Age*. New York: Basic Books.

Yule, G. Udny. 1926. “Why Do We Sometimes Get Nonsense-Correlations Between Time-Series? — a Study in Sampling and the Nature of Time-Series.” *Journal of the Royal Statistical Society* 89:1–63. https://doi.org/10.2307/2341482.