---
title: Linear Generative Models for Time Series
date: 16 October 2018
author: 36-467/36-667
output: slidy_presentation
bibliography: locusts.bib
---

```{r, include=FALSE}
# General set-up options
library(knitr)
opts_chunk$set(size="small", background="white",
               highlight=FALSE, cache=TRUE, autodep=TRUE,
               tidy=TRUE, tidy.opts=list(comment=FALSE),
               warning=FALSE, message=FALSE, echo=FALSE)
```

$\newcommand{\Expect}[1]{\mathbb{E}\left[ #1 \right]}
\newcommand{\Var}[1]{\mathrm{Var}\left[ #1 \right]}
\newcommand{\SampleVar}[1]{\widehat{\mathrm{Var}}\left[ #1 \right]}
\newcommand{\Cov}[1]{\mathrm{Cov}\left[ #1 \right]}$

## Previously

- Aimed to be as close to descriptive and exploratory as possible
    + Very weak assumptions (like stationarity)
    + Or no assumptions (as in PCA)
- Advantages:
    + Security / robustness / reliability: less can go wrong
    + Still able to say _something_
- Drawbacks:
    + Weak on inferences beyond the observables
    + Weak on uncertainty
- Going forward:
    + Generative models (= distributions over the whole data)
    + Uncertainty in inference

## Linear autoregressions

- The **first-order linear autoregression**, or **AR(1)**:
\begin{eqnarray}
X(t) & = & a + b X(t-1) + \epsilon(t)\\
X(0) & = & \text{some random variable or other}
\end{eqnarray}
- The **innovations** $\epsilon(t)$ are
    + All expectation 0
    + All uncorrelated with each other
    + All uncorrelated with $X(0)$
    + Typically constant-variance
- To fully **specify** the model, need to give the distributions of $X(0)$ and $\epsilon$
    + In nice situations, all Gaussian and IID

## Generating a new time series

- Draw $X(0)$ from its distribution
- Draw $\epsilon(1)$ from its distribution and set $X(1) \leftarrow a+bX(0)+\epsilon(1)$
- Iterate:
    + Draw $\epsilon(t)$ from its distribution
    + Set $X(t) \leftarrow a+bX(t-1)+\epsilon(t)$
- (For this recipe as R code, see the sketch after the plot a few slides below)

## Unroll the AR(1) a little

\begin{eqnarray}
X(t) & = & a + b X(t-1) + \epsilon(t)\\
& = & a + \epsilon(t) + b(a+b X(t-2) + \epsilon(t-1))\\
& = & a + ba + b^2 a + \ldots + b^{t-1} a + \epsilon(t) + b\epsilon(t-1) + b^2 \epsilon(t-2) + \ldots + b^{t-1} \epsilon(1) + b^t X(0)\\
& = & a\sum_{k=0}^{t-1}{b^k} + \sum_{k=0}^{t-1}{\epsilon(t-k) b^k} + b^t X(0)
\end{eqnarray}

> - At each time, get a random ($\epsilon(t)$) plus a deterministic ($a$) kick, whose impact is multiplied by $b$ at each subsequent time step, forever
> - (**infinite impulse response**)

## Think about the deterministic version

Set $a=0$ to simplify book-keeping

\begin{eqnarray}
x(t) & = & b x(t-1)\\
& = & \text{???}
\end{eqnarray}

**In-class exercise 1**: Write $x(t)$ in terms of $b$, $t$, and $x(0)$

## Think about the deterministic version

Set $a=0$ to simplify book-keeping

\begin{eqnarray}
x(t) & = & b x(t-1)\\
& = & b^t x(0)
\end{eqnarray}

If $|b|<1$ then $b^t \rightarrow 0$ as $t\rightarrow \infty$

So if $|b| < 1$ then $x(t) \rightarrow 0$

First-order dynamics are exponential decay to 0 (when $|b|<1$) or growth to $\pm \infty$ (when $|b|>1$)

## Adding the noise back in

Constantly being perturbed away from the deterministic path

```{r}
set.seed(2018-10-12) # Trick learned from Thomas Lumley to make random draws reproducible
x <- vector(length=11)
n <- length(x)-1
x[1] <- 1.5
b <- 0.8
for (t in 2:(n+1)) {
    x[t] <- b*x[t-1] + rnorm(1, sd=sqrt(0.1))
}
plot(0:n, x, type="l", xlab="t", ylab="X(t)", main="AR(1)",
     sub="b=0.8, noise variance=0.1")
for (t in 0:(n-1)) {
    lines(t:n, x[t+1]*(b^(0:(n-t))), lty="dashed")
}
legend("topright", legend=c("Stochastic", "Deterministic"),
       lty=c("solid", "dashed"))
```
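
## Generating a new time series, in R

A minimal sketch of the generation recipe from earlier, wrapped in a function; the name `sim.ar1` and its default arguments are my own, not part of the lecture code:

```{r, echo=TRUE}
# Simulate an AR(1): X(t) = a + b X(t-1) + epsilon(t), epsilon ~ N(0, sd^2)
# (sketch; sim.ar1 is a hypothetical helper, not from the course code)
sim.ar1 <- function(n, a=0, b=0.8, x0=1.5, sd=sqrt(0.1)) {
    x <- vector(length=n+1)
    x[1] <- x0 # x[1] stores X(0), since R vectors are 1-indexed
    for (t in 2:(n+1)) {
        x[t] <- a + b*x[t-1] + rnorm(1, sd=sd)
    }
    return(x)
}
plot(0:10, sim.ar1(n=10), type="l", xlab="t", ylab="X(t)")
```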
## How would we predict?

Intuitively: $\hat{X}(t+k) = b^{k} x(t)$

Rigorously:
\begin{eqnarray}
\Expect{X(t+k)|X(t)=x} & = & \Expect{b^k X(t) + \epsilon(t+k) + b \epsilon(t+k-1) + \ldots + b^{k-1}\epsilon(t+1)|X(t) = x}\\
& = & b^k x + 0
\end{eqnarray}

because the innovations after time $t$ are uncorrelated with $X(t)$ and have expectation 0

## What about covariances?

\begin{eqnarray}
\Cov{X(t+h), X(t)} & = & \Cov{b^h X(t) + \epsilon(t+h) + b\epsilon(t+h-1) + \ldots + b^{h-1}\epsilon(t+1), X(t)}\\
& = & b^h \Cov{X(t), X(t)} + 0\\
& = & b^h \Var{X(t)}
\end{eqnarray}

$\Rightarrow$ if $\Var{X(t)}$ and $\Expect{X(t)}$ are constant, this is (weakly) stationary

## Higher-order autoregressions

$X(t) = a + b_1 X(t-1) + b_2 X(t-2) + \ldots + b_p X(t-p) + \epsilon(t)$

- Same rules about innovations
- Same idea about how to generate
- Same rules for prediction: $\Expect{X(t)|X(t-1)=x_1, \ldots X(t-p)=x_p} = a + b_1 x_1 + b_2 x_2 + \ldots + b_p x_p$

## What about multiple variables?

**Vector autoregression of order 1**, or **VAR(1)**

$\vec{X}(t) = \vec{a} + \mathbf{b} \vec{X}(t-1) + \vec{\epsilon}(t)$

- $\vec{X}(t) =$ random vector of dimension $p$, the **state** at time $t$
- $\vec{a} =$ deterministic vector of dimension $p$
- $\mathbf{b} =$ deterministic matrix of dimension $p\times p$
- $\vec{\epsilon}(t) =$ random vector of dimension $p$, the **innovation**

## What about multiple variables?

Zero out the offset $\vec{a}$ for now

$\vec{X}(t) = \mathbf{b} \vec{X}(t-1) + \vec{\epsilon}(t)$

What are the deterministic dynamics?

## Linear dynamical systems in multiple dimensions

$\vec{x}(t) = \mathbf{b}\vec{x}(t-1)$

- Suppose the eigenvectors $\vec{v}_1, \ldots \vec{v}_p$ of $\mathbf{b}$ form a basis
- Then for some coefficients $c_1, \ldots c_p$,
$\vec{x}(0) = \sum_{j=1}^{p}{c_j \vec{v}_j}$

## Linear dynamical systems in multiple dimensions

- Dynamics are just multiplying:
\begin{eqnarray}
\vec{x}(t) & = & \mathbf{b}\vec{x}(t-1)\\
& = & \mathbf{b}^t \vec{x}(0)
\end{eqnarray}
- In-class exercise: Write $\vec{x}(t)$ in terms of the $c_j$, the $\lambda_j$, and the $\vec{v}_j$

## Linear dynamical systems in multiple dimensions

- Dynamics are just multiplying:
\begin{eqnarray}
\vec{x}(t) & = & \mathbf{b}\vec{x}(t-1)\\
& = & \mathbf{b}^t \vec{x}(0)\\
& = & \sum_{j=1}^{p}{ \lambda^t_j c_j \vec{v}_j}
\end{eqnarray}

## Eigenvalues determine the dynamics of a linear system

The easy case: all eigenvalues $\lambda_1, \ldots \lambda_p$ are real

- $\lambda_j > 1$: grow along that direction
- $0 \leq \lambda_j < 1$: shrink along that direction towards the origin $\vec{0}$
- $\lambda_j < -1$: flip around the origin, grow in that direction
- $-1 < \lambda_j \leq 0$: flip around the origin and shrink

## Eigenvalues determine the dynamics of a linear system

- Some eigenvalues can be complex
    + The corresponding coefficients are _also_ complex
- These always come in complex-conjugate pairs
    + so the coefficients are complex-conjugate pairs
- The formula $\vec{x}(t) = \sum_{j=1}^{p}{\lambda^t_j c_j \vec{v}_j}$ still works
    + the imaginary parts always cancel exactly
- Complex eigenvalues $\Leftrightarrow$ rotation

## Rotation with complex eigenvalues

```{r}
b <- matrix(c(0.99, 0.01,
              -0.01, 0.99), byrow=TRUE, nrow=2)
```

```{r, echo=TRUE}
b
eigen(b)
Mod(eigen(b)$values)
```

## Rotation with complex eigenvalues

```{r, echo=TRUE}
(x <- matrix(c(1,2), nrow=2))
b %*% x
b %*% b %*% x
```

## Rotation with complex eigenvalues

```{r}
n <- 400
x.series <- matrix(0, ncol=n+1, nrow=2)
x.series[,1] <- x
for (t in 1:n) {
    x.series[,t+1] <- b %*% x.series[,t]
}
plot(0:n, x.series[1,], type="l", ylim=c(-2.5, 2.5), xlab="t", ylab="x(t)")
lines(0:n, x.series[2,], type="l", lty="dashed")
curve(Mod(eigen(b)$values[1])^x, lty="dotted", add=TRUE)
legend("top",
       legend=c(expression(x[1]), expression(x[2]), expression(abs(lambda[1])^t)),
       lty=c("solid", "dashed", "dotted"))
```
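
## Rotation with complex eigenvalues

A sketch of the eigendecomposition formula $\vec{x}(t) = \sum_{j}{\lambda^t_j c_j \vec{v}_j}$ at work on this same matrix (the object names here are mine, not from the lecture code); note how the imaginary parts cancel, up to rounding:

```{r, echo=TRUE}
# Check x(t) = sum_j lambda_j^t c_j v_j against direct iteration
# (sketch; names are hypothetical, not from the course code)
b <- matrix(c(0.99, 0.01, -0.01, 0.99), byrow=TRUE, nrow=2)
x0 <- c(1, 2)
eig <- eigen(b)
cc <- solve(eig$vectors, x0) # coefficients c_j of x(0) in the eigenbasis
t.max <- 50
x.eig <- eig$vectors %*% (eig$values^t.max * cc) # complex arithmetic throughout
x.direct <- x0
for (t in 1:t.max) { x.direct <- b %*% x.direct }
Re(x.eig) # imaginary parts cancel exactly (up to rounding error)
x.direct  # matches the eigenvalue formula
```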
## Rotation with complex eigenvalues

Reset $\mathbf{b}^{\prime} = \mathbf{b}/|\lambda_1|$ so now both eigenvalues have modulus 1

```{r}
b <- b/Mod(eigen(b)$values[1])
n <- 400
x.series <- matrix(0, ncol=n+1, nrow=2)
x.series[,1] <- x
for (t in 1:n) {
    x.series[,t+1] <- b %*% x.series[,t]
}
plot(0:n, x.series[1,], type="l", ylim=c(-2.5, 2.5), xlab="t", ylab="x(t)")
lines(0:n, x.series[2,], type="l", lty="dashed")
legend("topright", legend=c(expression(x[1]), expression(x[2])),
       lty=c("solid", "dashed"))
```

## What's going on here?

- Having both $x_1(t-1)$ and $x_2(t-1)$ is like having $x(t-1)$ _and_ $x(t-2)$
- Higher-order ARs are like VARs with extra variables to keep track of past states

## Morals on linear, deterministic dynamical systems

- Find the eigenvalues and eigenvectors
    + $|\lambda_j| < 1 \Rightarrow$ exponential decay along $\vec{v}_j$
    + $|\lambda_j| > 1 \Rightarrow$ exponential growth
    + $|\lambda_j| = 1 \Rightarrow$ eternal recurrence
    + $\mathrm{Im}(\lambda_j) \neq 0 \Rightarrow$ rotations in the space spanned by those eigenvectors
- Higher-order dependence on the past $\Rightarrow$ first-order dependence with extra memory variables

## Adding on noise

- If the eigenvalues are all less than 1 in modulus, $\mathbf{b}\vec{X}(t-1)$ has less variance than $\vec{X}(t-1)$
- The innovation contributes some extra variance
- To be stationary, the shrinkage has to exactly balance the new variance

## Summarizing

> - Autoregressive models are linear dynamical systems plus noise
> - Linear dynamics are governed by the eigenvalues and eigenvectors of the linear operator
> - The noise perturbs away from the exact linear dynamics
> - Stationarity means that the perturbations have to counter-balance the dynamics
> - What I haven't told you: how to estimate these models

## Backup: VAR(1) vs. 2nd-order dynamics

Remember that sine and cosine waves obey

$\frac{d^2 x}{dt^2}(t) = - \omega^2 x(t)$

Say $x_1(t)$ is a sine wave (taking $\omega = 1$ for simplicity), and _define_ $x_2(t) = dx_1/dt$

$\left[ \begin{array}{c} \frac{dx_1}{dt} \\ \frac{dx_2}{dt} \end{array}\right] = \left[ \begin{array}{c} x_2 \\ -x_1 \end{array}\right] = \left[\begin{array}{cc} 0 & 1 \\ -1 & 0\end{array}\right] \left[\begin{array}{c} x_1 \\ x_2 \end{array}\right]$

## Backup: VAR(1) vs. 2nd-order dynamics

Fix a small time increment $h$

$\left[ \begin{array}{c} \frac{x_1(t) - x_1(t-h)}{h} \\ \frac{x_2(t)-x_2(t-h)}{h} \end{array}\right] = \left[ \begin{array}{c} x_2(t-h) \\ -x_1(t-h) \end{array}\right]$

## Backup: VAR(1) vs. 2nd-order dynamics

$\left[ \begin{array}{c} x_1(t) - x_1(t-h) \\ x_2(t)-x_2(t-h) \end{array}\right] = \left[ \begin{array}{c} hx_2(t-h) \\ -hx_1(t-h) \end{array}\right]$

## Backup: VAR(1) vs. 2nd-order dynamics

$\left[ \begin{array}{c} x_1(t) \\ x_2(t) \end{array}\right] = \left[ \begin{array}{c} x_1(t-h) + hx_2(t-h) \\ x_2(t-h) -hx_1(t-h) \end{array}\right]$

## Backup: VAR(1) vs. 2nd-order dynamics

$\left[ \begin{array}{c} x_1(t) \\ x_2(t) \end{array}\right] = \left[ \begin{array}{cc} 1 & h \\ -h & 1 \end{array}\right] \left[\begin{array}{c} x_1(t-h) \\ x_2(t-h) \end{array}\right]$
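
## Backup: VAR(1) vs. 2nd-order dynamics

A sketch of iterating this discretized system in R (the object names are mine, not from the lecture code); the eigenvalues of this $\mathbf{b}$ are $1 \pm ih$, with modulus slightly above 1, so the "cycle" grows slowly rather than exactly repeating:

```{r, echo=TRUE}
# Iterate x(t) = [[1, h], [-h, 1]] x(t-h), starting like a sine wave
# (sketch; names are hypothetical, not from the course code)
h <- 0.01
b.osc <- matrix(c(1, h,
                  -h, 1), byrow=TRUE, nrow=2)
Mod(eigen(b.osc)$values) # slightly > 1: a cycle times slow exponential growth
n.steps <- 2000
x.osc <- matrix(0, nrow=2, ncol=n.steps+1)
x.osc[,1] <- c(0, 1) # x_1(0)=0 and x_2(0)=dx_1/dt(0)=1, matching sin(t)
for (t in 1:n.steps) { x.osc[,t+1] <- b.osc %*% x.osc[,t] }
plot(h*(0:n.steps), x.osc[1,], type="l", xlab="t", ylab=expression(x[1](t)))
curve(sin(x), lty="dashed", add=TRUE)
legend("bottomleft", legend=c("iterated map", "sin(t)"), lty=c("solid", "dashed"))
```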
## Backup: VAR(1) vs. 2nd-order dynamics

- The extra variable $x_2$ helps $x_1$ keep track of where it is in its cycle
    + Higher-order dynamics = first-order dynamics with extra variables
- Tweaking $\mathbf{b}$ to have $|\lambda| = 1$ gives an exact cycle
    + Complex eigenvalues off the unit circle = cycle $\times$ exponential growth or decay

## Backup: the intercept and re-centering

- In 1D: if $x(t) = a + bx(t-1)$ and $y(t)=x(t) - \frac{a}{1-b}$, then $y(t) = by(t-1)$
\begin{eqnarray}
x(t) & = & a+bx(t-1) ~ \text{(assumption)}\\
x(t) - c & = & b(x(t-1) - c) ~\text{(asserted to find right} ~ c)\\
-c & = & -a -bc ~\text{(subtract 1st line from 2nd)}\\
(b-1)c & = & -a ~ \text{(re-arrange)}
\end{eqnarray}
so $c = \frac{a}{1-b}$
- In multiple dimensions:
\begin{eqnarray}
\vec{x}(t) & = & \vec{a} + \mathbf{b}\vec{x}(t-1)\\
\vec{x}(t) - \vec{c} & = & \mathbf{b}(\vec{x}(t-1)-\vec{c})\\
(\mathbf{I} - \mathbf{b})\vec{c} & = & \vec{a}
\end{eqnarray}
    + Always has a solution $\vec{c}$ if $(\mathbf{I}-\mathbf{b})$ is invertible
    + _or_ if $\vec{a}$ is in the range space of $(\mathbf{I}-\mathbf{b})$
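
## Backup: the intercept and re-centering

A sketch of finding the centering vector $\vec{c}$ numerically, by solving $(\mathbf{I} - \mathbf{b})\vec{c} = \vec{a}$ (the numbers and names here are made up for illustration):

```{r, echo=TRUE}
# Solve (I - b) c = a for the fixed point c (illustrative values, not course data)
a <- c(1, -0.5)
b <- matrix(c(0.5, 0.2,
              -0.1, 0.3), byrow=TRUE, nrow=2)
cc <- solve(diag(2) - b, a)
cc
a + b %*% cc # equals cc, confirming c = a + b c, so x(t) - c recurs linearly
```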