# Trends and Smoothing II

4 September 2018

$\newcommand{\Expect}[1]{\mathbb{E}\left[ #1 \right]} \newcommand{\Var}[1]{\mathrm{Var}\left[ #1 \right]} \newcommand{\Cov}[1]{\mathrm{Cov}\left[ #1 \right]} \newcommand{\TrueRegFunc}{\mu} \newcommand{\EstRegFunc}{\widehat{\TrueRegFunc}} \DeclareMathOperator{\tr}{tr} \DeclareMathOperator*{\argmin}{argmin} \DeclareMathOperator{\dof}{DoF} \DeclareMathOperator{\det}{det} \newcommand{\TrueNoise}{\epsilon} \newcommand{\EstNoise}{\widehat{\TrueNoise}}$

# In our last episode…

• Data $$X(t) = \TrueRegFunc(t) + \TrueNoise(t)$$
• $$\TrueRegFunc$$ deterministic (=trend), $$\TrueNoise$$ stochastic and mean-zero (=fluctuations)
• Wanted: estimates of $$\TrueRegFunc$$ and/or $$\TrueNoise$$ from one data set
• Hope: $$\TrueRegFunc$$ is a smooth function $$\Rightarrow$$ average nearby $$X$$’s
• Linear smoother: $$\EstRegFunc(t) = \sum_{j=1}^{n}{w(t, t_j) x_j}$$
• Fitted values on the data $$\mathbf{\EstRegFunc} = \mathbf{w}\mathbf{x}$$
• $$\mathbf{w}$$ is the source of all knowledge
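In R, this is a single matrix-vector multiply; here is a minimal sketch with a made-up data vector and a toy weight matrix (every fitted value is the global mean):

```r
n <- 10
x <- rnorm(n)                     # made-up data
w <- matrix(1/n, n, n)            # toy linear smoother: each row averages all of x
mu.hat <- w %*% x                 # fitted values = w x
all.equal(as.vector(mu.hat), rep(mean(x), n))   # TRUE
```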

# Expectation of the fitted values

$\begin{eqnarray} \Expect{\mathbf{\EstRegFunc}} & = & \Expect{\mathbf{w}\mathbf{X}}\\ & = & \mathbf{w}\Expect{\mathbf{X}}\\ & = & \mathbf{w} \mathbf{\mu} \end{eqnarray}$

Unbiased estimate $$\Leftrightarrow \mathbf{w} \mathbf{\mu} = \mathbf{\mu}$$

# Expanding in eigenvectors

• Generally, $$\mathbf{w}$$ has $$n$$ linearly-independent eigenvectors $$\mathbf{e}_1, \ldots, \mathbf{e}_n$$, with eigenvalues $$\lambda_1, \ldots, \lambda_n$$
• Since the eigenvectors form a basis, $$\mathbf{x} = \sum_{j=1}^{n}{c_j \mathbf{e}_j}$$ for some coefficients $$c_j$$
• So $$\mathbf{w}\mathbf{x} = \mathbf{w}\sum_{j=1}^{n}{c_j \mathbf{e}_j} = \sum_{j=1}^{n}{c_j \lambda_j \mathbf{e}_j}$$
• Components of the data which match large-$$\lambda$$ eigenvectors are enhanced
• Components of the data which match small-$$\lambda$$ eigenvectors are shrunk

# A little example

```r
# A 3-point moving-average smoother, with adjusted weights at the ends
n <- 10
w <- matrix(0, nrow=n, ncol=n)
diag(w) <- 1/3
for (i in 2:(n-1)) {
    w[i,i+1] <- 1/3   # right neighbor
    w[i,i-1] <- 1/3   # left neighbor
}
w[1,1] <- 1/2         # first point: average itself and its one neighbor
w[1,2] <- 1/2
w[n,n-1] <- 1/2       # last point likewise
w[n,n] <- 1/2
```

# A little example

```r
w
##            [,1]      [,2]      [,3]      [,4]      [,5]      [,6]
##  [1,] 0.5000000 0.5000000 0.0000000 0.0000000 0.0000000 0.0000000
##  [2,] 0.3333333 0.3333333 0.3333333 0.0000000 0.0000000 0.0000000
##  [3,] 0.0000000 0.3333333 0.3333333 0.3333333 0.0000000 0.0000000
##  [4,] 0.0000000 0.0000000 0.3333333 0.3333333 0.3333333 0.0000000
##  [5,] 0.0000000 0.0000000 0.0000000 0.3333333 0.3333333 0.3333333
##  [6,] 0.0000000 0.0000000 0.0000000 0.0000000 0.3333333 0.3333333
##  [7,] 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.3333333
##  [8,] 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000
##  [9,] 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000
## [10,] 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000
##            [,7]      [,8]      [,9]     [,10]
##  [1,] 0.0000000 0.0000000 0.0000000 0.0000000
##  [2,] 0.0000000 0.0000000 0.0000000 0.0000000
##  [3,] 0.0000000 0.0000000 0.0000000 0.0000000
##  [4,] 0.0000000 0.0000000 0.0000000 0.0000000
##  [5,] 0.0000000 0.0000000 0.0000000 0.0000000
##  [6,] 0.3333333 0.0000000 0.0000000 0.0000000
##  [7,] 0.3333333 0.3333333 0.0000000 0.0000000
##  [8,] 0.3333333 0.3333333 0.3333333 0.0000000
##  [9,] 0.0000000 0.3333333 0.3333333 0.3333333
## [10,] 0.0000000 0.0000000 0.5000000 0.5000000
```

# A little example

```r
eigen(w)$values
##  [1]  1.00000000  0.96261129  0.85490143  0.68968376  0.48651845
##  [6] -0.31012390  0.26920019 -0.23729622 -0.11137134  0.06254301
eigen(w)$vectors[,1]
##  [1] 0.3162278 0.3162278 0.3162278 0.3162278 0.3162278 0.3162278 0.3162278
##  [8] 0.3162278 0.3162278 0.3162278
```
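We can check the eigen-story from the earlier slide directly on this example (a quick sketch; `x` here is just made-up data):

```r
eig <- eigen(w)
# The leading eigenvector is constant with eigenvalue 1,
# so this smoother passes constant vectors through unchanged:
w %*% rep(1, n)                                  # a vector of 1s
# Expand made-up data in the eigenbasis, x = sum_j c_j e_j,
# and confirm that w x = sum_j c_j lambda_j e_j:
x <- rnorm(n)
c.coef <- solve(eig$vectors, x)                  # coordinates in the eigenbasis
all.equal(as.vector(w %*% x),
          as.vector(eig$vectors %*% (eig$values * c.coef)))   # TRUE
```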

# Variance of the fitted values

$\begin{eqnarray} \Var{\mathbf{\EstRegFunc}} & = & \Var{\mathbf{w}\mathbf{X}}\\ & = & \mathbf{w}\Var{\mathbf{X}}\mathbf{w}^T\\ & = & \mathbf{w}\Var{\mathbf{\TrueRegFunc} + \mathbf{\TrueNoise}}\mathbf{w}^T\\ & = & \mathbf{w}\Var{\mathbf{\TrueNoise}}\mathbf{w}^T \end{eqnarray}$

IF $$\Var{\mathbf{\TrueNoise}} = \sigma^2 \mathbf{I}$$, THEN $$\Var{\mathbf{\EstRegFunc}} = \sigma^2\mathbf{w}\mathbf{w}^T$$
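With the little example's `w` and an illustrative $$\sigma^2 = 1$$, this is one line of R:

```r
sigma2 <- 1                          # illustrative noise variance
var.fitted <- sigma2 * w %*% t(w)    # Var[fitted values] = sigma^2 w w^T
diag(var.fitted)                     # each fitted value has variance < sigma^2
```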

# How much do the fitted values respond to the data?

$\begin{eqnarray} \sum_{i=1}^{n}{\Cov{\EstRegFunc_i, X_i}} & = & \sum_{i=1}^{n}{\Cov{\sum_{j=1}^{n}{w_{ij} X_j}, X_i}}\\ & = & \sum_{i=1}^{n}{\sum_{j=1}^{n}{w_{ij} \Cov{X_i, X_j}}}\\ & = & \sum_{i=1}^{n}{\sum_{j=1}^{n}{w_{ij} \Cov{\TrueNoise_i, \TrueNoise_j}}} \end{eqnarray}$

IF $$\Var{\mathbf{\TrueNoise}} = \sigma^2 \mathbf{I}$$, THEN this $$= \sigma^2\tr{\mathbf{w}} = \sigma^2 \text{(sum of eigenvalues)}$$

$$\tr{\mathbf{w}} =$$ (effective) degrees of freedom
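For the little example above:

```r
sum(diag(w))   # effective DoF = 2*(1/2) + 8*(1/3) = 11/3, vs. n = 10
```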

# Data = trend + fluctuation

• $$X(t) = \TrueRegFunc(t) + \TrueNoise(t)$$
• $$\Rightarrow$$ $$\TrueNoise(t) = X(t) - \TrueRegFunc(t)$$
• $$\Rightarrow$$ $$\EstNoise(t) \equiv X(t) - \EstRegFunc(t) =$$ residuals
$\begin{eqnarray} \mathbf{\EstNoise} & = & \mathbf{x} - \mathbf{\EstRegFunc}\\ & = & \mathbf{x} - \mathbf{w}\mathbf{x}\\ & = & (\mathbf{I} - \mathbf{w})\mathbf{x} \end{eqnarray}$

Convince yourself: $$\mathbf{I}-\mathbf{w}$$ has same eigenvectors as $$\mathbf{w}$$, but eigenvalues $$1-\lambda$$
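A quick numerical check with the example smoother:

```r
# I - w has eigenvalues 1 - lambda (and the same eigenvectors as w)
sort(eigen(diag(n) - w)$values)
sort(1 - eigen(w)$values)        # matches, up to numerical error
```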

# Expected residuals

$\begin{eqnarray} \Expect{\mathbf{\EstNoise}} & = & \Expect{(\mathbf{I}-\mathbf{w})\mathbf{X}}\\ & = & (\mathbf{I}-\mathbf{w})\mathbf{\TrueRegFunc} \end{eqnarray}$

Biased trend estimate $$\Leftrightarrow$$ biased fluctuation estimate

# Break for the in-class exercise

• $$X(t) = \TrueRegFunc(t) + \TrueNoise(t)$$, with $$\Var{\TrueNoise(t)} = \sigma^2$$ and $$\Cov{\TrueNoise(t_1), \TrueNoise(t_2)} = 0$$ for $$t_1 \neq t_2$$
• Set $$\EstRegFunc(t) = \frac{1}{3}\sum_{s=t-1}^{t+1}{X(s)}$$
• Ignore the ends of the data where we don’t have neighbors on both sides
• What is $$\Cov{\EstRegFunc(t), \EstRegFunc(t+1)}$$?
• What is $$\Cov{\EstRegFunc(t), \EstRegFunc(t+2)}$$?
• What is $$\Cov{\EstRegFunc(t), \EstRegFunc(t+3)}$$?
• Why aren’t all of these 0?
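If you want to check your answers empirically, here is one way (a sketch; the number of replicates and series length are arbitrary choices):

```r
# Simulate pure noise (mu = 0), smooth with the 3-point moving average,
# and estimate the covariances of the fitted values across replicates
n.reps <- 1e4; len <- 20; sigma <- 1
fits <- replicate(n.reps, {
    x <- rnorm(len, sd=sigma)
    stats::filter(x, rep(1/3, 3))     # centered 3-point moving average
})
t0 <- 10                              # an interior time point
cov(fits[t0, ], fits[t0+1, ])         # compare to your answers
cov(fits[t0, ], fits[t0+2, ])
cov(fits[t0, ], fits[t0+3, ])         # should be ~ 0
```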

# Variance and covariance of the residuals

$\Var{\mathbf{\EstNoise}} = (\mathbf{I}-\mathbf{w}) \Var{\mathbf{\epsilon}} (\mathbf{I}-\mathbf{w})^T$

IF $$\Var{\mathbf{\epsilon}} = \sigma^2 \mathbf{I}$$, THEN this $$= \sigma^2 (\mathbf{I}-\mathbf{w})(\mathbf{I}-\mathbf{w})^T$$

NB: Correlations from off-diagonal entries in $$\mathbf{w}$$
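Computing this for the little example (with $$\sigma^2 = 1$$) shows nonzero residual correlations even though the noise is uncorrelated:

```r
resid.var <- (diag(n) - w) %*% t(diag(n) - w)   # taking sigma^2 = 1
round(resid.var[1:4, 1:4], 3)                   # nonzero off-diagonal entries
```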

# Splines

$\EstRegFunc = \argmin_{m}{\frac{1}{n}\sum_{i=1}^{n}{(x_i - m(t_i))^2} + \lambda\int{(m^{\prime\prime}(t))^2 dt}}$

• This $$\lambda$$ not an eigenvalue (sorry)
• Fit the data points vs. over-all curvature
• Minimization is over all functions
• Solution is always a piecewise-cubic polynomial, continuous and with continuous 1st and 2nd derivatives (a cubic spline)
• $$\lambda \rightarrow 0$$ $$\Rightarrow$$ Curve interpolates the data points exactly
• $$\lambda \rightarrow \infty$$ $$\Rightarrow$$ Global linear fit
• $$\downarrow$$ degrees of freedom as $$\uparrow \lambda$$
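A small illustration on made-up data (the sine trend and noise level are arbitrary): more smoothing drives the effective degrees of freedom down toward 2, the global linear fit.

```r
t <- 1:100
x <- sin(t/10) + rnorm(100, sd=0.5)            # made-up trend + noise
fit.rough  <- smooth.spline(t, x, spar=0.3)    # light smoothing
fit.smooth <- smooth.spline(t, x, spar=1.5)    # heavy smoothing
c(fit.rough$df, fit.smooth$df)                 # df falls as smoothing increases
```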

# How do we pick $$\lambda$$?

• Want trend to predict not-yet-seen stuff (interpolate, extrapolate, filter)
• A good $$\lambda$$ predicts new stuff well
• Hold out part of the data and try to predict that from the rest

# Leave-one-out cross-validation (LOOCV)

• For each of the $$n$$ data points $$i$$:
    • Fit using every data point except $$i$$, getting $$\EstRegFunc^{(-i)}$$;
    • Find the prediction $$\EstRegFunc^{(-i)}(t_i)$$;
    • Find the squared error $$(x_i - \EstRegFunc^{(-i)}(t_i))^2$$.
• Average over all data points: $$n^{-1}\sum_{i=1}^{n}{(x_i - \EstRegFunc^{(-i)}(t_i))^2}$$

• Low LOOCV $$\Leftrightarrow$$ good ability to predict new data
• This is (essentially) what smooth.spline does automatically; by default it uses a generalized form of CV, and setting cv=TRUE gives exact leave-one-out
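A direct (slow) implementation of the recipe, for a spline at one fixed smoothing level; `loocv.naive` is just an illustrative name, and `t`, `x`, `spar` are placeholders:

```r
# Naive LOOCV: re-fit n times, once per held-out point
loocv.naive <- function(t, x, spar) {
    errs <- sapply(seq_along(x), function(i) {
        fit <- smooth.spline(t[-i], x[-i], spar=spar)   # fit without point i
        (x[i] - predict(fit, t[i])$y)^2                 # squared error at i
    })
    mean(errs)
}
```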

# Leave-one-out cross-validation (LOOCV)

Don’t have to re-fit linear smoothers $$n$$ times

$\begin{eqnarray} \EstRegFunc^{(-i)}(t_i) & = & \frac{{(\mathbf{w}\mathbf{x})}_i - w_{ii} x_i}{1-w_{ii}}\\ x_i - \EstRegFunc^{(-i)}(t_i) & = & \frac{x_i - \EstRegFunc(t_i)}{1-w_{ii}}\\ \mathrm{LOOCV} & = & \frac{1}{n}\sum_{i=1}^{n}{\left(\frac{x_i-\EstRegFunc(t_i)}{1-w_{ii}}\right)^2} \end{eqnarray}$
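In R, the shortcut for any linear smoother with a known weight matrix is a one-liner (`loocv.linear` is an illustrative name):

```r
# LOOCV for a linear smoother, without any re-fitting
loocv.linear <- function(w, x) {
    mu.hat <- w %*% x
    mean(((x - mu.hat) / (1 - diag(w)))^2)
}
loocv.linear(w, rnorm(10))   # e.g., with the little example's w and noise data
```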

# Many variants

• $$h$$-block CV: omit a buffer of radius $$h$$ around the hold-out point from the training set
• $$k$$- or $$v$$-fold CV: divide data into $$k$$ equal-sized “folds”, try to predict each fold using the rest of the data
• $$hv$$-block CV: $$v$$-fold with a buffer
• etc., etc.
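For instance, a minimal $$v$$-fold sketch for smooth.spline at a fixed smoothing level ($$v = 5$$ and the function name are arbitrary choices):

```r
vfold.cv <- function(t, x, spar, v=5) {
    folds <- sample(rep(1:v, length.out=length(x)))   # random fold labels
    errs <- sapply(1:v, function(k) {
        hold <- (folds == k)
        fit <- smooth.spline(t[!hold], x[!hold], spar=spar)
        mean((x[hold] - predict(fit, t[hold])$y)^2)   # error on held-out fold
    })
    mean(errs)
}
```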

# The moral

• Never care about how good the in-sample fit is ($$R^2$$, $$R^2_{adj}$$, etc.)
• Always care about ability to predict new data

# Summing up

• If the trend is smooth, we can estimate it by smoothing
• Every smoother is biased towards some patterns and against others
• Properties of the fitted values come from the weights
• Fluctuations are residuals after removing a trend
• De-trending can create correlations
• We decide how to smooth by cross-validation