# First case: linear specification is wrong

Make up sample data; itâ€™ll simplify things later if theyâ€™re sorted for plotting

``x <- sort(runif(300, 0, 3))``

Impose true regression function \(\log{(x+1)}\), with Gaussian noise:

``yg <- log(x+1) + rnorm(length(x), 0, 0.15)``

Bind into a data frame:

``gframe <- data.frame(x=x, y=yg)``

Plot it, plus the true regression curve

``````plot(y~x, data=gframe, xlab="x", ylab="y", pch=16, cex=0.5)

Fit the linear model, add to the plot:

``````glinfit <- lm(y~x, data=gframe)
print(summary(glinfit), signif.stars=FALSE, digits=2)``````
``````##
## Call:
## lm(formula = y ~ x, data = gframe)
##
## Residuals:
##    Min     1Q Median     3Q    Max
## -0.447 -0.111 -0.002  0.091  0.437
##
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)
## (Intercept)    0.192      0.019      10   <2e-16
## x              0.431      0.010      41   <2e-16
##
## Residual standard error: 0.15 on 298 degrees of freedom
## Multiple R-squared:  0.85,   Adjusted R-squared:  0.85
## F-statistic: 1.7e+03 on 1 and 298 DF,  p-value: <2e-16``````
``````plot(x, yg, xlab="x", ylab="y")
abline(glinfit, lwd=4)``````

MSE of linear model, in-sample: 0.0238.

Weâ€™ll need to do that a lot, so make it a function:

``mse.residuals <- function(model) { mean(residuals(model)^2) }``
• EXERCISE: Write comments giving the inputs and outputs of the function
• EXERCISE: Check that the function works

Fit the non-parametric alternative:

``library(mgcv)``

Weâ€™ll use spline smoothing as provided by the `gam` function:

``gnpr <- gam(y~s(x),data=gframe)``

Add the fitted values from the spline to the plot:

``````plot(x,yg,xlab="x",ylab="y")