STAT3002 Week 4
4.1 Weighted Least Squares
Strong Interaction of Elementary Particles. See Weisberg pages 83-88.
plab | x     | y   | sd
4    | 0.345 | 367 | 17
6    | 0.287 | 311 | 9
8    | 0.251 | 295 | 9
10   | 0.225 | 268 | 7
12   | 0.207 | 253 | 7
15   | 0.186 | 239 | 6
20   | 0.161 | 220 | 6
30   | 0.132 | 213 | 6
75   | 0.084 | 193 | 5
150  | 0.060 | 192 | 5
> attach(particles)
> w <- 1/particles$sd^2
> y1.lm <- lm(y~x,weights=w)
> summary(y1.lm)
Call: lm(formula = y ~ x, weights = w)
Residuals:
    Min      1Q     Median   3Q   Max
 -2.323 -0.8842 1.266e-006 1.39 2.335
Coefficients:
               Value Std. Error t value Pr(>|t|)
(Intercept) 148.4732     8.0786 18.3785   0.0000
          x 530.8354    47.5500 11.1637   0.0000
Residual standard error: 1.657 on 8 degrees of freedom
Multiple R-Squared: 0.9397
F-statistic: 124.6 on 1 and 8 degrees of freedom, the p-value is 3.71e-006
Correlation of Coefficients:
  (Intercept)
x      -0.905
> anova(y1.lm)
Analysis of Variance Table
Response: y
Terms added sequentially (first to last)
          Df Sum of Sq  Mean Sq  F Value         Pr(F)
        x  1  341.9914 341.9914 124.6287 3.710432e-006
Residuals  8   21.9526   2.7441
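The weighted fit can also be reproduced by hand. The sketch below is written for R and re-enters the particle data from the table above. It checks that the normal-equations solution (X'WX)^{-1} X'Wy, and an ordinary regression on variables rescaled by sqrt(w), both agree with lm(..., weights = w). The object names (X, W, beta.hat, ols.scaled, rss.w) are illustrative choices and do not come from the original session.

```r
# Particle data re-entered from the table above
particles <- data.frame(
  x  = c(0.345, 0.287, 0.251, 0.225, 0.207, 0.186, 0.161, 0.132, 0.084, 0.060),
  y  = c(367, 311, 295, 268, 253, 239, 220, 213, 193, 192),
  sd = c(17, 9, 9, 7, 7, 6, 6, 6, 5, 5)
)
w <- 1 / particles$sd^2            # weights are inverse variances

# (1) Built-in weighted least squares
wls <- lm(y ~ x, data = particles, weights = w)

# (2) Normal equations: beta = (X'WX)^{-1} X'Wy
X <- cbind(1, particles$x)
W <- diag(w)
beta.hat <- solve(t(X) %*% W %*% X, t(X) %*% W %*% particles$y)

# (3) Ordinary least squares after multiplying every column by sqrt(w)
sw <- sqrt(w)
ols.scaled <- lm(I(sw * y) ~ 0 + sw + I(sw * x), data = particles)

# Weighted residual sum of squares: sum of w * residual^2
rss.w <- sum(w * residuals(wls)^2)

print(coef(wls))
print(rss.w)
```

All three routes should give the same coefficients. The weighted residual sum of squares is the quantity reported in the Residuals row of the anova table.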
4.2 Testing for Lack of Fit, Variance Known
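For the above data set we have σ² = 1, so we can test for lack of fit with known population variance: if the model is adequate, the weighted residual sum of squares follows a chi-square distribution on the residual degrees of freedom.
For the linear regression the statistic is 21.95 on 8 degrees of freedom, so there is very strong evidence, p-value = 1 - 0.995 = 0.005, that the linear regression is not adequate.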
> pchisq(21.95,8)
[1] 0.9949907
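The p-value is the upper-tail area, that is, one minus the value returned by pchisq. A minimal R sketch of the calculation follows; the function name lof.known.var and its arguments are hypothetical, introduced only for illustration.

```r
# Lack-of-fit test when sigma^2 is known:
# under H0, (weighted RSS) / sigma^2 follows a chi-square distribution on df
lof.known.var <- function(rss, df, sigma2 = 1) {
  stat <- rss / sigma2
  # upper-tail area, equal to 1 - pchisq(stat, df)
  p <- pchisq(stat, df, lower.tail = FALSE)
  c(statistic = stat, df = df, p.value = p)
}

# Weighted residual sum of squares and degrees of freedom from the linear fit
linear <- lof.known.var(21.95, 8)
print(round(linear, 4))
```

Asking for the upper tail directly with lower.tail = FALSE avoids the subtraction from one, which loses accuracy when the p-value is very small.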
> y2.lm <- lm(y~x+I(x^2),weights=w)
> summary(y2.lm)
Call: lm(formula = y ~ x + I(x^2), weights = w)
Residuals:
     Min      1Q  Median   3Q   Max
 -0.8993 -0.4351 0.01374 0.38 1.142
Coefficients:
                Value Std. Error t value Pr(>|t|)
(Intercept)  183.8305     6.4591 28.4609   0.0000
          x    0.9709    85.3688  0.0114   0.9912
     I(x^2) 1597.5047   250.5869  6.3751   0.0004
Residual standard error: 0.6788 on 7 degrees of freedom
Multiple R-Squared: 0.9911
F-statistic: 391.4 on 2 and 7 degrees of freedom, the p-value is 6.554e-008
Correlation of Coefficients:
       (Intercept)       x
     x     -0.9419
I(x^2)      0.8587 -0.9736
> anova(y2.lm)
Analysis of Variance Table
Response: y
Terms added sequentially (first to last)
          Df Sum of Sq  Mean Sq  F Value        Pr(F)
        x  1  341.9914 341.9914 742.1846 0.0000000230
   I(x^2)  1   18.7271  18.7271  40.6413 0.0003761159
Residuals  7    3.2255   0.4608
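For the quadratic regression the statistic is 3.23 on 7 degrees of freedom, giving p-value = 1 - 0.1371 = 0.8629. We do not reject H0: there is no evidence of lack of fit for the quadratic regression, whereas the linear regression is not adequate.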
> pchisq(3.23,7)
[1] 0.1370585
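When refitting the quadratic model in R, the squared term needs care. In R's formula language `^` denotes crossing of terms, so `x^2` collapses to `x` unless it is wrapped in `I()`. The sketch below re-enters the particle data and illustrates the difference; the object names no.square, quad, rss.quad and p.quad are assumptions made for this example.

```r
# Particle data re-entered from the table in Section 4.1
particles <- data.frame(
  x  = c(0.345, 0.287, 0.251, 0.225, 0.207, 0.186, 0.161, 0.132, 0.084, 0.060),
  y  = c(367, 311, 295, 268, 253, 239, 220, 213, 193, 192),
  sd = c(17, 9, 9, 7, 7, 6, 6, 6, 5, 5)
)
w <- 1 / particles$sd^2

# In an R formula, x^2 means "x crossed with itself", which reduces to x,
# so this call silently fits the straight line again
no.square <- lm(y ~ x + x^2, data = particles, weights = w)

# I() protects the arithmetic, so the squared term enters the model
quad <- lm(y ~ x + I(x^2), data = particles, weights = w)

print(names(coef(no.square)))
print(names(coef(quad)))

# Weighted residual sum of squares and chi-square lack-of-fit p-value (sigma^2 = 1)
rss.quad <- deviance(quad)
p.quad <- pchisq(rss.quad, df.residual(quad), lower.tail = FALSE)
print(c(rss = rss.quad, p.value = p.quad))
```

For a weighted lm fit, deviance() returns the weighted residual sum of squares, which is the statistic compared with the chi-square distribution here.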
4.3 Testing for Lack of Fit, Variance Unknown
Viscosity Data Set.
Temp | Viscosity | Temp | Viscosity
1    | 10.84     | 5    | 22.25
1    | 9.30      | 5.6  | 27.20
2    | 16.35     | 5.6  | 25.90
3.3  | 22.88     | 5.6  | 25.61
3.3  | 24.35     | 6    | 25.45
4    | 24.56     | 6    | 26.56
4    | 25.86     | 6.5  | 21.03
4    | 29.16     | 6.9  | 21.46
4.7  | 24.59     |      |
> visc.lm <- lm(Viscosity~Temp)
> summary(visc.lm)
Call: lm(formula = Viscosity ~ Temp)
Residuals:
Min 1Q Median 3Q Max
-6.454 -1.616 0.5638 2.636 7.425
Coefficients:
              Value Std. Error t value Pr(>|t|)
(Intercept) 13.2139     2.6649  4.9586   0.0002
       Temp  2.1304     0.5645  3.7737   0.0018
Residual standard error: 4.084 on 15 degrees of freedom
Multiple R-Squared: 0.487
F-statistic: 14.24 on 1 and 15 degrees of freedom, the p-value is 0.001839
Correlation of Coefficients:
     (Intercept)
Temp     -0.9284
> anova(visc.lm)
Analysis of Variance Table
Response: Viscosity
Terms added sequentially (first to last)
          Df Sum of Sq  Mean Sq F Value       Pr(F)
     Temp  1  237.4788 237.4788 14.2411 0.001839409
Residuals 15  250.1338  16.6756
> out.numeric <- lm(Viscosity~Temp)
> out.factor <- lm(Viscosity~as.factor(Temp))
> anova(out.numeric,out.factor,test="F")
Analysis of Variance Table
Response: Viscosity
            Terms Resid. Df      RSS    Test Df Sum of Sq  F Value       Pr(F)
1            Temp        15 250.1338
2 as.factor(Temp)         7  15.5630 1 vs. 2  8  234.5708 13.18827 0.001388715
For the viscosity data set, there is overwhelming evidence, p-value = 0.0014, that the straight-line model is inadequate; a more complex relationship between temperature and viscosity is needed.
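The comparison of the numeric and factor fits is a pure-error test: the residual sum of squares of the straight line is split into variation among replicates at the same temperature (pure error) and the remainder (lack of fit). The R sketch below re-enters the viscosity table and carries out the split by hand. Names such as visc, ss.pe, ss.lof and F.lof are hypothetical and chosen only for this illustration.

```r
# Viscosity data re-entered from the table above
visc <- data.frame(
  Temp = c(1, 1, 2, 3.3, 3.3, 4, 4, 4, 4.7, 5, 5.6, 5.6, 5.6, 6, 6, 6.5, 6.9),
  Viscosity = c(10.84, 9.30, 16.35, 22.88, 24.35, 24.56, 25.86, 29.16, 24.59,
                22.25, 27.20, 25.90, 25.61, 25.45, 26.56, 21.03, 21.46)
)

# Residual sum of squares from the straight-line fit
line <- lm(Viscosity ~ Temp, data = visc)
rss <- deviance(line)
df.rss <- df.residual(line)

# Pure error: variation of replicates about their own group means
group.mean <- ave(visc$Viscosity, visc$Temp)
ss.pe <- sum((visc$Viscosity - group.mean)^2)
df.pe <- nrow(visc) - length(unique(visc$Temp))

# Lack of fit is what remains of the residual sum of squares
ss.lof <- rss - ss.pe
df.lof <- df.rss - df.pe
F.lof <- (ss.lof / df.lof) / (ss.pe / df.pe)
p.lof <- pf(F.lof, df.lof, df.pe, lower.tail = FALSE)

print(c(F = F.lof, df1 = df.lof, df2 = df.pe, p.value = p.lof))
```

The pure-error sum of squares computed this way is the residual sum of squares of the as.factor(Temp) model, so the F statistic should match the one in the anova comparison.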
4.4 Comparing Regression Lines
Twin Data. The data give the IQ scores of identical twins, one raised in a foster home (Y), and the other raised by natural parents (X). The data were originally used by Professor C. Burt (British J. Psychology, 1966, pp. 137-153).
> attach(twin)
> plot(X,Y,type="n")
> points(X[G1==1],Y[G1==1],pch=1)
> points(X[G2==1],Y[G2==1],pch=2)
> points(X[G3==1],Y[G3==1],pch=3)
> legend(locator(1),c("Highest Class","Middle Class","Lowest Class"),marks=c(1,2,3))
Model 1: Different slopes and intercepts.
> twin.m1 <- lm(Y~G1+G2+G3+Z1+Z2+Z3-1)
> summary(twin.m1)
Call: lm(formula = Y ~ G1 + G2 + G3 + Z1 + Z2 + Z3 - 1)
Residuals:
    Min     1Q Median    3Q  Max
 -14.48 -5.248 -0.155 4.582 13.8
Coefficients:
     Value Std. Error t value Pr(>|t|)
G1 -1.8720    17.8083 -0.1051   0.9173
G2  0.8160    26.1092  0.0313   0.9754
G3  7.2046    16.7513  0.4301   0.6715
Z1  0.9776     0.1632  5.9902   0.0000
Z2  0.9726     0.2863  3.3973   0.0027
Z3  0.9484     0.1822  5.2061   0.0000
Residual standard error: 7.921 on 21 degrees of freedom
Multiple R-Squared: 0.9948
F-statistic: 663.2 on 6 and 21 degrees of freedom, the p-value is 0
Correlation of Coefficients:
        G1      G2      G3     Z1     Z2
G2  0.0000
G3  0.0000  0.0000
Z1 -0.9858  0.0000  0.0000
Z2  0.0000 -0.9923  0.0000 0.0000
Z3  0.0000  0.0000 -0.9920 0.0000 0.0000
> anova(twin.m1)
Analysis of Variance Table
Response: Y
Terms added sequentially (first to last)
          Df Sum of Sq  Mean Sq  F Value       Pr(F)
       G1  1   74675.6  74675.6 1190.301 0.000000000
       G2  1   47348.2  47348.2  754.712 0.000000000
       G3  1  122953.1 122953.1 1959.828 0.000000000
       Z1  1    2251.2   2251.2   35.883 0.000006042
       Z2  1     724.1    724.1   11.542 0.002715346
       Z3  1    1700.4   1700.4   27.104 0.000036901
Residuals 21    1317.5     62.7
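The twin data themselves are not reproduced in these notes, so the R sketch below uses a small synthetic data set, clearly not the twin data, purely to show how the indicator columns could be built. It assumes, as the model formula and the block pattern of the coefficient correlations suggest, that G1 to G3 are 0/1 social-class indicators and that Zi = Gi * X. The names toy and cls are hypothetical.

```r
# Synthetic illustration only: NOT the twin data
set.seed(1)
toy <- data.frame(
  cls = factor(rep(1:3, each = 6)),        # hypothetical social-class label
  X   = round(rnorm(18, mean = 95, sd = 15))
)
toy$Y <- 5 + 0.9 * toy$X + rnorm(18, sd = 7)

# 0/1 indicators for each class, and class-specific copies of X
for (i in 1:3) {
  toy[[paste0("G", i)]] <- as.numeric(toy$cls == i)
  toy[[paste0("Z", i)]] <- toy[[paste0("G", i)]] * toy$X
}

# Separate intercept and slope for every class, written with explicit indicators
by.hand <- lm(Y ~ G1 + G2 + G3 + Z1 + Z2 + Z3 - 1, data = toy)

# The same model using R's factor notation
by.factor <- lm(Y ~ 0 + cls + cls:X, data = toy)

# The coefficients also match a regression fitted to one class on its own
class1 <- lm(Y ~ X, data = toy, subset = cls == 1)

print(coef(by.hand))
print(coef(by.factor))
print(coef(class1))
```

Under this coding, each Gi coefficient is the intercept for class i and each Zi coefficient is its slope. This explains why coefficients belonging to different classes are uncorrelated in the summary output.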
Model 2: Common slope, different intercepts.
> twin.m2 <- lm(Y~G1+G2+G3+X-1)
> summary(twin.m2)
Call: lm(formula = Y ~ G1 + G2 + G3 + X - 1)
Residuals:
    Min     1Q  Median    3Q  Max
 -14.82 -5.237 -0.1111 4.476 13.7
Coefficients:
     Value Std. Error t value Pr(>|t|)
G1 -0.6076    11.8551 -0.0513   0.9596
G2  1.4277    10.1604  0.1405   0.8895
G3  5.6188     9.9628  0.5640   0.5782
 X  0.9658     0.1069  9.0306   0.0000
Residual standard error: 7.571 on 23 degrees of freedom
Multiple R-Squared: 0.9947
F-statistic: 1089 on 4 and 23 degrees of freedom, the p-value is 0
Correlation of Coefficients:
        G1      G2      G3
G2  0.9244
G3  0.9502  0.9328
 X -0.9704 -0.9526 -0.9792
> anova(twin.m2)
Analysis of Variance Table
Response: Y
Terms added sequentially (first to last)
          Df Sum of Sq  Mean Sq  F Value         Pr(F)
       G1  1   74675.6  74675.6 1302.742 0.000000e+000
       G2  1   47348.2  47348.2  826.005 0.000000e+000
       G3  1  122953.1 122953.1 2144.961 0.000000e+000
        X  1    4674.7   4674.7   81.552 5.047447e-009
Residuals 23    1318.4     57.3
Model 3: Common intercept, different slopes.
> twin.m3 <- lm(Y~Z1+Z2+Z3)
> summary(twin.m3)
Call: lm(formula = Y ~ Z1 + Z2 + Z3)
Residuals:
   Min     1Q   Median    3Q   Max
 -15.4 -5.188 -0.05831 4.596 13.58
Coefficients:
             Value Std. Error t value Pr(>|t|)
(Intercept) 2.5623    10.5984  0.2418   0.8111
         Z1 0.9375     0.0993  9.4423   0.0000
         Z2 0.9536     0.1202  7.9319   0.0000
         Z3 0.9985     0.1164  8.5747   0.0000
Residual standard error: 7.594 on 23 degrees of freedom
Multiple R-Squared: 0.8027
F-statistic: 31.2 on 3 and 23 degrees of freedom, the p-value is 2.791e-008
Correlation of Coefficients:
   (Intercept)     Z1     Z2
Z1     -0.9643
Z2     -0.9592 0.9249
Z3     -0.9819 0.9468 0.9418
> anova(twin.m3)
Analysis of Variance Table
Response: Y
Terms added sequentially (first to last)
          Df Sum of Sq  Mean Sq  F Value     Pr(F)
       Z1  1  1147.326 1147.326 19.89392 0.0001787
       Z2  1    10.545   10.545  0.18284 0.6729232
       Z3  1  4240.337 4240.337 73.52483 0.0000000
Residuals 23  1326.460   57.672
Model 4: Common slope and intercept.
> twin.m4 <- lm(Y~X)
> summary(twin.m4)
Call: lm(formula = Y ~ X)
Residuals:
    Min     1Q  Median    3Q   Max
 -11.35 -5.731 0.05742 4.324 16.35
Coefficients:
             Value Std. Error t value Pr(>|t|)
(Intercept) 9.2076     9.2999  0.9901   0.3316
          X 0.9014     0.0963  9.3575   0.0000
Residual standard error: 7.729 on 25 degrees of freedom
Multiple R-Squared: 0.7779
F-statistic: 87.56 on 1 and 25 degrees of freedom, the p-value is 1.204e-009
Correlation of Coefficients:
  (Intercept)
X     -0.9871
> anova(twin.m4)
Analysis of Variance Table
Response: Y
Terms added sequentially (first to last)
          Df Sum of Sq  Mean Sq  F Value       Pr(F)
        X  1  5231.133 5231.133 87.56305 1.2036e-009
Residuals 25  1493.533   59.741
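The four fits are nested, so they can be compared with extra-sum-of-squares F tests built from the Residuals rows of the anova tables. The R sketch below does this using only the printed residual sums of squares and degrees of freedom. The function name nested.F is hypothetical, and because the printed values are rounded the results are approximate.

```r
# Extra-sum-of-squares F test for a reduced model nested in a full model
nested.F <- function(rss.reduced, df.reduced, rss.full, df.full) {
  df.num <- df.reduced - df.full
  Fstat <- ((rss.reduced - rss.full) / df.num) / (rss.full / df.full)
  c(F = Fstat, df1 = df.num, df2 = df.full,
    p.value = pf(Fstat, df.num, df.full, lower.tail = FALSE))
}

# Residual sums of squares and degrees of freedom copied from the anova tables
rss <- c(m1 = 1317.5, m2 = 1318.4, m3 = 1326.460, m4 = 1493.533)
dfs <- c(m1 = 21,     m2 = 23,     m3 = 23,       m4 = 25)

# Each simpler model against the model with separate slopes and intercepts
tests <- rbind(
  m2.vs.m1 = nested.F(rss[["m2"]], dfs[["m2"]], rss[["m1"]], dfs[["m1"]]),
  m3.vs.m1 = nested.F(rss[["m3"]], dfs[["m3"]], rss[["m1"]], dfs[["m1"]]),
  m4.vs.m1 = nested.F(rss[["m4"]], dfs[["m4"]], rss[["m1"]], dfs[["m1"]])
)
print(round(tests, 4))
```

With the fitted objects available, anova(twin.m4, twin.m1, test="F") would carry out the last comparison directly, in the same way as the out.numeric and out.factor comparison in Section 4.3.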