Finger Exercises to be discussed Wed Sept 25 2002
-------------------------------------------------

Consider the following data sets:

A. The Longley data (Monahan, p. 86), which is highly collinear.

   * Splus:  X <- longley.x
             y <- longley.y

   * R:      data(longley)
             X <- longley[,1:6]
             y <- longley[,7]

B. The Steel and Torre data (Monahan, p. 109, ex. 5.2)

   X <- 1:6
   y <- c(4.5, 5.5, 6.5, 8.0, 10.0, 12.0)

-----------------------------------------------------------------------

1. Compute the least-squares regression coefficients for the Longley and
   Steel/Torre datasets, using each of the following methods. N.B., don't
   forget to say X <- cbind(1,X) to get the intercept in the model before
   using methods (a), (b), (c).

   (a) Inverting XTX directly:

         XTX _ t(X) %*% X
         XTXinv _ solve(XTX)

   (b) Cholesky decomposition [use chol() to find the Cholesky
       decomposition]

   (c) QR decomposition [use qr(), qr.Q() and qr.R() to find the QR
       decomposition]

   (d) lm in SPLUS or R [don't say X _ cbind(1,X) in this case]

   and compare the results. [There will not be too much difference,
   because Splus does all its calculations in double precision and
   typically shows only 7 or so significant decimal digits. How could you
   find differences in the lower-order digits? Single-precision
   calculations might not be able to handle the Longley data, for
   example.]

LONGLEY:
-------

(a) XTXinv:

    The strategy is simple. From X'Xb = X'y (' = transpose) we simply
    want to write b = (X'X)^{-1} X'y. So we compute X'X, invert it,
    etc...

    > X _ cbind(1,X)
    > XTX _ t(X) %*% X
    > XTXinv _ solve(XTX)
    Problem in solve.qr(a): apparently singular matrix
    Use traceback() to see the call stack

    It didn't work! Apparently XTX is not (numerically) of full
    rank---Splus does not think it possesses an inverse!

    Since we will see below that other methods are able to produce a
    b-hat here, we have to ask what's wrong with inverting XTX. Although
    X has highly collinear columns, it is---mathematically---of full
    rank. Therefore XTX is mathematically of full rank and has an
    inverse. However, computing XTX in machine floating point (even
    double precision) can introduce roundoff, truncation and underflow
    errors that make XTX "look" singular. In particular,

    > eigen(XTX)$values
    [1] 6.665301e+07 2.090730e+05 1.053550e+05 1.803976e+04 2.455730e+01
    [6] 2.015117e+00 1.172178e-07

    shows that the smallest eigenvalue of the computed XTX is nearly 0,
    and we could well believe from this that XTX has only 6 nonzero
    eigenvalues. But then XTX is of rank 6, not 7, and doesn't have an
    inverse. An analogous check in obtaining the QR decomposition of XTX,
    as a prelude to computing XTXinv, reaches the same conclusion and
    stops cold.

(b) Cholesky:

    Note: Monahan discusses the Cholesky decomposition A = LL'
    (' = transpose), but checking help(chol) in Splus or R shows that the
    decomposition that chol() produces is A = U'U, where U is
    upper-triangular. The solution strategy is the same, though:

    (i)   Change X'Xb = X'y into U'Ub = X'y
    (ii)  Solve U'w = X'y for w
    (iii) Solve Ub = w for b

    For (ii) and (iii) you either have to program your own backsolving
    routine (like we did on the blackboard) or note the solve() and
    backsolve() routines referenced at the bottom of help(chol). We will
    use solve() here.
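    [Aside: if you do want to program the backsolving yourself, here is a
    minimal sketch of back-substitution for the upper-triangular system
    Ub = w. The name backsolve.naive is ours, not a built-in:

      backsolve.naive <- function(U, w) {
        # Solve U b = w for b, where U is upper-triangular, by working
        # from the last row up.
        n <- length(w)
        b <- numeric(n)
        for (i in n:1) {
          # subtract the contributions of the already-solved components,
          # then divide by the diagonal (pivot) element
          s <- if (i < n) sum(U[i, (i+1):n] * b[(i+1):n]) else 0
          b[i] <- (w[i] - s) / U[i, i]
        }
        b
      }

    Solving U'w = X'y in step (ii) is the mirror image (forward
    substitution, from the first row down), since U' is lower-triangular.]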
    > X _ cbind(1,X)
    > XTX _ t(X) %*% X
    > U _ chol(XTX)
    > w _ solve(t(U),t(X) %*% y)
    > b _ solve(U,w)
    > b
                          [,1]
                 -3.482259e+03
    GNP deflator  1.506187e-02
    GNP          -3.581918e-02
    Unemployed   -2.020230e-02
    Armed Forces -1.033227e-02
    Population   -5.110411e-02
    Year          1.829151e+00
    > round(b,7)
                          [,1]
                 -3482.2585593
    GNP deflator     0.0150619
    GNP             -0.0358192
    Unemployed      -0.0202023
    Armed Forces    -0.0103323
    Population      -0.0511041
    Year             1.8291514

    So using this approach it seems that XTX is, effectively,
    invertible... Hmmm...

(c) QR:

    Here the strategy is as follows:

    (i)   Change X'Xb = X'y into R'Q'QRb = R'Q'y, i.e. R'Rb = R'Q'y
    (ii)  Note that if Rb = Q'y then also R'Rb = R'Q'y, so we only
          need to...
    (iii) ...solve Rb = Q'y

    > X _ cbind(1,X)
    > X.qr _ qr(X)
    > Q _ qr.Q(X.qr)
    > R _ qr.R(X.qr)
    > b _ solve(R , t(Q) %*% y)
    > round(b,7)
                          [,1]
                 -3482.2586346
    GNP deflator     0.0150619
    GNP             -0.0358192
    Unemployed      -0.0202023
    Armed Forces    -0.0103323
    Population      -0.0511041
    Year             1.8291515

    Note that there is a function qr.coef (referenced in help(qr)) that
    does this all automatically:

    > qr.coef(qr(X),y)
              GNP deflator         GNP  Unemployed Armed Forces  Population
    -3482.259   0.01506187 -0.03581918  -0.0202023 -0.01033227 -0.05110411
         Year
     1.829151

(d) USING lm:

    > lm(y ~ X)
    Call:
    lm(formula = y ~ X)

    Coefficients:
     (Intercept) XGNP deflator        XGNP XUnemployed XArmed Forces XPopulation
       -3482.259    0.01506187 -0.03581918  -0.0202023   -0.01033227 -0.05110411
           XYear
        1.829151

    Degrees of freedom: 16 total; 9 residual
    Residual standard error: 0.3048541

STEEL/TORRE:
-----------

(a) XTXinv:

    > X _ cbind(1,X)
    > XTX _ t(X) %*% X
    > XTXinv _ solve(XTX)
    > b _ XTXinv %*% t(X) %*% y

(b) Cholesky:

    > X _ cbind(1,X)
    > XTX _ t(X) %*% X
    > U _ chol(XTX)
    > w _ solve(t(U),t(X) %*% y)
    > b _ solve(U,w)
    > round(b,7)
      [,1]
       2.5
    X  1.5

(c) QR:

    > X _ cbind(1,X)
    > X.qr _ qr(X)
    > Q _ qr.Q(X.qr)
    > R _ qr.R(X.qr)
    > b _ solve(R , t(Q) %*% y)
    > round(b,7)
      [,1]
       2.5
    X  1.5

(d) USING lm:

    > lm(y ~ X)
    Call:
    lm(formula = y ~ X)

    Coefficients:
     (Intercept)   X
             2.5 1.5

    Degrees of freedom: 6 total; 4 residual
    Residual standard error: 0.5

[If you are using Splus, explore help(lm.fit) as well. This is the
fitting function that lm uses. lm.fit in R is less versatile.]

help(lm.fit) shows that lm.fit is a switcher among the various numerical
fitting functions (lm.fit.qr, lm.fit.chol, lm.fit.svd). The argument
method does the switching: "qr" for lm.fit.qr, etc. help(lm.fit.qr)
gives help on each of these three methods for solving the normal
equations for linear regression.

2. As described on pp. 84--85 of Monahan, the condition number for a
   linear regression bounds relative changes in the estimated regression
   coefficients in terms of relative changes in the response variables
   Y, or in the predictors X, in a regression problem.

   Computing the exact condition number is not hard for small problems.
   If X is the matrix of regressors (without the intercept), the
   condition number may be computed as:

     XTX _ t(cbind(1,X)) %*% cbind(1,X)
     v _ eigen(XTX)$values
     k _ sqrt(max(v)/min(v))

   Two alternatives:

     XTX _ t(cbind(1,X)) %*% cbind(1,X)
     v _ svd(XTX)$d
     k _ sqrt(max(v)/min(v))

     v _ svd(cbind(1,X))$d
     k _ max(v)/min(v)

   Argue that these are all the same (cf. Monahan p. 123). (There is
   also a function kappa() that computes a fast approximate condition
   number, but the exact calculation is usually fast enough.)
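   [Why these agree: X'X is symmetric and nonnegative definite, so its
   singular values coincide with its eigenvalues, and the eigenvalues of
   X'X are the squares of the singular values of X; taking square roots
   in the first two recipes therefore yields exactly the third. A quick
   numerical check, using the Steel/Torre design for concreteness (any
   full-rank X would do):

     X <- 1:6
     XTX <- t(cbind(1,X)) %*% cbind(1,X)
     sqrt(max(eigen(XTX)$values)/min(eigen(XTX)$values))  # eigenvalues of X'X
     sqrt(max(svd(XTX)$d)/min(svd(XTX)$d))                # singular values of X'X
     v <- svd(cbind(1,X))$d
     max(v)/min(v)                                        # singular values of X

   All three print the same number (about 9.359386, as in 3(b) below).]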
3. Compute exact (as in #2) and approximate (using kappa()) condition
   numbers for the following X's [don't forget to cbind with 1!]:

(a) X _ rnorm(6)            # cbind(1,X) full rank

    > X _ cbind(1,rnorm(6))
    > v _ svd(X)$d
    > max(v)/min(v)
    [1] 1.445677
    > kappa(X)
    [1] 1.322598

(b) X _ 1:6                 # cbind(1,X) full rank, very mild collinearity

    > v _ svd(cbind(1,1:6))$d
    > max(v)/min(v)
    [1] 9.359386
    > kappa(cbind(1,1:6))
    [1] 11.37986

(c) X _ cbind(1:6,2:7)      # cbind(1,X) not full rank

    > v _ svd(cbind(1,1:6,2:7))$d
    > max(v)/min(v)
    [1] Inf
    > v
    [1] 15.318538  1.158612  0.000000

    Note: v has 3 entries but only two nonzero singular values---i.e.,
    cbind(1,1:6,2:7) is of rank 2. So it's not of full rank, and the
    condition number is infinite!

    > kappa(cbind(1,1:6,2:7))
    [1] 8.296808e+17

(d) X _ longley.x           # cbind(1,X) full rank, highly collinear

    > X _ longley.x
    > v _ svd(cbind(1,X))$d
    > max(v)/min(v)
    [1] 23845862
    > kappa(X)
    [1] 22219266

    (Note that kappa() was called here on X without the intercept
    column, whereas the exact calculation used cbind(1,X); in any case
    it is only a fast approximation.)

4. For 3(a), (b) and (c), use the y vector for the Steel/Torre data; for
   3(d) use the y vector from the Longley data. For all four examples,
   compare the regression coefficients in

     lm(y ~ X)

   vs

     y1 _ y + rnorm(length(y)); lm(y1 ~ X)

   and relate to the results in #3.

(a) Random X's:

    > X _ X[,-1]            # (starting from X _ cbind(1,rnorm(6)) in #3)
    > y _ c(4.5, 5.5, 6.5, 8.0, 10.0, 12.0)
    > y1 _ y + rnorm(length(y))
    > lm(y ~ X)$coef
     (Intercept)         X
        7.776017 0.3681481
    > lm(y1 ~ X)$coef
     (Intercept)          X
        7.486244 -0.2373551

    We can directly compare this with the bound on the relative change
    in b-hat vs the relative change in y-hat discussed in class:

    > b0 _ lm(y~X)$coef
    > b1 _ lm(y1~X)$coef
    > sqrt(sum((b1-b0)^2)/sum(b0^2))
    [1] 0.08622899
    > yh0 _ lm(y~X)$fitted
    > yh1 _ lm(y1~X)$fitted
    > sqrt(sum((yh1-yh0)^2)/sum(yh0^2))
    [1] 0.0631556
    > 0.08622899/0.0631556
    [1] 1.365342

    So, not too far off the condition number (1.446)...

(b) STEEL/TORRE (X _ 1:6):

    > X _ 1:6
    > y _ c(4.5, 5.5, 6.5, 8.0, 10.0, 12.0)
    > y1 _ y + rnorm(length(y))
    > lm(y ~ X)$coef
     (Intercept)   X
             2.5 1.5
    > lm(y1 ~ X)$coef
     (Intercept)        X
        2.118812 1.622246
    > b0 _ lm(y~X)$coef
    > b1 _ lm(y1~X)$coef
    > sqrt(sum((b1-b0)^2)/sum(b0^2))
    [1] 0.1373052
    > yh0 _ lm(y~X)$fitted
    > yh1 _ lm(y1~X)$fitted
    > sqrt(sum((yh1-yh0)^2)/sum(yh0^2))
    [1] 0.02620884
    > 0.1373052/0.02620884
    [1] 5.238889

    This is a little less than we would expect from the condition number
    (9.359). Further experimentation suggests that the condition number
    is a little pessimistic in this case:

    > kap _ NULL
    > for (i in 1:100) {
    +   y1 _ y + rnorm(length(y))
    +   b0 _ lm(y~X)$coef
    +   b1 _ lm(y1~X)$coef
    +   num _ sqrt(sum((b1-b0)^2)/sum(b0^2))
    +   yh0 _ lm(y~X)$fitted
    +   yh1 _ lm(y1~X)$fitted
    +   den _ sqrt(sum((yh1-yh0)^2)/sum(yh0^2))
    +   kap _ c(kap, num/den)
    + }
    > stem(kap)

    N = 100   Median = 4.578734
    Quartiles = 2.18714, 5.65403

    Decimal point is at the colon

       0 : 7777888999
       1 : 14
       1 : 567888899
       2 : 1112223444
       2 : 8
       3 : 0
       3 : 556889
       4 : 01122244
       4 : 5556777888999
       5 : 00112223334
       5 : 5566778999
       6 : 11111124444
       6 : 55555666

(c) DEPENDENT X's (X _ cbind(1:6,2:7)):

    > X _ cbind(1:6,2:7)
    > y _ c(4.5, 5.5, 6.5, 8.0, 10.0, 12.0)
    > y1 _ y + rnorm(length(y))
    > lm(y ~ X)$coef
    Problem in lm.fit.qr(unset(x), unset(y)): computed fit is singular, rank 2
    Use traceback() to see the call stack
    > lm(y1 ~ X)$coef
    Problem in lm.fit.qr(unset(x), unset(y)): computed fit is singular, rank 2
    Use traceback() to see the call stack

    Since X is (really!) non-full-rank, lm can't go on.
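    [Aside: R behaves differently from Splus here. R's lm() does not
    stop on a rank-deficient design; it pivots past the redundant column
    and reports NA for the aliased coefficient (see the singular.ok
    argument in help(lm)). A sketch, with the expected behavior noted in
    comments rather than quoted as a verbatim transcript:

      X <- cbind(1:6, 2:7)                # rank 2: column 2 = column 1 + 1
      y <- c(4.5, 5.5, 6.5, 8.0, 10.0, 12.0)
      coef(lm(y ~ X))                     # X2 should come back NA; the fit
                                          # reduces to y ~ 1 + X1, i.e. the
                                          # Steel/Torre fit 2.5 + 1.5*X1
      lm(y ~ X, singular.ok = FALSE)      # this version stops with an error,
                                          # much as Splus did above
    ]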
    To get some sort of fit anyway, we could try the "svd" method of lm
    (the Splus lm.fit.svd mentioned above):

    > lm(y ~ X,method="svd")$coef
     (Intercept)        X1       X2
        1.166667 0.1666667 1.333333
    > lm(y1 ~ X,method="svd")$coef
     (Intercept)         X1       X2
         1.69464 -0.1774483 1.517192

    It would be tempting to compare these changes in b-hat with the
    condition number (infinity!) again, but this comparison doesn't make
    sense, since that is the condition number for a different problem
    (essentially, the full-rank regression problem)! Note that
    condition = infinity does correctly capture the idea that b-hat is
    determined only up to a linear subspace (and hence the values of b0
    and b1 can wander infinitely far from each other, regardless of the
    (unique) values of yh0 and yh1!).

(d) LONGLEY:

    > lm(y~X)$coef
     (Intercept) XGNP deflator        XGNP XUnemployed XArmed Forces XPopulation
       -3482.259    0.01506187 -0.03581918  -0.0202023   -0.01033227 -0.05110411
           XYear
        1.829151
    > y1 _ y + rnorm(length(y)); lm(y1 ~ X)$coef
     (Intercept) XGNP deflator       XGNP XUnemployed XArmed Forces XPopulation
       -3755.022    -0.2557951  0.0521333 -0.01156115   -0.01268914    -1.10314
           XYear
         2.02731
    > b0 _ lm(y~X)$coef
    > b1 _ lm(y1~X)$coef
    > sqrt(sum((b1-b0)^2)/sum(b0^2))
    [1] 0.07833003
    > yh0 _ lm(y~X)$fitted
    > yh1 _ lm(y1~X)$fitted
    > sqrt(sum((yh1-yh0)^2)/sum(yh0^2))
    [1] 0.007769028

    So the condition number for the Longley data provides only a very
    crude bound on the sensitivity of b-hat to this (small) change in
    y-hat: the observed ratio is about 10, far below the bound of
    23,845,862. Experimentation suggests that here, too, the condition
    number is pessimistic (though there is a suggestion of a long right
    tail that might warrant further investigation!):

    > kap _ NULL
    > for (i in 1:100) {
    +   y1 _ y + rnorm(length(y))
    +   b0 _ lm(y~X)$coef
    +   b1 _ lm(y1~X)$coef
    +   num _ sqrt(sum((b1-b0)^2)/sum(b0^2))
    +   yh0 _ lm(y~X)$fitted
    +   yh1 _ lm(y1~X)$fitted
    +   den _ sqrt(sum((yh1-yh0)^2)/sum(yh0^2))
    +   kap _ c(kap, num/den)
    + }
    > stem(kap)

    N = 100   Median = 70.6323
    Quartiles = 41.47714, 101.1353

    Decimal point is 1 place to the right of the colon

       0 : 2356667
       1 : 01123
       2 : 245
       3 : 45589
       4 : 0000122233567
       5 : 003458
       6 : 0002468899
       7 : 01123445689
       8 : 1124589
       9 : 01226679
      10 : 3499
      11 : 02336
      12 : 0379
      13 : 4
      14 : 3
      15 : 03458
      16 : 29
      17 :
      18 : 03
      19 : 1

    However, the large condition number (23,845,862) does suggest that
    numerical problems are likely, as we saw when we tried to compute
    (X'X)^{-1} directly.
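[For reference: the perturbation experiment used in 4(b) and 4(d) can be
packaged once and reused. This is our own helper (perturb.ratio is not a
built-in), doing exactly what the loops above do:

  perturb.ratio <- function(X, y, nsim = 100) {
    # For nsim random perturbations y1 = y + noise, return the ratio of
    # the relative change in b-hat to the relative change in y-hat.
    b0  <- lm(y ~ X)$coef
    yh0 <- lm(y ~ X)$fitted
    kap <- numeric(nsim)
    for (i in 1:nsim) {
      y1  <- y + rnorm(length(y))
      b1  <- lm(y1 ~ X)$coef
      yh1 <- lm(y1 ~ X)$fitted
      num <- sqrt(sum((b1 - b0)^2) / sum(b0^2))
      den <- sqrt(sum((yh1 - yh0)^2) / sum(yh0^2))
      kap[i] <- num / den
    }
    kap
  }

  # e.g., reproducing 4(b) and 4(d) up to random variation:
  #   stem(perturb.ratio(1:6, c(4.5, 5.5, 6.5, 8.0, 10.0, 12.0)))
  #   stem(perturb.ratio(longley.x, longley.y))
]

-----------------------------------------------------------------------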