Finger Exercises to be discussed Wed Sept 25 2002
-------------------------------------------------

Consider the following data sets:

A. The Longley data (Monahan, p. 86), which is highly collinear.

   * Splus:  X <- longley.x
             y <- longley.y
   * R:      data(longley)
             X <- longley[,1:6]
             y <- longley[,7]

B. The Steel and Torre data (Monahan, p. 109 ex 5.2)

   X <- 1:6
   y <- c(4.5, 5.5, 6.5, 8.0, 10.0, 12.0)

-----------------------------------------------------------------------

1. Compute the least-squares regression coefficients for the Longley
   and Steel/Torre datasets, using each of the following methods.
   N.b., don't forget to say X <- cbind(1,X) to get the intercept in
   the model before using methods (a), (b), (c).

   (a) Inverting X'X directly:

       XTX <- t(X) %*% X
       XTXinv <- solve(XTX)

   (b) Cholesky decomposition [use chol() to find the Cholesky
       decomposition]

   (c) QR decomposition [use qr(), qr.Q() and qr.R() to find the QR
       decomposition]

   (d) lm in Splus or R [don't say X <- cbind(1,X) in this case]

   and compare the results. [There will not be much difference,
   because Splus does all its calculations in double precision and
   typically shows only 7 or so significant decimal digits. How could
   you find differences in the lower-order digits? Single-precision
   calculations might not be able to handle the Longley data, for
   example.]

   [If you are using Splus, explore help(lm.fit) as well; this is the
   fitting function that lm uses. lm.fit in R is less versatile.]

2. As described on pp. 84--85 of Monahan, the condition number for a
   linear regression bounds relative changes in the estimated
   regression coefficients in terms of relative changes in the
   response variable y, or in the predictors X, in a regression
   problem.
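   [A sketch (not part of the handout) of the four fits in exercise 1,
   run on the small Steel/Torre data; the object names b.inv, b.chol,
   b.qr, b.lm are my own. All four columns of coefficients should
   agree to many digits on this tiny, well-conditioned problem:

       x <- 1:6
       y <- c(4.5, 5.5, 6.5, 8.0, 10.0, 12.0)
       X <- cbind(1, x)                 # design matrix with intercept

       ## (a) invert X'X directly
       XTX <- t(X) %*% X
       b.inv <- solve(XTX) %*% t(X) %*% y

       ## (b) Cholesky: X'X = R'R, so solve R'R b = X'y with
       ##     two triangular solves
       R <- chol(XTX)
       b.chol <- backsolve(R, forwardsolve(t(R), t(X) %*% y))

       ## (c) QR: X = QR, so solve R b = Q'y
       qrX <- qr(X)
       b.qr <- backsolve(qr.R(qrX), t(qr.Q(qrX)) %*% y)

       ## (d) lm on the raw regressor (lm adds the intercept itself)
       b.lm <- coef(lm(y ~ x))

   Differences among the methods show up only in the low-order digits,
   and become visible on ill-conditioned problems like the Longley
   data.]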
   Computing the exact condition number is not hard for small
   problems. If X is the matrix of regressors (without the intercept),
   the condition number may be computed as:

       XTX <- t(cbind(1,X)) %*% cbind(1,X)
       v <- eigen(XTX)$values
       k <- sqrt(max(v)/min(v))

   Two alternatives:

       XTX <- t(cbind(1,X)) %*% cbind(1,X)
       v <- svd(XTX)$d
       k <- sqrt(max(v)/min(v))

       v <- svd(cbind(1,X))$d
       k <- max(v)/min(v)

   Argue that these are all the same (cf. Monahan p. 123). (There is
   also a function kappa() that computes a fast approximate condition
   number, but the exact calculation is usually fast enough.)

3. Compute exact (as in #2) and approximate (using kappa()) condition
   numbers for the following X's [don't forget to cbind with 1!]:

   (a) X <- rnorm(6)          # cbind(1,X) full rank
   (b) X <- 1:6               # cbind(1,X) full rank, very mild collinearity
   (c) X <- cbind(1:6,2:7)    # cbind(1,X) not full rank
   (d) X <- longley.x         # cbind(1,X) full rank, highly collinear

4. For 3(a), (b) and (c), use the y vector for the Steel/Torre data;
   for 3(d) use the y vector from the Longley data. For all four
   examples, compare the regression coefficients in

       lm(y ~ X)

   vs.

       y1 <- y + rnorm(length(y));  lm(y1 ~ X)

   and relate to the results in #3.
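   [A sketch (not part of the handout) of the perturbation experiment
   in #4, run on the Steel/Torre data; the seed and object names are
   my own additions:

       set.seed(1)                      # so the perturbation is reproducible
       X <- 1:6
       y <- c(4.5, 5.5, 6.5, 8.0, 10.0, 12.0)

       ## exact condition number, as in #2 (third variant)
       v <- svd(cbind(1, X))$d
       k <- max(v) / min(v)

       ## perturb y and compare the two coefficient vectors side by side
       y1 <- y + rnorm(length(y))
       cbind(original  = coef(lm(y ~ X)),
             perturbed = coef(lm(y1 ~ X)))

   For a well-conditioned X (small k) the two columns stay close,
   relative to the size of the perturbation; for the highly collinear
   Longley data the condition number is many orders of magnitude
   larger, and a perturbation of the same size can move the
   coefficients far more.]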