Advanced Data Analysis I

CMU 36-401 (Fall 2001)

Instructors  Brian Junker 
132E Baker Hall 
Office hours: T,Th 10-11am 
Kim Sellers
228C Baker Hall
Office Hours: M 2-3pm
Class Schedule  Tuesday & Thursday, 3-4:20 p.m. 
237B Baker Hall
Syllabus & Policies  postscript form  
Teaching Assistant  Marnie Bertolet
132B Baker Hall 
office hours: Wed 3:30-4:30, Wean 5202 
Schedule and Homework

    Date  Subject Material  Reading  Homework 
w1 Tu 8/28 Course introduction, Examples     
w1 Th 8/30  Introduction to Splus (Meet in Wean 5202) MASS pp. 1-68: skim headings  HW1 - REVISED (please read)
Part A Solutions
Part B Solutions
w2 Tu 9/4 Exploratory data analysis  RwG pp. 1-23 (Chapter 1)  
w2 Th 9/6  Finishing examples of EDA with Splus    
w3 Tu 9/11  Simple linear regression  RwG pp. 29-59 (Chapter 2) REVISED Reading (due 9/18) and Writing (due 9/20)  Example of a reasonable answer
w3 Th 9/13     HW02 due Thurs Sept 20 (NO REVISION)  SOLUTIONS
w4 Tu 9/18  More simple regression, maybe some multiple regression  RWG Ch 2 and pp. 66-69 (start of Ch 3)  
w4 Th 9/20 Matrix algebra and Multiple Regression  RWG pp. 65-77
RWG pp. 333-342 
 HW03 due Thurs Sept 27 Solutions
(Due date changed to Tues Oct 2)
w5 Tu 9/25  Example of multiple regression: the wage data    
w5 Th 9/27  Example of multiple regression: the fuel consumption data    
w6 Tu 10/2  Variable Selection I RWG pp 77-84   HW04 due Tue Oct 9  Solutions
w6 Th 10/4  Variable Selection II    Reading/writing discussion exercise, due Thu Oct 11.
Here is a partial outline of Alley Ch2 (optional)
w7 Tu 10/9 Interactions and Dummy Variables (tentative) RWG pp 84-101  (MIDTERM EXAM MOVED TO TUE OCT 16) 
w7 Th 10/11  Review for midterm exam    
w8 Tu 10/16   MIDTERM EXAM    
w8 Th 10/18  Regression Criticism I: Assumptions  RWG Ch 4  HW05 (Due Oct 25)  Solutions
w9 Tu 10/23  Regression Criticism II: Outliers and Influence  RWG Ch 4  
w9 Th 10/25  401/402 Project Descriptions  Principal Investigator Visits  See description sheets.
w10 Tu 10/30  401/402 Project Descriptions  Principal Investigator Visits  See project selection sheets.
w10 Th 11/1  Nonlinear regression I   RWG Ch 5  
w11 Tu 11/6  Nonlinear regression II  RWG Ch 5  
w11 Th 11/8  Robust Regression I  RWG Ch 6  HW06 (Due Thu Nov 15)
w12 Tu 11/13  Robust Regression II  RWG Ch 6  
w12 Th 11/15  Logistic Regression I  RWG Ch 7 HW07 (Due Thu Nov 27) solutions 
w13 Tu 11/20  Logistic Regression II (?)  RWG Ch 7  
w13 Th 11/22  THANKSGIVING    
w14(*) Tu 11/27  Principal Components I  RWG Ch 8  HW08 (Due Thu Dec 6) solutions
w14(*) Th 11/29  Principal Components II  RWG Ch 8  
w15(*) Tu 12/4  Factor Analysis (from Principal Components II)  RWG Ch 8  
w15(*) Th 12/6  Work on Projects in Wean computer cluster    
w16 Tu 12/11  Project papers due Fri Dec 14    
Finals Week FINAL EXAM THU DEC 13, 8:30-11:30    
(*) Project progress meetings with instructors in approximately weeks 14 and 16.
Data Downloads

Link  Description  More info 
concord1.dat Concord household water use  concord1.txt
widgets.dat Widget data for HW 1, problem 2   
seapart.dat Seabird data (partial); for HW 1, problem 3  See RwG p. 61 
cane.dat Sugar cane data for HW1, problem  cane.txt
Prestige.dat Job Prestige data from Canadian Census Prestige.cbk (more info, and variable names)
cyclist.html Cyclist data  
homedat.html Home resale data   
fuel-cr.dat Consumer reports data  Columns: Weight Disp Mileage Fuel Type 
fuel-alr.dat Fuel consumption data for the 50 states
grass.dat Chemical factors in soil affecting spartina grass growth grass.txt
lead.dat Lead toxicity data   
televisions.dat Life expectancy data  televisions.txt
wages.dat CPS wage data  wages.txt
cheese.dat Cheese data  cheese.txt
ozone.dat Ozone data ozone.txt
salamander.dat Salamander data See RwG p. 104
cassava.dat Cassava data See RwG p. 137
deforest.dat Deforestation data See RwG p. 175
solrad.dat Solar radiation data  See RwG p. 177
eggs.dat Egg data See RwG p. 180
kob.dat Kob data See RwG p. 213
titanic.dat Titanic data Columns: Class, Adult, Male, Survive
streams.dat Streams data for HW 8  
shetland.dat Shetland data for HW 8  
Handouts and In-Class Exercises

Link  Description  More info 
Avoiding statistical pitfalls 8/28 Handout to be discussed Sep 6 in class. Postscript file
Splus on campus 8/28 Where to run Splus on campus Postscript file
Depression and Sleep 8/28 Handout(s) for breakout exercise Postscript file
8/30 Unix Summary
8/30 Notes: Introduction to S-plus
 RWG Ch 1 handouts 9/4 Handouts to accompany in-class S demo   
  9/11 CLASS CANCELLED 9/11 Announcement with Reading and Writing Assignments  
 handouts 9/13 lecture notes and handouts  
 Simple regression 9/18 lecture notes and breakout exercise  Postscript files
Simple regression in matrix form
(Handed out in class 9/20; discussed 9/27)
9/20 Handouts to accompany lecture Postscript file.
Please read the handout for next Tuesday's (9/27) lecture
Multiple regression example: Fuel consumption data 9/29 Handout to accompany lecture Postscript file.
Variable Selection I 10/2 Handout to accompany lecture Postscript file.
Variable Selection II 10/4 Handout to accompany lecture Postscript file.
Interactions and Dummy Variables  10/9  Postscript file.
Case study: Personal Ozone Levels in Children  10/11  Postscript file.
Model Criticism I: Assumptions 10/18 Lecture notes and exercise Postscript file.
Model Criticism II: Influence Analysis 10/23 Lecture notes and exercise postscript file.
401/402 PROJECT DESCRIPTIONS 10/25 Project presentations by investigators postscript file
401/402 PROJECT SELECTION SHEETS 10/30 Project presentations by investigators postscript file
Nonlinear Regression I (Ch 5 RWG) 11/1 Lecture notes postscript file
Nonlinear Regression II (Ch 5 RWG) 11/6 lecture notes and exercise postscript file
Robust Regression I (Ch 6 RWG) 11/8 lecture notes and exercise postscript file; see also breakout session.
Robust Regression II (Ch 6 RWG) 11/13 lecture notes and exercise
Logistic Regression I (Ch 7 RWG) 11/15 lecture notes and exercise
Logistic Regression II (Ch 7 RWG) (?) 11/20 lecture notes and exercise
Principal components I (Ch 8, RWG) 11/27 lecture notes
Principal components II (Ch 8, RWG) 11/29 and 12/4 lecture notes; includes Factor Analysis; corrected pp. 7, 8, 9 for lecture notes
Intro to using LaTeX for reports  template.tex

Splus Downloads

Link  Description  More info 
S-Tutorial Written by a CMU Student  
cheatsheet Splus usage summary   
contents.html Splus on-line tutorial   
SplusTips.html Splus Tips and Links   
Splus in MSWord How to convert a ps figure from Splus into tif format, for inclusion in MS Word docs  
symplot.q function: symmetry plot   
ps.q function: copy a plot to a .ps file  and optionally print it 
Sintro.q commands: Sintro.q  Solutions to Splus Intro exercises 
ndhist.q function: histogram plus normal density   
rwgbox.q function: boxplots in RwG style   
qplot.q function: quantile plot   
qqn.q function: quantile-normal plot   
qqenv.q function: new quantile-normal plot that includes 95% conf envelope
scatbox.q function: scattergram + marginal boxplots   
stdres.q function: calculate standardized residuals (better check of normality of residuals)
collin.q function: check collinearity for a set of X's  R^2 for predicting each on all others 
sum.step.q function: exhaustive model selection by AIC, BIC or adj. R^2   
mypairs.q function: pairs plot with boxplots for few categories   
DW.q function: Durbin-Watson test   
spruce.q sample code: for hw6 (dummy/interaction plotting)  
influence.q function: influence measures   
partreg.q function: partial regression influence plots  
CookPlot.q function: residual vs. fit plot with Cook Distance   
medplot.q function: Exploratory band regression   
logitinf.q function:  influence measures and residuals for logistic regression
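
For readers without access to Splus, the statistic behind the Durbin-Watson test listed above (DW.q) can be sketched in a few lines of Python. This is only an illustration of the standard formula, the sum of squared successive differences of the residuals divided by the sum of squared residuals; it is not the course's DW.q code, and the function name durbin_watson is a hypothetical choice made for this sketch. It assumes the residuals are supplied in time order and does not compute a p-value.

```python
def durbin_watson(residuals):
    """Return the Durbin-Watson statistic for residuals given in time order.

    DW = sum_{t=2..n} (e_t - e_{t-1})^2 / sum_{t=1..n} e_t^2
    Hypothetical helper for illustration; not the course's DW.q.
    """
    e = [float(r) for r in residuals]
    if len(e) < 2:
        raise ValueError("need at least two residuals")
    denom = sum(r * r for r in e)
    if denom == 0.0:
        raise ValueError("residuals are all zero")
    # Squared differences between each residual and the one before it.
    numer = sum((e[t] - e[t - 1]) ** 2 for t in range(1, len(e)))
    return numer / denom


if __name__ == "__main__":
    # Residuals that alternate in sign: negative serial correlation.
    print(durbin_watson([1.0, -1.0, 1.0, -1.0]))  # 3.0
    # Residuals that never change: extreme positive serial correlation.
    print(durbin_watson([0.5, 0.5, 0.5, 0.5]))    # 0.0
```

The statistic lies between 0 and 4. Values near 2 suggest little first-order serial correlation in the residuals, values toward 0 suggest positive correlation, and values toward 4 suggest negative correlation.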

