I will stress the fact that linear regression, like all statistical techniques, involves assumptions. A typical statistical analysis begins with simple graphical and descriptive analyses, followed by the application of formal statistical techniques. The analysis must then be followed up by diagnostics that check whether the assumptions have been violated. Most good analyses include a heavy dose of graphical techniques. The golden rule in this course is: Never do a statistical analysis without first plotting the data. Also bear in mind that there is never one right way to do an analysis; typically, many analyses are performed.
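To see why the golden rule matters, here is a minimal sketch in Python (not S-plus, and not part of the course materials). It uses two of the four data sets from Anscombe's well-known 1973 quartet, which are an illustrative assumption rather than course data. The helper names (`fit_line`, `residuals`, `curvature_check`) are hypothetical. The two data sets give essentially the same fitted line, yet a simple residual diagnostic, standing in for the plot you would normally draw, shows that a straight line is appropriate for only one of them.

```python
# Two of Anscombe's data sets (illustrative only, not course data).
# Both share the same x values.
x = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
y_linear = [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]
y_curved = [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]


def mean(values):
    return sum(values) / len(values)


def fit_line(x, y):
    """Return the least-squares (intercept, slope) for y regressed on x."""
    xbar, ybar = mean(x), mean(y)
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return ybar - slope * xbar, slope


def residuals(x, y):
    """Observed minus fitted values from the least-squares line."""
    intercept, slope = fit_line(x, y)
    return [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]


def correlation(u, v):
    """Pearson correlation coefficient of two equal-length sequences."""
    ubar, vbar = mean(u), mean(v)
    suv = sum((ui - ubar) * (vi - vbar) for ui, vi in zip(u, v))
    suu = sum((ui - ubar) ** 2 for ui in u)
    svv = sum((vi - vbar) ** 2 for vi in v)
    return suv / (suu * svv) ** 0.5


def curvature_check(x, y):
    """Correlate residuals with squared, centered x.

    A value near +1 or -1 suggests the straight line misses a curve;
    this is a crude numeric stand-in for a residual plot.
    """
    xbar = mean(x)
    squared = [(xi - xbar) ** 2 for xi in x]
    return correlation(residuals(x, y), squared)


for name, y in [("linear", y_linear), ("curved", y_curved)]:
    intercept, slope = fit_line(x, y)
    print(name, round(intercept, 2), round(slope, 2),
          round(curvature_check(x, y), 2))
```

The fitted coefficients alone cannot distinguish the two data sets; only looking at the data (or at the residuals) does.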
You will also get plenty of practice writing reports of the types expected of you in the real world. We will primarily focus on reports for intelligent non-statisticians (e.g. a company officer) and brief documentation of what you did for a statistically competent supervisor.
The assignments will make use of computer packages, especially S-plus. I will teach you the minimum you need to know about S-plus: enough to do the work, but not enough to call yourself an expert. Never hand in raw computer output. Cut out (either electronically, or with scissors) plots, tables, etc. from the output and include them in your report as needed.
Assignments will request one or more of the following components:
The project will be the joint responsibility of a small group of students. You will analyze a data set, possibly in conjunction with a subject area expert. For students taking the sequel course, Advanced Data Analysis II (36-402), the project will be a preliminary analysis of the data that you will analyze throughout that course. I will aid you in choosing these projects, but if you have appropriate data from your own research, we can discuss whether that can form the basis for your second project.
Throughout the homework assignments we will include analysis of gas mileage data, performed in stages. The special importance of this component of the homework is that it will serve as a kind of practice project. In fact, one homework assignment will be to put all of the pieces that you have worked on together into a single report on the analysis of the gas mileage data. Be sure to save all of your homework assignments so that you can write the summary report.
After you choose your project, weekly homework will consist of only one or two problems and a brief summary of your progress on the project. You must also schedule a project progress meeting with me between 11/27/00 and 12/6/00. Any additional consultations with me before or after this meeting are also welcome.
There is some inherent subjectivity in grading data analysis and write-ups. Data analysis will be graded based on our judgment of reasonable application of the principles and techniques that you will learn in the course. To aid us, you should use justifying sentences, e.g., "The data were log transformed to reduce the marked positive skew."
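A justifying sentence like the one above should rest on something you actually checked. As a minimal sketch in Python (not S-plus), the following computes the sample skewness of a small data set before and after a log transform. The data values are made up for illustration, and the function name `skewness` is hypothetical.

```python
import math

# Hypothetical, made-up positive data with a long right tail.
raw = [1, 2, 2, 3, 3, 4, 5, 8, 15, 40]


def skewness(values):
    """Moment-based sample skewness: m3 / m2**1.5."""
    n = len(values)
    center = sum(values) / n
    m2 = sum((v - center) ** 2 for v in values) / n
    m3 = sum((v - center) ** 3 for v in values) / n
    return m3 / m2 ** 1.5


# The natural log requires strictly positive values.
logged = [math.log(v) for v in raw]

print("skewness before log:", round(skewness(raw), 2))
print("skewness after log: ", round(skewness(logged), 2))
```

For these values the skewness drops markedly after the transform, which is the kind of evidence (ideally alongside a histogram) that backs up the justifying sentence.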
Grading of written material is based on how well you organize and present the requested components in a manner suitable for the intended audience. Reports that are excessively long (e.g. too much detail or extraneous matter) and those that are too short (e.g. no justifications of conclusions) will not receive full credit. In the real world, errors in spelling and grammar in your work have a major negative impact on the way you are perceived. If spelling and/or grammar corrections are included on your homework or project, these are intended to point out areas you should work on, but do not affect your grade.
Dates | Topics | RwG pages |
8/29 | Course introduction, Examples | |
8/31 | Introduction to Splus (Meet in Wean 5202) | |
9/5 | Exploratory data analysis | 1-23 |
9/7-9/14 | Simple linear regression | 29-59 |
9/19 | Multiple regression | 65-85 |
9/21 | Splus Skills (Meet in Wean 5202) | |
9/26-10/3 | Multiple regression | 65-85 |
10/5 | Dummy variables, ANOVA as regression | 85-101 |
10/10 | MIDTERM EXAM | |
10/12-10/24 | Regression criticism | 109-136 |
10/26-10/31 | Fitting curves, non-linear regression | 145-174 |
11/2 | Journal article | |
11/7-11/9 | Project description presentations | |
11/14 | Journal article | |
11/16-11/21 | Robust regression | 183-212 |
11/28-11/30 | Logi(s)t(ic) regression (*) | 217-242 |
12/5 | Monte-Carlo and Bootstrap (*) | 303-326 |
12/7 | Random effects and Mixed models | suppl. |
12/12 | Second project write-up due, Review for Final Exam | |