STATISTICS 545 - DATA ANALYSIS
(97/98 Term 1)
Instructor: Paul Gustafson
The introductory handout is here (LaTeX file,
readable as text).
End-of-term comments
from last year's course are also available.
The text for the course is
Modern Applied Statistics with S-Plus, Second Edition,
by Bill Venables
and Brian Ripley, 1997.
The book has a
web page,
containing useful information and links.
Check out the extra
material in the "complements".
To hone our data analysis skills, we will need lots of data sets
to analyze! Here are some sources:
-
The software which accompanies the textbook includes all the data sets
mentioned in the book, as well as some others. See Appendix A of the
text for a list.
-
StatLib
is a general-purpose server for the statistical community. Several
sites on StatLib offer data sets, including:
-
The datasets
archive contains many varied data sets, as well as collections of data
sets from some books.
-
The Data and Story Library
is an "online library of datafiles and stories that illustrate the
use of basic statistics methods."
-
There are several other sites on StatLib which contain data sets.
Check out the index on the
StatLib
home page.
-
A
Data Sources
page is maintained at the University of Nevada.
-
UBC Statistics members can access some data sets available on the
department network. Check the department gopher under computing.
If you are looking for S software,
try the
S-archive.
As they become available, course materials are posted below.
-
The galaxy velocity examples were used to illustrate histograms and
kernel density estimation
(S-code).
-
The baseball salary examples were used to illustrate bootstrapping
(S-code, data).
I found these data at the
Chance Database.
-
There were a couple of graphical output examples, one
based on the Stormer Viscometer data
(S-code), and the other based on the
Boston housing data
(S-code).
-
The Mercury in Bass data (found at
DASL) were analyzed
with weighted linear regression
(S-code).
We also had some examples of the S-Plus model formulae syntax
(S-code).
-
During our discussion of generalized linear models, we used
simulations to learn about residuals
(S-code)
and overdispersion
(S-code).
-
We analyzed the CPU data (available in the
MASS library) using additive models and projection
pursuit regression
(S-code),
and then later we tried fitting neural networks to these data
(S-code).
-
We used a small simulation study to see how cross-validation
guards against overfitting
(S-code).
-
We examined how well regression trees worked with some
simulated data
(S-code).
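For anyone without S-Plus at hand, the bootstrap idea behind the baseball salary example above can be sketched in a few lines of Python. This is not the course S-code. The function name `bootstrap_se`, the choice of the median as the statistic, the number of resamples, and the data values are all assumptions made for illustration; in particular the numbers are made up and are not the baseball salary data.

```python
import random
import statistics


def bootstrap_se(sample, statistic, n_resamples=2000, seed=0):
    """Estimate the standard error of `statistic` by the nonparametric bootstrap.

    Each replicate draws len(sample) values from `sample` with replacement
    and recomputes the statistic; the standard deviation of the replicates
    is the bootstrap standard error estimate.
    """
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    n = len(sample)
    replicates = []
    for _ in range(n_resamples):
        resample = [sample[rng.randrange(n)] for _ in range(n)]
        replicates.append(statistic(resample))
    return statistics.stdev(replicates), replicates


# Hypothetical, made-up values (NOT the baseball salary data): a small
# right-skewed sample, the kind of case where the median is of interest.
toy_sample = [12, 15, 18, 22, 25, 31, 40, 55, 90, 240]

se_median, reps = bootstrap_se(toy_sample, statistics.median)
print("sample median:", statistics.median(toy_sample))
print("bootstrap SE of the median:", round(se_median, 2))
```

Any function of a list of numbers can be passed as `statistic`, for example `statistics.mean`, which makes it easy to compare how stable different summaries are on the same sample.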