STATISTICS 545 - DATA ANALYSIS
Instructor: Paul Gustafson
The course outline is available here in LaTeX format.
The text for the course is
Modern Applied Statistics with S-Plus, by Bill Venables
and Brian Ripley, 1994.
Ripley's homepage has
links to information about the book and associated software.
In order to hone our data analysis skills, we will require lots of data sets
to analyze! Here are some sources of data sets:
-
The software which accompanies the textbook includes all the data sets
mentioned in the book, as well as some others. See Appendix A of the
text for a list.
-
StatLib is a general-purpose server for the statistical community. There are several
sites on StatLib where data sets are available, including:
-
The datasets
archive contains many varied data sets, as well as collections of data
sets from some books.
-
The Data and Story Library
is an "online library of datafiles and stories that illustrate the
use of basic statistics methods."
-
There are several other sites on StatLib which contain data sets.
Check out the index on the
StatLib
home page.
-
A
Data Sources
page is maintained at the University of Nevada.
-
UBC Statistics members can access some data sets available on the
department network. Check the department gopher under computing.
If you are craving more information about S, try the FAQ.
Back at StatLib, one can also find the S-archive, which contains a large
collection of S software, as well as the archives of the S-news mailing list.
Some students asked for electronic copies of the S-Plus examples
from class. Here they are. Some are better documented with
comments than others. CAVEAT EMPTOR!
OUTSIDE BROWSERS: these probably won't make much sense without
the class notes and handouts.
-
Session 1:
Graphics examples using the Boston housing data set (available from the
StatLib datasets archive).
-
Session 2:
Kernel density estimation examples using simulated data sets.
-
Session 3:
Bootstrapping examples using simulated data sets.
-
Session 4:
Model formulae examples.
-
Session 5:
Regression analysis of the "Mercury in Fish" data set.
-
Session 6:
Poisson overdispersion examples.
-
Session 7:
Robust regression examples.
-
Session 8:
Regression tree examples.
-
Session 9:
Survival analysis examples (Kaplan-Meier estimator and proportional hazards
model).
-
Session 10:
Classification example, using the forensic glass data set.
Here are my end-of-course comments.