Statistics 36-490: Undergraduate Research
Spring 2010
MW 10:30--11:50 Scaife Hall 212
36-490 is a semester-long course in practical data analysis. Students will
work in teams of about three to solve problems facing actual scientific
investigators with real data. Students will build on the skills of data
exploration, model development, model fitting and assessment, and
interpretation that they began developing in earlier classes, and will also
practice working with subject-area scientists, collaborating on research, and
communicating scientific results both in writing and orally. At the end of the
semester, each team will present a poster at the Meeting of the Minds
undergraduate research symposium and submit a written report in the style of a
scientific paper.
Projects
Students may choose from the initial list of possible projects or propose
their own, subject to instructor approval. Any project must involve both real
data and an outside investigator.
Please read the handout on interacting with your investigator.
Course Mechanics
Each group will make three in-class presentations on their progress to date,
and submit drafts of their written report and poster presentation. There will
also be homework assignments connected with the lectures. See
the syllabus for details of deadlines and grading.
We will meet twice a week. Mondays will usually be devoted to a lecture on a
relevant methodological topic or an aspect of the research process; on
Wednesdays, teams will meet separately with either Prof. Junker or
Prof. Shalizi during class time.
Office hours are by appointment (but we are generally happy to talk with you
at any time if we're free).
Grades will be available through the class Blackboard site.
Lecture Schedule
Subject to revision as we go along.
- Statistical consulting
- Principal components and factor analysis
- Mixed-effects linear models
- Missing data
- Clustering
- Grade-of-membership and latent Dirichlet allocation models
- Log-linear models and logistic regression
- Poisson process and other models for events over time
- Hidden Markov models
- Writing and speaking
Details here.
Assignments
Unless you are told otherwise, all electronic assignments should be submitted
as either plain text files or PDFs. Do not send Word files; if you
want to write in Word, convert the file to PDF before turning it in. (This
ensures that we can read your file, and that it appears exactly the same to us
as it does to you.)
- Notes on the project presentations. Due in three rounds, on Friday 15 January, Wednesday 20 January and Friday 22 January.
- Group project description. Due on Friday 29 January.
- Team goals and working agreement. Due on Wednesday 10 February.
Key Dates
Slide Presentation I | March 1 and 3 |
Slide Presentation II | March 29 and 31 |
Meeting of the Minds Registration | April 1 |
Draft Paper | April 5 at 10:30 AM |
Draft Poster | April 26 at 10:30 AM |
Final Paper | April 28 (last class meeting) |
Final Poster | May 5, Meeting of the Minds |
Resources
Handouts
- Interacting with your faculty investigator
- Writing about projects (to accompany HW #2)
- Principal Components Analysis. Also: R for examples; the cars data set; and the New York Times workspace
- Factor analysis. R example from class, the sleep in mammals data set. R code for the Thomson ability-sampling model.
- Mixed effects in linear models; R for examples; spinebmd.csv (data file for examples)
- Smoothing and non-parametric regression: lecture handouts 16, 18, 19, 20 and 21 from 36-350. (Lectures 17 and 22 may also be useful.)
- Cross-validation and additive models: main slides, R
- Clustering: slides, first R example, second R example
R
You don't have to use R, but it's probably a good idea.
- Venables and Ripley's Modern Applied Statistics with S (official site) is one
of our recommended texts; it covers the implementation of a lot of
standard statistical methods. (R is a dialect or descendant of the S
language.)
- The official intro, "An Introduction to R", available online in HTML and PDF
- John Verzani, "simpleR", in PDF
- Quick-R. This is primarily aimed at those who already know a commercial
statistics package like SAS, SPSS or Stata, but it's very clear and
well-organized, and others may find it useful as well.
- Patrick Burns, The R Inferno. "If you are using R and you think you're in
hell, this is a map for you."
- Thomas Lumley, "R Fundamentals and Programming Techniques" (large PDF)
- John M. Chambers, Software for Data Analysis: Programming with R
(official site) is the best book on writing programs in R.
- Minimal Advice on Programming, Especially in R.
Scientific Writing, Statistical Consulting, Professional Ethics
- Robert A. Day and Barbara Gastel, How to Write and Publish a Scientific
Paper, is our other recommended text. [Official site]
- M. Alley, The Craft of Scientific Writing (Official website)
- C. Chatfield, "Avoiding statistical pitfalls", Statistical
Science 6 (1991): 249--252 [JSTOR]
- D. J. Finney, "Ethical aspects of statistical practice", Biometrics 47 (1991): 331--339 [JSTOR]
- G. D. Gopen and J. A. Swan, "The Science of Scientific Writing", American Scientist 78 (1990): 550--558 [online]
- W. G. Hunter, "The practice of statistics: The real world is an idea
whose time has come", American Statistician 35 (1981): 72--76 [JSTOR]
- R. E. Kirk, "Statistical consulting in a university: Dealing with people and
other challenges", American Statistician 45 (1991): 28--34 [JSTOR]
- R. Tweedie, "Consulting: Real problems, real interactions, real
outcomes", Statistical Science 13 (1998): 1--29 [JSTOR]
- D. A. Zahn and D. J. Isenberg, "Nonstatistical aspects of statistical
consulting", American Statistician 37 (1983): 297--302 [JSTOR]
- J. M. Williams, Style: Toward Clarity and Grace [Official page]