Statistics 36-490: Undergraduate Research

Spring 2010

Brian Junker and Cosma Shalizi

MW 10:30--11:50 Scaife Hall 212

36-490 is a semester-long course in practical data analysis. Students will work in teams of about three to solve problems facing actual scientific investigators with real data. Students will build on the skills of data exploration, model development, model fitting and assessment, and interpretation that they began to develop in earlier classes, and will also practice working with subject-area scientists, collaborative research, and both written and oral scientific communication. At the end of the semester, each team will present a poster at the Meeting of the Minds undergraduate research symposium and submit a written report in the style of a scientific paper.


In addition to the initial list of possible projects, students are invited to come up with their own, subject to instructor approval. Any project must involve both real data and an outside investigator.

Please read the handout on interacting with your investigator.

Course Mechanics

Each group will make three in-class presentations on their progress to date, and submit drafts of their written report and poster presentation. There will also be homework assignments connected with the lectures. See the syllabus for details of deadlines and grading.

We will meet twice a week. Mondays will usually be devoted to a lecture on a relevant methodological topic or aspect of the research process; on Wednesdays, teams will meet separately with either Prof. Junker or Prof. Shalizi during class time.

Office hours are by appointment (but we are generally happy to talk with you at any time if we're free).

Grades will be available through the class Blackboard site.

Lecture Schedule

Subject to revision as we go along. Details here.


Unless you are told otherwise, all electronic assignments should be submitted as either plain text files or PDFs. Do not send Word files; if you want to write in Word, convert to PDF before turning it in. (This ensures that we can read your file and that it appears exactly the same to us as it does to you.)
  1. Notes on the project presentations. Due in three rounds, on Friday 15 January, Wednesday 20 January and Friday 22 January.
  2. Group project description. Due on Friday 29 January.
  3. Team goals and working agreement. Due on Wednesday 10 February.

Key Dates

Slide Presentation I: March 1 and 3
Slide Presentation II: March 29 and 31
Meeting of the Minds Registration: April 1
Draft Paper: April 5 at 10:30 AM
Draft Poster: April 26 at 10:30 AM
Final Paper: April 28 (last class meeting)
Final Poster: May 5, Meeting of the Minds



  1. Interacting with your faculty investigator
  2. Writing about projects (to accompany HW #2)
  3. Principal Components Analysis. Also: R for examples; the cars data set; and the New York Times workspace
  4. Factor analysis. R example from class, the sleep in mammals data set. R code for the Thomson ability-sampling model.
  5. Mixed effects in linear models; R for examples; spinebmd.csv (data file for examples)
  6. Smoothing and non-parametric regression: lecture handouts 16, 18, 19, 20 and 21 from 36-350. (Lectures 17 and 22 may also be useful.)
  7. Cross-validation and additive models: Main slides, R
  8. Clustering: slides, first R example, second R example


You don't have to use R, but it's probably a good idea.

Scientific Writing, Statistical Consulting, Professional Ethics