Statistics 36-490: Undergraduate Research

Spring 2014

Cosma Shalizi

MW 10:30--11:50, Wean Hall 8427

36-490 is a semester-long course in applied statistics. Students will work in teams of about three to solve problems facing actual scientific investigators with real data. The goal is to learn how to translate scientific questions into statistical problems, develop and assess solutions to those problems, and translate the statistical solutions back into scientific answers. Students will build on the skills of data exploration, model development, model fitting and checking, and interpretation that they began to develop in earlier classes, and will also practice working with subject-area scientists, collaborative research, and both written and oral scientific communication.

At the end of the semester, each team will present a poster at the Meeting of the Minds undergraduate research symposium and submit a written report in the style of a scientific paper.

Pre-requisites

Students must have passed 36-401, Modern Regression, and must either have passed or be enrolled in 36-402, Advanced Methods of Data Analysis. Admission to the class is by special application and consent of the instructor only.

Projects

In addition to the projects provided by the instructor, students are invited to come up with their own, subject to instructor approval. Any project must involve both real data and an outside investigator.

Please read the handout on interacting with your investigator.

Course Mechanics

Each group will make multiple in-class presentations on their progress to date, and submit drafts of their written report and poster presentation. There will also be homework assignments connected with the lectures. See below for details of deadlines and grading.

We will meet twice a week. Mondays will usually be a lecture on a relevant methodological topic or aspect of the research process; teams will meet separately with Prof. Shalizi on Wednesdays during class time.

Office hours are by appointment; please see Prof. Shalizi's public calendar.

Grades will be available through the class Blackboard site.

Lectures

Approximately once a week (most Mondays) there will be a lecture on a topic which will be useful for your projects. Usually these topics will be statistical ones (previous topics have included categorical data analysis, missing data and non-response bias, clustering, factor analysis, Markov models, etc.). Most lectures will come with short homework assignments: install a package in R, try a small data analysis on a particular data set, read a paper and discuss in the next class, etc. You are encouraged to discuss the homework assignments with each other, but the work you hand in must be your own. You must not copy mathematical derivations, computer output and input, or written descriptions from anyone or anywhere else, without reporting the source within your work. Please review the CMU Policy on Academic Integrity.

Lecture Schedule

Subject to revision as we go along.

Notes and associated assignments will be posted here after the lectures.

Project Meetings, Presentations, and Reports

The projects will consume the majority of your time in this class. Instead of lectures, most Wednesdays you will have a group meeting with the professor. You should also plan on meeting at least once a week within your project group, and at least once a month with your faculty investigator.

During the semester, each group will make brief presentations to the whole class on the progress of their projects. Each group member must participate in each of these presentations. The complete project work will be presented in an end-of-the-year poster session.

Each group must turn in a formal, written report on the last day of class. A draft of the written report is due in early April. There will be no exams for this class, but several of the lectures will have associated, written homework assignments.

Two or three times during the semester, each student will be asked to assess the contribution of each group member to the team effort, and this will be factored into your project grade.

Assignments

Unless you are told otherwise, all electronic assignments should be submitted as either plain text files or PDFs. Do not send Word files; if you want to write in Word, convert to PDF before turning it in. (This ensures that we can read your file, and that it appears exactly the same to us as it does to you.)

Key Dates

Slide Presentation I                 March 3 and 5
Slide Presentation II                March 31 and April 2
Meeting of the Minds Registration    April 2
Draft Paper                          April 14 at 10:30 AM
Draft Poster                         April 28 at 10:30 AM
Final Paper                          May 7
Final Poster                         May 7, Meeting of the Minds

Grading

Homework                                       15%
Participation during class discussion          10%
Participation during group project meetings    10%
Oral presentations                             15%
Written report                                 30%
Poster presentation                            20%

Resources

Texts

These books are required:

These books are optional but recommended:

Handouts

  1. Interacting with your faculty investigator

R

You don't have to use R, but you should think hard before doing otherwise. By this point, students in the class are expected to be fairly familiar with at least the basics of the language and of R programming.

Some useful online resources:

There are also some handy books:

Scientific Writing, Statistical Consulting, Professional Ethics

Alley's Craft of Scientific Writing is one of our required texts; it's got a lot of sound advice and information on what you need to do to write a readable scientific paper. Booth et al.'s Craft of Research (recommended) is not so specifically focused on scientific work, but is very sound on the process of figuring out what it is you actually want to research, refining it into a series of manageable problems, and assembling compelling arguments. Williams's Style (recommended) is the best book of writing advice I've ever found.

Useful References on Statistical Models, Statistical Methods, and Statistical Modeling