MW 10:30--11:50, Wean Hall 8427

36-490 is a semester-long course in applied statistics. Students will work in teams of about three to solve problems facing actual scientific investigators with real data. The goal is to learn how to translate scientific questions into statistical problems, develop and assess solutions to those problems, and translate the statistical solutions back into scientific answers. Students will build on the skills of data exploration, model development, model fitting and checking, and interpretation that they began in earlier classes, but also practice working with subject-area scientists, collaborative research, and both written and oral scientific communication.

At the end of the semester, each team will present a poster at the Meeting of the Minds undergraduate research symposium and submit a written report in the style of a scientific paper.

Students must have passed 36-401, modern regression, and either have passed or be enrolled in 36-402, advanced methods of data analysis. Admission to the class is by special application and consent of the instructor only.

Please read the handout on interacting with your investigator.

We will meet twice a week. Mondays will *usually* be a lecture on a
relevant methodological topic or aspect of the research process; teams will
meet separately with Prof. Shalizi on Wednesdays during class time.

Office hours are by appointment; please see Prof. Shalizi's public calendar.

Grades will be available through the class Blackboard site.

- Statistical consulting and statistical collaboration (15 January)
- Modeling count data (27 January)
- Decision trees, bagging, random forests (3 February)
- Missing data (10 February)
- Causal inference: identification and estimation (17 February)
- Causal inference: partial identification and discovery (24 February)
- Resampling dependent data (17 March)
- Model checking: residuals, calibration, simulation tests (24 March)
- Poisson process and other models for events over time (7 April)
- Mixed-membership models (14 April)
- Writing papers (21 April)
- Giving talks (28 April)

Notes and associated assignments will be posted here after the lectures.

During the semester, each group will make brief presentations to the whole class on the progress of their projects. Each group member must participate in each of these presentations. The complete project work will be presented in an end-of-the-year poster session.

Each group must turn in a formal, written report on the last day of class. A draft of the written report is due in early April. There will be no exams for this class, but several of the lectures will have associated, written homework assignments.

Two or three times during the semester, each student will be asked to assess the contribution of each group member to the team effort, and this will be factored into your project grade.

| Deliverable | Due |
|---|---|
| Slide Presentation I | March 3 and 5 |
| Slide Presentation II | March 31 and April 2 |
| Meeting of the Minds Registration | April 2 |
| Draft Paper | April 14 at 10:30 AM |
| Draft Poster | April 28 at 10:30 AM |
| Final Paper | May 7 |
| Final Poster | May 7, Meeting of the Minds |

| Component | Weight |
|---|---|
| Homework | 15% |
| Participation during class discussion | 10% |
| Participation during group project meetings | 10% |
| Oral presentations | 15% |
| Written report | 30% |
| Poster presentation | 20% |

These books are required:

- Michael Alley, The Craft of Scientific Writing (3rd edition, Berlin: Springer, 1996, ISBN 0-387-94766-3)
- D. R. Cox and Christl Donnelly, Principles of Applied Statistics (Cambridge: Cambridge University Press, 2011, ISBN 978-1-107-64445-8)
- George Polya, How to Solve It: A New Aspect of Mathematical Method (2nd edition, Princeton: Princeton University Press, 1957, ISBN 0-691-02356-5)

These books are optional but recommended:

- Wayne C. Booth, Gregory G. Colomb and Joseph M. Williams, The Craft of Research (3rd edition, Chicago: University of Chicago Press, 2008, ISBN 0-226-06566-9)
- W. N. Venables and Brian D. Ripley, Modern Applied Statistics with S (4th edition, Berlin: Springer, 2002, ISBN 978-1-441-93008-8)
- Joseph Williams, Style: Toward Clarity and Grace (Chicago: University of Chicago Press, 1990, ISBN 0-226-89915-2)

Some useful online resources:

- The official intro, "An Introduction to R", available online in HTML and PDF
- John Verzani, "simpleR", in PDF
- The Google R Style Guide offers some rules for naming, spacing, etc., which are generally good ideas
- Quick-R. This is primarily aimed at those who already know a commercial statistics package like SAS, SPSS or Stata, but it's very clear and well-organized, and others may find it useful as well.
- Patrick Burns, The R Inferno. "If you are using R and you think you're in hell, this is a map for you."
- Thomas Lumley, "R Fundamentals and Programming Techniques" (large PDF)
- The website Software Carpentry is not specifically R related, but contains a lot of valuable advice and information on scientific programming.
- RStudio is an "integrated development environment" (IDE) for R. It's designed to make the common tasks of writing and running R code more efficient, easier, and more reproducible.
- Minimal Advice on Programming, Especially in R, and the lecture notes for 36-350, statistical computing, may also be helpful.

There are also some handy books:

- Venables and Ripley's Modern Applied Statistics with S is one of our recommended texts; it covers the implementation of a *lot* of standard statistical methods. (R is a dialect or descendant of the S language.) It does tend to presume both some knowledge of the language and some knowledge of the methods, however. (It answers "How do I do *X* in S?", not "What is *X*, anyway?")
- Paul Teetor, The R Cookbook (Sebastopol, California: O'Reilly, 2011) and Winston Chang, The R Graphics Cookbook (O'Reilly, 2012) are good references on the day-to-day basics of getting stuff done in R; they're organized by task rather than command.
- John M. Chambers, Software for Data Analysis: Programming with R (New York: Springer, 2008, ISBN 978-0-387-75935-7) is the best book on writing programs in R.

Alley's Craft of Scientific Writing is one of our required texts; it's got a lot of sound advice and information on what you need to do to write a readable scientific paper. Booth et al.'s Craft of Research (recommended) is not so specifically focused on scientific work, but is very sound on the process of figuring out what it is you actually want to research, refining it into a series of manageable problems, and assembling compelling arguments. Williams's Style (recommended) is the best book of writing advice I've ever found.

- Further resources on scientific writing:
  - G. D. Gopen and J. A. Swan, "The Science of Scientific Writing", American Scientist **78** (1990): 550--558
  - Peter B. Medawar, "Is the Scientific Paper a Fraud?", The Listener **70** (12 September 1963): 377--378. Reprinted in many collections (e.g., Pluto's Republic [Oxford: Oxford University Press, 1982]), and online in various versions (e.g.)

- On statistical consulting:
  - C. Chatfield, "Avoiding statistical pitfalls", Statistical Science **6** (1991): 249--252 [JSTOR]
  - D. J. Finney, "Ethical aspects of statistical practice", Biometrics **47** (1991): 331--339 [JSTOR]
  - W. G. Hunter, "The practice of statistics: The real world is an idea whose time has come", American Statistician **35** (1981): 72--76 [JSTOR]
  - R. E. Kirk, "Statistical consulting in a university: Dealing with people and other challenges", American Statistician **45** (1991): 28--34 [JSTOR]
  - R. Tweedie, "Consulting: Real problems, real interactions, real outcomes", Statistical Science **13** (1998): 1--29 [JSTOR]
  - D. A. Zahn and D. J. Isenberg, "Nonstatistical aspects of statistical consulting", American Statistician **37** (1983): 297--302 [JSTOR]

- Robert P. Abelson, Statistics as Principled Argument (Hillsdale, New Jersey: Lawrence Erlbaum Associates, 1995). A wise and witty guide to using statistics in making an *honest* case for or against some proposition; it would have been a required text, if it weren't out of print.
- A. C. Davison, Statistical Models (Cambridge, England: Cambridge University Press, 2003). Massive reference on the most common statistical models, what they really assume, and how they really work. Includes just enough theory to be helpful, and good practical examples.
- Julian J. Faraway, Linear Models with R (Boca Raton, Florida: Chapman and Hall/CRC, 2005)
- Julian J. Faraway, Extending the Linear Model with R: Generalized Linear, Mixed Effects, and Nonparametric Regression Models (Boca Raton, Florida: Chapman and Hall/CRC, 2006)
- Peter Guttorp, Stochastic Modeling of Scientific Data (London: Chapman and Hall, 1995). Statistical inference for stochastic processes, and building stochastic process models from scientific theories.
- Trevor Hastie, Robert Tibshirani and Jerome Friedman, The Elements of Statistical Learning: Data Mining, Inference, and Prediction (2nd edition, Berlin: Springer, 2009). A deservedly-standard textbook on modern, computer-intensive statistical methods.
- Jeffrey S. Simonoff, Smoothing Methods in Statistics (Berlin: Springer-Verlag, 1996). A gentle introduction to nonparametric smoothing and its uses.