General information

Course overview

Computational data analysis is an essential part of modern statistics. Competent statisticians must be able not just to run existing programs, but also to understand the principles on which they work. They must also be able to read, modify, and write code, so that they can assemble the computational tools needed to solve their data analysis problems, rather than distorting problems to fit tools provided by others. This class is an introduction to statistically oriented programming, targeted at statistics majors, without assuming extensive programming background.

Students will learn the core ideas of programming (data structures, functions, iteration, input and output, debugging, logical design, and abstraction) through writing code to assist in statistical analyses. Students will learn how to write maintainable code, as well as how to debug and test code for correctness. They will learn how to set up and run stochastic simulations, how to fit basic statistical models and assess the results, and how to work with and filter large data sets. Since code is an important form of communication among scientists, students will also learn how to comment and organize code.

The class will be taught in the R programming language.

Course website

The course website is http://www.stat.cmu.edu/~ryantibs/statcomp/. The course schedule, lecture notes, labs, supplementary materials, etc., will be posted there.

Prerequisites

This is an introduction to programming for statistics students. Prior exposure to statistical thinking, to data analysis, and to basic probability concepts is essential. Previous programming experience is not assumed. Formally, the prerequisites are “Computing at Carnegie Mellon”, 36-202 or 36-208, and 36-225.

Course mechanics

Each week, the materials for the week will be covered in lecture during the Monday class period. There will be a short quiz released after class, due 11:59pm on Monday (the same day). The Wednesday and Friday class periods will serve as lab sessions, in which students work through a set of exercises. The lab from each week will be due 11:59pm on Sunday (the end of the week). Lastly, there will be a final take-home exam.

Attendance

Attendance is not required, but it is highly encouraged. You will learn more and have more fun (arguably? depends on what you plan to be doing otherwise!). Especially if this material is brand new to you, coming to lectures and labs will be the best way for you to learn and get help.

Grading

Grades will be calculated as follows:

  • Labs: 60%
  • Quizzes: 20%
  • Final exam: 20%

Here are the cutoffs for letter grades, based on total percentages:

  • A: 90% or higher
  • B: 80% to 89%
  • C: 70% to 79%
  • D: 60% to 69%
  • R: 59% or lower, on a case-by-case basis

The Instructors may adjust these cutoffs, but only in the direction that favors the students. For example, the cutoff for an “A” may end up being adjusted to be lower than 90%, but not higher.
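
As a rough illustration of the arithmetic, the weights and default cutoffs above can be sketched in R. The helper names (total_percentage, letter_grade) and the example scores are hypothetical, made up for this sketch. It also assumes that a total counts toward a letter whenever it is at or above that letter's lower cutoff, and it ignores both the case-by-case handling of low totals and any adjustment the Instructors may make.

```r
# Weights taken from the grading breakdown above.
weights <- c(labs = 0.60, quizzes = 0.20, final = 0.20)

# Hypothetical helper: combine component percentages (each on a 0-100 scale)
# into a total percentage using the weights.
total_percentage <- function(scores, w = weights) {
  stopifnot(setequal(names(scores), names(w)))
  sum(scores[names(w)] * w)
}

# Hypothetical helper: map a total percentage to a letter using the default
# cutoffs. The cutoffs must be listed from highest to lowest.
# Assumption: a total at or above a cutoff earns that letter.
letter_grade <- function(total, cutoffs = c(A = 90, B = 80, C = 70, D = 60)) {
  passed <- names(cutoffs)[total >= cutoffs]
  if (length(passed) == 0) "R" else passed[1]
}

# Made-up scores, for illustration only.
example_scores <- c(labs = 92, quizzes = 85, final = 80)
example_total <- total_percentage(example_scores)  # 0.6*92 + 0.2*85 + 0.2*80
example_letter <- letter_grade(example_total)
```

Because labs carry 60% of the weight, a change of 10 points in the lab average moves the total by 6 points, while the same change in the quiz or final exam score moves it by only 2.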

R and RStudio

R is a free, open-source programming language for statistical computing. All of our work in this class will be done using R. You will need regular, reliable access to a computer running an up-to-date version of R. If this is a problem, then let the Instructors or TAs know right away.

RStudio is a free, open-source R programming environment. It contains a built-in code editor and many features that make working with R easier, and it works the same way across different operating systems. Most importantly, it integrates R Markdown seamlessly. You will use RStudio for the labs and final.

Getting help

Labs

Coming to labs is the best way to get help. You will be able to ask questions of the Instructors and TAs for the entire time.

Office hours

Office hours times are spread out over the week. The exact times and locations can be found on the course website.

Piazza

Piazza will be used for questions and discussion on the class contents. Class announcements will also be made through Piazza. The link for the Piazza group is given on the course website.

Piazza can be a very successful medium for helpful, class-wide discussions, but without rules, discussions can also quickly get out of hand. Here are the rules for our Piazza group:

  1. Be considerate to others (respectful language, no sarcasm).
  2. Before posting a question, check that it (or a related question) has not already been posted. If it has, then use the existing thread for further questions or discussion.
  3. For questions about the labs, “What is wrong with this code?” is not an acceptable question. Code that is part of your solution cannot be posted to Piazza.
  4. Along with your posted question, explain step-by-step what you tried to answer your own question (without posting your solution code).
  5. Private questions on Piazza (an option for questions that only Instructors and TAs can see) are disabled since they will not be able to be answered in a reliable/timely manner.

Rule #2 above is highlighted because it is important and, in our experience, it is usually the first rule to be forgotten. Read Piazza first, then post! Duplicated posts can snowball and then Piazza can quickly become ineffective!

Content deemed inappropriate—by the above rules and otherwise—will be taken down by the Instructors or TAs.

Email

Email will be used for questions on class administration (class policies, exceptional circumstances, etc.), rather than class contents. Please direct such inquiries to Associate Instructor ???. The subject line of all emails should begin with “[36-350]”. Professor Tibshirani will be available for issues that cannot be resolved first with the Associate Instructor.

Assignments

Quizzes

Quizzes will be short (10 questions or fewer), and consist of true/false and multiple choice questions. They will be completed online, due 11:59pm on Monday each week, with the links given on the course website. Quizzes are supposed to be an easy recap of the material covered in Monday’s lecture each week. The quiz system is designed so that if you answer a question incorrectly, you will find out immediately, and you can always retry the question (as many times as needed) and receive half credit for a correct answer.

Labs

Labs will be completed in R Markdown format (file extension Rmd). They will involve writing a combination of code and written prose, and the R Markdown format is crucial since it allows for a combination of the two. Labs will be turned in through Canvas, due 11:59pm on Sunday each week, and they must be submitted only in HTML format, the result of calling “Knit HTML” from RStudio on your R Markdown document. Be careful that you do this, because work submitted in any other format will receive a grade of 0, without exception.

Note also: all code used to produce your results must be shown in your HTML file (e.g., do not use echo=FALSE or include=FALSE as options anywhere).
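
One way to guard against accidentally hiding code is to set the chunk defaults explicitly in a setup chunk near the top of your R Markdown file. This is only a suggestion, not a course requirement. The sketch below assumes the knitr package, which RStudio uses when knitting a document.

```r
# Suggested (not required) defaults, placed in the first chunk of a lab's Rmd file.
# echo = TRUE shows each chunk's source code in the knitted HTML;
# include = TRUE keeps each chunk and its output in the document.
knitr::opts_chunk$set(echo = TRUE, include = TRUE)
```

Options set on an individual chunk still override these defaults, so it is worth checking each chunk header before knitting.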

Labs will have italicized questions that will be graded more carefully and given more weight in the final score (think of them as homework questions, embedded in the labs). Students may choose to collaborate with friends on the labs, but must indicate with whom they collaborated. Also, be sure to carefully read the collaboration policy below.

Final exam

In place of an in-class final exam, there will be a take-home exam. It will be essentially like a lab, but with cumulative coverage (all course topics are fair game), and no collaboration with peers is allowed. It will be submitted through Canvas, just as with labs, and the due date will be posted on the course website.

Late work

You will have a total of 5 late days in the semester, to use across the labs. For example, you may apply all 5 late days to Lab 1; or you may apply 2 late days to Lab 1, 2 late days to Lab 4, and 1 late day to Lab 6; etc. After these 5 late days are used up, no late work will be accepted.

Late days do not apply to quizzes. No late quizzes will be accepted.

In case of truly exceptional situations—such as family emergencies or illness—the Instructors can make exceptions and allow late work (labs or quizzes). If you think your situation is truly exceptional but is not an emergency, then you must notify the Associate Instructor of your situation at least 3 full days before the particular assignment is due.

Collaboration, copying, and plagiarism

You are encouraged to discuss course material with your classmates. All work you turn in, however, must be your own. This includes both written explanations and code. Copying from other students, books, websites, or solutions from previous versions of the class (1) does nothing to help you learn how to program, (2) is easy for us to detect, and (3) has serious negative consequences for you, as outlined in the university’s policy on cheating and plagiarism. If, after reading the policy, you are unclear on what is acceptable, please ask the Instructors.

Accommodations for students with disabilities

If you have a disability and are registered with the Office of Disability Resources, please use their online system to notify us of your accommodations and discuss with us your needs as early in the semester as possible. We will work with you to ensure that accommodations are provided as appropriate. If you suspect that you may have a disability and would benefit from accommodations but are not yet registered with the Office of Disability Resources, consider contacting them at access@andrew.cmu.edu.

Take care of yourself

Take care of yourself. Do your best to maintain a healthy lifestyle this semester by eating well, exercising, avoiding drugs and alcohol, getting enough sleep and taking some time to relax. This will help you achieve your goals and cope with stress.

All of us benefit from support during times of struggle. You are not alone. Asking for support sooner rather than later is often helpful.

If you or anyone you know experiences any academic stress, difficult life events, or feelings like anxiety or depression, we strongly encourage you to seek support. Counseling and Psychological Services (CaPS) is here to help: call 412-268-2922 and visit their website at http://www.cmu.edu/counseling/. Consider reaching out to a friend, faculty or family member you trust for help getting connected to the support that can help.

If you or someone you know is feeling suicidal or in danger of self-harm, call someone immediately, day or night:

If the situation is life threatening, call the police:

If you have questions about this, then please let us know.