36-721: Statistical Graphics and Visualization, Fall 2014

Outline

Graphical displays of quantitative information take on many forms to help us understand both data and models. This course introduces the most common forms of graphical displays, along with their uses and misuses. Students will learn both how to create these displays and how to understand them. The class will also cover some principles of visual perception and estimation. We will start with univariate and bivariate data, looking at some commonly used graphs and discussing their advantages and disadvantages before turning to more sophisticated tools. We will then explore some three-dimensional tools, group structure/clustering, and projections of higher-dimensional data. As time permits, the course will consider some more advanced graphical displays such as statistical maps, networks, and the usage of icons. Given the recent explosion of new tools for internet graphics in R, we will spend a good deal of time on web deployment of interactive graphics, and in particular on how we can retain their statistical properties while still conveying reasonable information.

Course Objectives:

  1. Demonstrate how to use different graphical displays to visualize a data set and its characteristics.
  2. Use principles of visual perception and estimation to generate effective graphical displays.
  3. Develop written and verbal communication skills for discussing the information presented in different graphs and their appropriateness; present graphs and visualization appropriately in a poster session.
  4. Effectively use R, a widely-used statistical package, to generate graphs/displays to visualize data.
This class combines lecture and lab styles. Students are expected to bring their laptop computers to every class and to spend some class time working on the homework assignments in groups.

Course Work:

Your grade in this course will be determined by 3 factors: 2 homework assignments, each due before the beginning of class (2:59 PM) on its due date; a single 5-minute presentation of a piece of code or plot type that you find useful (or one that we assign you); and a final project that will be graded in steps. Assignments will be in the form of an R Markdown file: namely, code snippets integrated with captions and other narrative. You must include your name in the file name as well as within the file itself!

Grading Policy

You are encouraged to discuss homework problems with your fellow students; however, the work you submit must be your own. You must acknowledge any help received on your assignments or labs. Copied work will receive no credit and will be reported promptly to university authorities. Please come talk to me if there are difficulties; problems/conflicts must be discussed at least 48 hours before a deadline. Each assignment will be graded in the following manner:
  • 1 point will be given for an assignment turned in by the deadline, with all parts completed in good faith. This does not have to be perfect, but it must be complete.
  • 1 point will be given for an assignment with "perfect" results -- that is, all the charts presented are correctly prepared, all labels properly attached, all numerical answers are correct, and so forth.
  • 1 point will be given for an assignment with "perfect" code -- that is, properly commented code blocks so that another can replicate your efforts, and text answers prepared in full to support your outputs.
  • You may re-submit your assignment up to two times at any point in the semester for re-consideration of the "perfect" points.
In this way, late assignments will be accepted but will only be eligible for 2 of the 3 total points (for a perfect assignment).
The 5-minute presentation will be worth 1 point.
The group project will be worth 8 points and graded in the following fashion:
  • 3 points: All members of the group will prepare their data analysis together in the beginning to produce preliminary outcomes and ideas. These are expected to be group draft versions of the content in the work that will follow. Everyone will share this part of the grade.
  • 4 points: Each member of the group will produce their own work product from this analysis. It may be a paper/PDF writeup, a website, a slide deck, or whatever format the student thinks is best for communicating their findings. At the end of the class session, all students in the class will vote, in a blinded fashion, on which individual implementation of the work is the most effective, with the "winner" of each group receiving an extra credit point for their project.
The total number of points that can be awarded in the course is 15. Final letter grades will be determined directly with this point scheme:

Points   <6   6   7    8   9    10   11   12   13   14-15
Grade    R    D   C-   C   C+   B-   B    B+   A-   A
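
The point scheme above can be sketched in R as a small lookup. This is only an illustration, not an official grade calculator: the function names `course_total` and `letter_grade` are hypothetical, the project is treated as a single score out of 8 (its internal breakdown and the extra credit point are not modeled), and whole-number point totals are assumed.

```r
# Hypothetical helper: add up the course components described above.
# hw           - points for the 2 homework assignments (0 to 3 each)
# presentation - points for the 5-minute presentation (0 or 1)
# project      - points for the group project (0 to 8)
course_total <- function(hw, presentation, project) {
  stopifnot(length(hw) == 2, all(hw >= 0 & hw <= 3),
            presentation >= 0, presentation <= 1,
            project >= 0, project <= 8)
  sum(hw) + presentation + project
}

# Hypothetical helper: map a point total (0 to 15) to a letter grade
# using the bands in the grading table.
letter_grade <- function(points) {
  stopifnot(is.numeric(points), all(points >= 0 & points <= 15))
  # Lower bound of each band; totals below 6 fall in the first band (R).
  lower  <- c(0, 6, 7, 8, 9, 10, 11, 12, 13, 14)
  grades <- c("R", "D", "C-", "C", "C+", "B-", "B", "B+", "A-", "A")
  grades[findInterval(points, lower)]
}

# Example: one perfect homework, one late-but-perfect homework (2 of 3),
# a completed presentation, and 7 project points.
total <- course_total(hw = c(3, 2), presentation = 1, project = 7)
total                # 13
letter_grade(total)  # "A-"
```

Because `findInterval` is vectorized, `letter_grade(c(5, 9, 15))` returns the grades for several totals at once.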

Policies

Computing:
The statistical computing package we will use in this course is R, which is available on many campus computers. You may download your own copy from http://www.r-project.org. We also require that you use RStudio to augment your coding experience and write your homework assignments.

Laptop Policy:
Students are expected to participate in class, particularly with their laptops.

Cellphones/Pagers, etc:
All cellphones, pagers, and anything else that makes noise should either be turned off or silenced during class. Texting is not allowed nor is it acceptable professional behavior. (Exceptions will be made for relevant live-tweeting at the instructor's discretion.)

Communication:
Assignments and class information will be posted on Blackboard. Help with using Blackboard is available at www.cmu.edu/blackboard/help/.

Email:
Email sent to your professor or teaching assistants should be treated as professional communication. Emails should have an appropriate greeting and ending; students should refrain from using any kind of “shortcuts”, abbreviations, acronyms, slang, etc. in the email text. Emails not meeting these standards may not be answered.
Email questions must be sent at least 24 hours before a deadline to get a timely response.

Academic Integrity:
All students are expected to comply with the CMU policy on academic integrity. This policy is online at http://www.cmu.edu/academic-integrity/

Disability Services:
If you have a disability and need special accommodations in this class, please contact the instructor. You may also want to contact the Disability Resources office at 8-2013.

Tentative Schedule

Date         Topic                                        Due
Tues 8/26    Introduction; Types Of Data; Installing R
Thur 8/28    Categorical Data; Learning Markdown
---
Tues 9/2     1-D Continuous Data                          HW1 out
Thur 9/4     1-D and 2-D Continuous Data
---
Tues 9/9     Maps                                         HW1 due, HW2 out
Thur 9/11    (TBD)
---
Tues 9/16    Project Data Sets                            Pick Your Groups
Thur 9/18    Generalized Linear Models                    HW2 due
---
Tues 9/23    Trees and Networks
Thur 9/25    Trees and Networks
---
Tues 9/30    Group Projects: Initial Looks                Preliminary Project Work
Thur 10/2    Group Projects: Initial Looks
---
Tues 10/7    High-Dimensional Data
Thur 10/9    High-Dimensional Data
---
Tues 10/14   Presentations
Thur 10/16   Presentations