Statistical Graphics and Visualization, Spring 2007
www.stat.cmu.edu/~vventura
Instructor: Valerie Ventura, Department of Statistics
Office: Baker Hall 229E
e-mail: vventura@stat.cmu.edu
Office hours: Monday, 3:00-4:00pm, or by appointment
Teaching and Lab Assistant: Erich Huang, Department of Statistics
Office: FMS 328A or FMS 320 (TA room)
e-mail: ephuang@stat.cmu.edu
Office hour: Tuesday, 3:00-4:00pm
Lectures: Monday and Wednesday, 12:30-1:20, BH255A
Computer labs: Friday, 12:30-1:20, BH140F
Overview
Graphs are powerful tools for representing and interpreting data. They
can provide more information than statistical tests and are often more
convincing. Graphs are often the quickest path to winning an argument
and producing action. This course teaches the methods and principles
which will allow you to realize the full potential of graphics. It also
includes a subsidiary focus on the aesthetics and clarity of graphical
presentation.
Course Objectives
In this course you will:
- Learn how to critically interpret graphics appearing in the popular
  press, academic publications, and software packages.
- Learn how to choose the right graph for the point you are trying to
  make or, if necessary, how to design a new kind of graph.
- Create statistical graphics using the R software package.
- Analyze data and answer statistical questions with graphs.
- Develop an appreciation for graphical aesthetics and learn to
  critique graphical presentations.
Texts
William S. Cleveland (1993) Visualizing Data. Hobart Press.
Recommended Background Reading in Statistics and the Use of R
John Maindonald and John Braun (2003) Data Analysis and Graphics
Using R: An Example-Based Approach. Cambridge University Press.
Statistical Graphics
Edward R. Tufte (2002) The Visual Display of Quantitative Information.
2nd Edition. Graphics Press.
Howard Wainer (2005) Graphic Discovery: A Trout in the Milk and Other
Visual Adventures. Princeton University Press.
Schedule
The schedule is organized around data of increasing dimension:
1-D, 2-D, 3-D, and beyond.
- univariate data
  - histograms, dot strips, density estimates, boxplots,
    quantile-quantile plots
  - pies, bars, dotcharts
  - visual perception of magnitudes
- bivariate data, time series
  - scatterplots, curve fitting, line graphs
  - visual perception of curves
  - categorical data plots: mosaics and 2x2 table plots
- three-dimensional data
  - using perspective, glyphs, and colors
  - surface plots vs. contour plots
  - visual perception of color
- maps
  - map projections, map coloring, map smoothing
  - animated maps
- hyper-variate data
  - dynamic graphics, fly-throughs
  - interactive graphics, brushing
  - projection and slicing
Format
- The course is taught in lecture format on Monday and Wednesday and
  via hands-on practice on Fridays in a computer lab.
- Lectures will contain the material needed to complete the homeworks
  and lab assignments. There are two texts; nonetheless, ATTENDANCE and
  participation in class are critical for learning.
- There will be a weekly OFFICE HOUR where you can meet one-on-one with
  the instructor and a separate office hour for the teaching assistant.
- In the computer labs, you will learn how to create statistical
  graphics under the supervision of the lab assistant. Computer labs
  are mandatory. Each Friday, a LAB ASSIGNMENT will be handed out at
  the beginning of the session and must be completed during the lab
  period.
  - You must get the attention of the lab assistant, who will check
    your results and award 50% of the credit for the lab.
  - Following the lab, you will have until midnight to submit a
    "polished" version of your answers electronically to the lab
    assistant. This version should include sentences interpreting the
    results. The remaining 50% of the lab grade will be awarded for
    this electronically submitted set of answers.
- Computer labs will use a free software package called R, which is
  similar to S-PLUS. Unlike Data Desk or Minitab, R is a full-fledged
  programming language, and you can use it to perform what is, for all
  practical purposes, an unlimited set of operations. You can download
  personal versions of R for Windows, Linux, or Mac OS X.
- HOMEWORK will be assigned weekly and coordinated with the lectures
  and labs. The purpose of these assignments is to improve your
  understanding of the methods and their results. The Review Quiz from
  Lecture 1 will count as a homework assignment. The homework will
  generally be due Tuesdays by midnight. Homework will involve
  answering questions related to the lectures and creating graphics
  similar to what you practiced in lab. It will be posted on the web
  and can be handed in or emailed to us. Reading and homework should
  take about six hours per week.
- There will be two MIDSEMESTER EXAMINATIONS:
  - The first will be in-class on Monday, February 19.
  - The second will be a lab exam on Friday, April 27.
- The FINAL PROJECT for the course will be due during the final exam
  period.
Grading
Your final grade will be based on:
- Homework: 25%
- Labs: 25%
- Midsemester exams: 20%
- Final project: 30%
Each homework assignment will be worth 100 points. These points will be
divided approximately equally among the parts of the assignment. The
Review Quiz will count as a single homework assignment.
The lowest homework grade will be dropped, unless it is the last
assignment of the semester, which is mandatory. The remaining homework
grades will be used to compute the homework grade. The same procedure
will be used for computer lab grades.
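As a rough illustration of the grading arithmetic described above, the
following Python sketch combines the four components using the stated
weights. It is not an official grade calculator. Several details are
assumptions not stated in this syllabus: that each component score is
the plain average of its grades on a 0-100 scale, that the two
midsemester exams count equally, and that nothing is dropped when the
last assignment is the lowest. The function names are hypothetical.

```python
from statistics import mean

# Component weights taken from the Grading section.
WEIGHTS = {"homework": 0.25, "labs": 0.25, "exams": 0.20, "project": 0.30}


def drop_lowest_unless_last(grades):
    """Return the grades with the single lowest one removed.

    Assumption: if the lowest grade is the last (mandatory) assignment,
    no grade is dropped at all.
    """
    if len(grades) < 2:
        return list(grades)
    # Index of the first occurrence of the minimum grade.
    lowest_index = min(range(len(grades)), key=lambda i: grades[i])
    if lowest_index == len(grades) - 1:
        return list(grades)
    return [g for i, g in enumerate(grades) if i != lowest_index]


def final_grade(homework, labs, exams, project):
    """Weighted final grade on a 0-100 scale.

    Assumptions: each component is a simple average, and the two
    midsemester exams are weighted equally.
    """
    hw_score = mean(drop_lowest_unless_last(homework))
    lab_score = mean(drop_lowest_unless_last(labs))
    exam_score = mean(exams)
    return (
        WEIGHTS["homework"] * hw_score
        + WEIGHTS["labs"] * lab_score
        + WEIGHTS["exams"] * exam_score
        + WEIGHTS["project"] * project
    )
```

For example, final_grade([80, 60, 100], [100, 90, 80], [70, 90], 85)
drops the 60 from the homework grades but keeps all three lab grades,
because the lowest lab grade is the last one.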
Extensions:
- The standard extensions (medical, university event, or religious
  holiday) must be accompanied by an official form as described in the
  student handbook.
All work and computer code must be your own. Sharing code or answers
will result in zero credit and a letter to your dean. See the CMU
Student Handbook on Cheating and Plagiarism.