Name:
Andrew ID:
Collaborated with:

This lab is to be done in class (and completed outside of class if need be). You may collaborate with your classmates, but you must list their names above, and you must submit your own lab as a knitted HTML file on Canvas by Sunday at 11:59pm this week.

This week’s agenda: basic indexing, with a focus on matrices; some more basic plotting; vectorization; using for() loops.

Back to some R basics

x.list = list(rnorm(6), letters, sample(c(TRUE,FALSE),size=4,replace=TRUE))
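As a quick illustration of list indexing, the sketch below builds a list with the same structure as x.list and contrasts single and double brackets. The name demo.list and the set.seed() call are assumptions added here for reproducibility; they are not part of the lab.

```r
set.seed(1)  # assumption: fixed seed so the sketch is reproducible
demo.list = list(rnorm(6), letters, sample(c(TRUE,FALSE),size=4,replace=TRUE))

# Single brackets return a sub-list; double brackets return the element itself
class(demo.list[2])     # "list"
class(demo.list[[2]])   # "character"

# Index into an element after extracting it
demo.list[[2]][1:3]     # "a" "b" "c"
length(demo.list[[1]])  # 6
```

Single brackets are the right choice when you want to keep several elements together as a list; double brackets are what you need before doing arithmetic or further indexing on one element.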

Prostate cancer data set

We’re going to look at a data set on 97 men who have prostate cancer (from the book The Elements of Statistical Learning). There are 9 variables measured on these 97 men:

  1. lpsa: log PSA score
  2. lcavol: log cancer volume
  3. lweight: log prostate weight
  4. age: age of patient
  5. lbph: log of the amount of benign prostatic hyperplasia
  6. svi: seminal vesicle invasion
  7. lcp: log of capsular penetration
  8. gleason: Gleason score
  9. pgg45: percent of Gleason scores 4 or 5

To load this prostate cancer data set into your R session and store it as a matrix pros.dat, run:

pros.dat =
  as.matrix(read.table("http://www.stat.cmu.edu/~ryantibs/statcomp/data/pros.dat"))
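Once the matrix is loaded, a few calls confirm its shape and show the basic indexing styles. So that it runs without network access, the sketch below uses a small made-up matrix toy.dat with three of the column names listed above; its values are invented for illustration and are not taken from the real data.

```r
# Hypothetical stand-in for pros.dat: 4 rows, 3 of the 9 columns, invented values
toy.dat = matrix(c(1.2, 0.5, 2.1, 1.8,
                   50, 58, 74, 63,
                   0, 0, 1, 1),
                 nrow=4,
                 dimnames=list(NULL, c("lcavol","age","svi")))

dim(toy.dat)            # 4 3
colnames(toy.dat)       # "lcavol" "age" "svi"
toy.dat[2, "age"]       # one entry, by row number and column name: 58
toy.dat[1:2, ]          # first two rows, all columns
toy.dat[, c(1, 3)]      # all rows, columns 1 and 3
```

Given the description above (97 men, 9 variables), the same dim() call on the real pros.dat would be expected to report 97 rows and 9 columns.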

Basic indexing and calculations

Exploratory data analysis with plots

A bit of Boolean indexing never hurt anyone
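A minimal sketch of Boolean indexing on a matrix: a logical vector built from one column selects the rows where the condition holds. The matrix toy.svi and its values are invented here for illustration; the lab's own questions use pros.dat.

```r
# Hypothetical data: invented values, column names borrowed from the list above
toy.svi = cbind(age=c(50, 58, 74, 63, 69),
                svi=c(0, 0, 1, 1, 0))

has.svi = toy.svi[, "svi"] == 1     # logical vector, one entry per row
sum(has.svi)                        # number of rows with svi equal to 1: 2
toy.svi[has.svi, ]                  # rows where the condition is TRUE
mean(toy.svi[has.svi, "age"])       # 68.5
mean(toy.svi[!has.svi, "age"])      # 59
```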

Computing standard deviations using iteration

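# pros.dat.svi and pros.dat.no.svi must already exist (assumed here to be
# the subsets of pros.dat with and without svi)
pros.dat.svi.sd = vector(length=ncol(pros.dat))  # empty vector, one slot per column
i = 1                                            # index of the column to process
# Reference results: one standard deviation per column
pros.dat.svi.sd.master = apply(pros.dat.svi, 2, sd)
pros.dat.no.svi.sd.master = apply(pros.dat.no.svi, 2, sd)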
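To illustrate the comparison between a loop and apply(), here is a sketch on a small invented matrix toy.mat (not the lab data). It fills a preallocated vector column by column with a for() loop and checks the result against apply(). All names beginning with toy are hypothetical.

```r
# Hypothetical data: 5 rows, 2 columns, invented values
toy.mat = cbind(age=c(50, 58, 74, 63, 69),
                lcavol=c(1.2, 0.5, 2.1, 1.8, 0.9))

toy.sd = vector(length=ncol(toy.mat))   # preallocate one slot per column
for (i in 1:ncol(toy.mat)) {
  toy.sd[i] = sd(toy.mat[, i])          # standard deviation of column i
}

toy.sd.master = apply(toy.mat, 2, sd)   # same computation without a loop
all.equal(toy.sd, as.numeric(toy.sd.master))   # TRUE
```

The as.numeric() call drops the column names that apply() attaches, so the comparison is on values only.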

Computing t-tests using vectorization
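One way to run the same two-sample t-test on every column at once is to pass a small function to apply(). The sketch below uses invented data (toy.x, toy.grp) and a hypothetical helper col.pval(); it is one possible approach under those assumptions, not necessarily the one the lab intends.

```r
set.seed(2)  # assumption: fixed seed for reproducibility
# Hypothetical data: 20 rows, 3 columns of random values, plus a 0/1 group label
toy.x = matrix(rnorm(60), nrow=20,
               dimnames=list(NULL, c("v1","v2","v3")))
toy.grp = rep(c(0, 1), each=10)

# Hypothetical helper: p-value of a two-sample t-test for one column
col.pval = function(v, grp) {
  t.test(v[grp == 1], v[grp == 0])$p.value
}

# apply() forwards the extra argument grp to col.pval for every column
toy.pvals = apply(toy.x, 2, col.pval, grp=toy.grp)
toy.pvals   # named vector with one p-value per column
```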

My plot is at your command (optional)

# Interactive loop: keep prompting until the user types "quit"
repeat {
  ans = readline("What variable do you want to plot? ")
  if (ans %in% colnames(pros.dat)) {
    hist(pros.dat[,ans], main=paste("Histogram of",ans), xlab=ans)
  } else if (ans == "quit") {
    break
  } else {
    cat("Oops! That's not a variable in my data set.\n")
  }
}