# Indexing and Iteration

Wednesday September 5, 2018

# Last week: R basics

• We write programs by composing functions to manipulate data
• The basic data types let us represent Booleans, numbers, and characters
• Data structures let us group together related values
• Vectors let us group values of the same type
• Arrays add multi-dimensional structure to vectors
• Matrices act like you’d hope they would
• Lists let us combine different types of data
• Data frames are hybrids of matrices and lists, allowing each column to have a different data type

# Part I

Indexing

# How R indexes vectors, matrices, lists

There are 3 ways to index a vector, matrix, data frame, or list in R:

1. Using explicit integer indices (or negative integers)
2. Using a Boolean vector (often created on-the-fly)
3. Using names

Note: in general, we have to set the names ourselves. Use names() for vectors and lists, and rownames(), colnames() for matrices and data frames

# Indexing with integers

The most transparent way. Can index with an integer, or integer vector (or negative integer, or negative integer vector). Examples for vectors:

set.seed(33) # For reproducibility
x.vec = rnorm(6) # Generate a vector of 6 random standard normals
x.vec
## [1] -0.13592452 -0.04079697  1.01053901 -0.15826244 -2.15663750  0.49864683
x.vec[3] # Third element
## [1] 1.010539
x.vec[c(3,4,5)] # Third through fifth elements
## [1]  1.0105390 -0.1582624 -2.1566375
x.vec[3:5] # Same, but written more succinctly
## [1]  1.0105390 -0.1582624 -2.1566375
x.vec[c(3,5,4)] # Third, fifth, then fourth element
## [1]  1.0105390 -2.1566375 -0.1582624
x.vec[-3] # All but third element
## [1] -0.13592452 -0.04079697 -0.15826244 -2.15663750  0.49864683
x.vec[c(-3,-4,-5)] # All but third through fifth element
## [1] -0.13592452 -0.04079697  0.49864683
x.vec[-c(3,4,5)] # Same
## [1] -0.13592452 -0.04079697  0.49864683
x.vec[-(3:5)] # Same, more succinct (note the parentheses!)
## [1] -0.13592452 -0.04079697  0.49864683

Examples for matrices:

x.mat = matrix(x.vec, 3, 2) # Fill a 3 x 2 matrix with those same 6 normals,
# column major order
x.mat
##             [,1]       [,2]
## [1,] -0.13592452 -0.1582624
## [2,] -0.04079697 -2.1566375
## [3,]  1.01053901  0.4986468
x.mat[2,2] # Element in 2nd row, 2nd column
## [1] -2.156638
x.mat[5] # Same (note this is using column major order)
## [1] -2.156638
x.mat[2,] # Second row
## [1] -0.04079697 -2.15663750
x.mat[1:2,] # First and second rows
##             [,1]       [,2]
## [1,] -0.13592452 -0.1582624
## [2,] -0.04079697 -2.1566375
x.mat[,1] # First column
## [1] -0.13592452 -0.04079697  1.01053901
x.mat[,-1] # All but first column 
## [1] -0.1582624 -2.1566375  0.4986468

Examples for lists:

x.list = list(x.vec, letters, sample(c(TRUE,FALSE),size=4,replace=TRUE))
x.list
## [[1]]
## [1] -0.13592452 -0.04079697  1.01053901 -0.15826244 -2.15663750  0.49864683
##
## [[2]]
##  [1] "a" "b" "c" "d" "e" "f" "g" "h" "i" "j" "k" "l" "m" "n" "o" "p" "q"
## [18] "r" "s" "t" "u" "v" "w" "x" "y" "z"
##
## [[3]]
## [1]  TRUE  TRUE FALSE FALSE
x.list[[3]] # Third element of list
## [1]  TRUE  TRUE FALSE FALSE
x.list[3] # Third element of list, kept as a list
## [[1]]
## [1]  TRUE  TRUE FALSE FALSE
x.list[1:2] # First and second elements of list (note the single brackets!)
## [[1]]
## [1] -0.13592452 -0.04079697  1.01053901 -0.15826244 -2.15663750  0.49864683
##
## [[2]]
##  [1] "a" "b" "c" "d" "e" "f" "g" "h" "i" "j" "k" "l" "m" "n" "o" "p" "q"
## [18] "r" "s" "t" "u" "v" "w" "x" "y" "z"
x.list[-1] # All but first element of list
## [[1]]
##  [1] "a" "b" "c" "d" "e" "f" "g" "h" "i" "j" "k" "l" "m" "n" "o" "p" "q"
## [18] "r" "s" "t" "u" "v" "w" "x" "y" "z"
##
## [[2]]
## [1]  TRUE  TRUE FALSE FALSE

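Note: double brackets [[ ]] extract a single element, so neither of the last two commands works as intended with them: x.list[[-1]] throws an error, and x.list[[1:2]] is read as recursive indexing, i.e., x.list[[1]][[2]]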
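A small sketch of how [ ] and [[ ]] behave on a list. The list `demo.list` is a hypothetical example (not the lecture's `x.list`), chosen so the results are easy to check by hand; the behavior shown is what we'd expect in current versions of R.

```r
# Hypothetical list with three elements of different types
demo.list = list(c(10, 20, 30), letters[1:3], c(TRUE, FALSE))

class(demo.list[2])    # "list": single brackets keep the list wrapper
class(demo.list[[2]])  # "character": double brackets extract the element

# A length-2 index inside [[ ]] is applied recursively:
# demo.list[[1:2]] means demo.list[[1]][[2]]
demo.list[[1:2]]       # 20

# A negative index inside [[ ]] fails on this length-3 list;
# tryCatch() turns the error into a string so the script keeps running
neg.result = tryCatch(demo.list[[-1]], error = function(e) "error")
neg.result             # "error"
```

So to select several elements, or to drop one, stick with single brackets.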

# Indexing with Booleans

This might appear a bit trickier at first but is very useful, especially when we define a Boolean vector “on-the-fly”. Examples for vectors:

x.vec[c(F,F,T,F,F,F)] # Third element
## [1] 1.010539
x.vec[c(T,T,F,T,T,T)] # All but third element
## [1] -0.13592452 -0.04079697 -0.15826244 -2.15663750  0.49864683
pos.vec = x.vec > 0 # Boolean vector indicating whether each element is positive
pos.vec
## [1] FALSE FALSE  TRUE FALSE FALSE  TRUE
x.vec[pos.vec] # Pull out only positive elements
## [1] 1.0105390 0.4986468
x.vec[x.vec > 0] # Same, but more succinct (this is done "on-the-fly")
## [1] 1.0105390 0.4986468

Works the same way for lists; in lab, we’ll explore logical indexing for matrices
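As a rough preview, here is how Boolean indexing might look for a list and a matrix. The objects `demo.list` and `demo.mat` are made up for illustration, with values small enough to verify by eye.

```r
# Hypothetical list: keep the first and third elements
demo.list = list(1:3, "a", c(TRUE, FALSE))
kept = demo.list[c(TRUE, FALSE, TRUE)]
length(kept)                   # 2

# Hypothetical 3 x 2 matrix, filled in column major order
demo.mat = matrix(c(-1, 2, -3, 4, 5, -6), 3, 2)

# A Boolean matrix as the index returns a vector, in column major order
pos.entries = demo.mat[demo.mat > 0]
pos.entries                    # 2 4 5

# A Boolean vector in the row position selects whole rows
neg.rows = demo.mat[demo.mat[,1] < 0, ]
neg.rows                       # rows 1 and 3 of demo.mat
```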

# Indexing with names

Indexing with names can also be quite useful. We must have names in the first place; with vectors or lists, use names() to set the names

names(x.list) = c("normals", "letters", "bools")
x.list[["letters"]] # "letters" (second) element
##  [1] "a" "b" "c" "d" "e" "f" "g" "h" "i" "j" "k" "l" "m" "n" "o" "p" "q"
## [18] "r" "s" "t" "u" "v" "w" "x" "y" "z"
x.list$letters # Same, just using different notation
##  [1] "a" "b" "c" "d" "e" "f" "g" "h" "i" "j" "k" "l" "m" "n" "o" "p" "q"
## [18] "r" "s" "t" "u" "v" "w" "x" "y" "z"
x.list[c("normals","bools")]
## $normals
## [1] -0.13592452 -0.04079697  1.01053901 -0.15826244 -2.15663750  0.49864683
##
## $bools
## [1]  TRUE  TRUE FALSE FALSE
• We will see indexing by names being especially useful when we talk more about data frames, shortly
• In lab, we’ll practice using rownames() and colnames() and named indexing with matrices
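A minimal sketch of setting names and then indexing by them, for a vector and for a matrix. The objects `scores` and `m`, and the names given to them, are hypothetical and not part of the lecture's running example.

```r
# Hypothetical named vector
scores = c(90, 72, 85)
names(scores) = c("ann", "bob", "cat")
scores["bob"]          # element named "bob", which is 72

# Hypothetical 2 x 3 matrix with row and column names
m = matrix(1:6, 2, 3)
rownames(m) = c("r1", "r2")
colnames(m) = c("a", "b", "c")
m["r2", "c"]           # 6
m[, "b"]               # column "b": 3 4 (labeled r1, r2)
```

Names can be mixed with the other indexing styles, e.g., m["r1", 2:3] uses a name for the row and integers for the columns.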

# Part II

Control flow

# Control flow

Summary of the control flow tools in R:

• if(), else if(), else: standard conditionals
• ifelse(): conditional function that vectorizes nicely
• switch(): handy for deciding between several options

# if() and else

Use if() and else to decide whether to evaluate one block of code or another, depending on a condition

x = 0.5

if (x >= 0) {
  x
} else {
  -x
}
## [1] 0.5
• Condition in if() needs to give one TRUE or FALSE value
• Note that the else statement is optional
• Single line actions don’t need braces, i.e., could shorten above to if (x >= 0) x else -x
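The points above can be sketched as follows. The vector `vals` is a made-up example; all() and any() are one way (not the only one) to reduce a Boolean vector to the single TRUE or FALSE that if() needs.

```r
# Hypothetical data with three elements
vals = c(0.5, -2, 3)

# vals >= 0 has three values, so reduce it to one with all() before if()
if (all(vals >= 0)) {
  msg = "every value is nonnegative"
} else {
  msg = "at least one value is negative"
}
msg    # "at least one value is negative"

# else is optional: nothing happens when the condition is FALSE
count = 0
if (any(vals < 0)) count = count + 1
count  # 1
```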

# else if()

We can use else if() arbitrarily many times following an if() statement

x = -2

if (x^2 < 1) {
  x^2
} else if (x >= 1) {
  2*x-1
} else {
  -2*x-1
}
## [1] 3
• Each else if() only gets considered if the conditions above it were not TRUE
• The else statement gets evaluated if none of the above conditions were TRUE
• Note again that the else statement is optional

# Quick decision making

In the ifelse() function we specify a condition, then a value if the condition holds, and a value if the condition fails

ifelse(x > 0, x, -x)
## [1] 2

One advantage of ifelse() is that it vectorizes nicely; we’ll see this in lab
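To give a sense of what vectorizing means here, a short sketch with a made-up vector `v`: the condition is checked element by element, and the result has one entry per element.

```r
# Hypothetical input vector
v = c(-2, 0.5, 3, -1)

# Elementwise absolute value, written with ifelse()
abs.v = ifelse(v > 0, v, -v)
abs.v     # 2.0 0.5 3.0 1.0

# The two values can be of any type, e.g., strings
sign.v = ifelse(v > 0, "pos", "non-pos")
sign.v    # "non-pos" "pos" "pos" "non-pos"
```

By contrast, if (v > 0) would not work here, since the condition has four values rather than one.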

# Deciding between many options

Instead of an if() statement followed by else if() statements (and perhaps a final else), we can use switch(). We pass a variable to select on, then a value for each option

type.of.summary = "mode"

switch(type.of.summary,
       mean=mean(x.vec),
       median=median(x.vec),
       histogram=hist(x.vec),
       "I don't understand")
## [1] "I don't understand"
• Here we are expecting type.of.summary to be a string, either “mean”, “median”, or “histogram”; we specify what to do for each
• The last passed argument has no name, and it serves as the else clause
• Try changing type.of.summary above and see what happens
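One way to try several values quickly is to wrap the switch() in a small helper. Both the function `summarize.vec` and the data `demo.vec` below are hypothetical, and the histogram option is left out so that nothing gets plotted.

```r
# Hypothetical data, chosen so the summaries are easy to check
demo.vec = c(1, 2, 3, 10)

# Hypothetical helper: returns the requested summary of demo.vec
summarize.vec = function(type.of.summary) {
  switch(type.of.summary,
         mean=mean(demo.vec),
         median=median(demo.vec),
         "I don't understand")   # Unnamed last argument: the fallback
}

summarize.vec("mean")     # 4
summarize.vec("median")   # 2.5
summarize.vec("mode")     # "I don't understand"
```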

# Reminder: Boolean operators

Remember our standard Boolean operators, & and |. These combine terms elementwise

u.vec = runif(10, -1, 1)
u.vec
##  [1]  0.54949775 -0.22561403 -0.72846986  0.80071515  0.13290531
##  [6] -0.91453168 -0.02336149 -0.29755356  0.93932343  0.57915778
u.vec[-0.5 <= u.vec & u.vec <= 0.5] = 999
u.vec
##  [1]   0.5494977 999.0000000  -0.7284699   0.8007152 999.0000000
##  [6]  -0.9145317 999.0000000 999.0000000   0.9393234   0.5791578

# Lazy Boolean operators

In contrast to the standard Boolean operators, && and || return just a single Boolean, and they do so “lazily”: evaluation stops as soon as the result is determined

(0 > 0) && all(matrix(0,2,2) == matrix(0,3,3)) 
## [1] FALSE
(0 > 0) && (ThisVariableIsNotDefined == 0) 
## [1] FALSE
• Note R never evaluates the expression on the right in each line (each would throw an error)
• In control flow, we typically just want one Boolean
• Rule of thumb: use & and | for indexing or subsetting, and && and || for conditionals
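A side-by-side sketch of the rule of thumb, using made-up values. The variable `maybe.null` is hypothetical; it stands in for an input that might be missing, which is a common reason to rely on lazy evaluation in a conditional.

```r
# Elementwise operators: one result per element, good for indexing
a = c(TRUE, FALSE, TRUE)
b = c(TRUE, TRUE, FALSE)
and.ab = a & b    # TRUE FALSE FALSE
or.ab = a | b     # TRUE TRUE TRUE

# Lazy operators: one result overall, good for conditionals
maybe.null = NULL
# The right side is never evaluated, since the left side is already FALSE
is.pos = !is.null(maybe.null) && maybe.null > 0
is.pos            # FALSE

if (is.null(maybe.null) || maybe.null <= 0) {
  status = "nothing positive to work with"
} else {
  status = "ok"
}
status            # "nothing positive to work with"
```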

# Part III

Iteration

# Iteration

Computers: good at applying rigid rules over and over again. Humans: not so good at this. Iteration is at the heart of programming

Summary of the iteration methods in R:

• for(), while() loops: standard loop constructs
• Vectorization: use it whenever possible! Often faster and simpler
• apply() family of functions: alternative to for() loop, these are built-in R functions
• **ply() family of functions: another alternative, very useful, from the plyr package

# for()

A for() loop increments a counter variable along a vector. It repeatedly runs a code block, called the body of the loop, with the counter set at its current value, until it runs through the vector

n = 10
log.vec = vector(length=n, mode="numeric")
for (i in 1:n) {
  log.vec[i] = log(i)
}
log.vec
##  [1] 0.0000000 0.6931472 1.0986123 1.3862944 1.6094379 1.7917595 1.9459101
##  [8] 2.0794415 2.1972246 2.3025851

Here i is the counter and the vector we are iterating over is 1:n. The body is the code in between the braces

# Breaking from the loop

We can break out of a for() loop early (before the counter has been iterated over the whole vector), using break

n = 10
log.vec = vector(length=n, mode="numeric")
for (i in 1:n) {
  if (log(i) > 2) {
    cat("I'm outta here. I don't like numbers bigger than 2\n")
    break
  }
  log.vec[i] = log(i)
}
## I'm outta here. I don't like numbers bigger than 2
log.vec
##  [1] 0.0000000 0.6931472 1.0986123 1.3862944 1.6094379 1.7917595 1.9459101
##  [8] 0.0000000 0.0000000 0.0000000

# Variations on standard for() loops

Many different variations on standard for() are possible. Two common ones:

• Nonnumeric counters: counter variable always gets iterated over a vector, but it doesn’t have to be numeric
• Nested loops: body of the for() loop can contain another for() loop (or several others)
for (str in c("Prof", "Ryan", "Tibs")) {
  cat(paste(str, "declined to comment\n"))
}
## Prof declined to comment
## Ryan declined to comment
## Tibs declined to comment
for (i in 1:4) {
  for (j in 1:i^2) {
    cat(paste(j,""))
  }
  cat("\n")
}
## 1
## 1 2 3 4
## 1 2 3 4 5 6 7 8 9
## 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16

# while()

A while() loop repeatedly runs a code block, again called the body, until some condition is no longer true

i = 1
log.vec = c()
while (log(i) <= 2) {
  log.vec = c(log.vec, log(i))
  i = i+1
}
log.vec
## [1] 0.0000000 0.6931472 1.0986123 1.3862944 1.6094379 1.7917595 1.9459101

# for() versus while()

• for() is better when the number of times to repeat (values to iterate over) is clear in advance

• while() is better when you can recognize when to stop once you’re there, even if you can’t guess it to begin with

• while() is more general, in that every for() could be replaced with a while() (but not vice versa)
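To illustrate the last point, here is a sketch of the same computation written both ways. The variable names `sq.for` and `sq.while` are made up for this example. In the while() version we have to create and increment the counter ourselves, which is exactly the bookkeeping that for() handles for us.

```r
n = 5

# for() version: the counter i is managed by the loop
sq.for = numeric(n)
for (i in 1:n) {
  sq.for[i] = i^2
}

# Equivalent while() version: initialize, test, and increment i by hand
sq.while = numeric(n)
i = 1
while (i <= n) {
  sq.while[i] = i^2
  i = i+1
}

identical(sq.for, sq.while)   # TRUE
sq.while                      # 1 4 9 16 25
```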

# while(TRUE) or repeat

while(TRUE) and repeat: both do the same thing, just repeat the body indefinitely, until something causes the flow to break. Example (try running in your console):

repeat {
  ans = readline("Who is the best Professor of Statistics at CMU? ")
  if (ans == "Tibs" || ans == "Tibshirani" || ans == "Ryan") {
    cat("Yes! You get an 'A'.")
    break
  } else {
    # Any other answer: do nothing, so the question is asked again
  }
}

# Avoiding explicit iteration

• Warning: some people have a tendency to overuse for() and while() loops in R
• They aren’t always needed. Remember vectorization should be used whenever possible
• We’ll emphasize this in lab, and try to hit upon it throughout the course
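As a small illustration, the log example from the for() discussion can be written without any loop, since log() already works elementwise on a vector. The names `log.loop` and `log.vectorized` are just labels for this sketch.

```r
n = 10

# Explicit loop: fill in one element at a time
log.loop = numeric(n)
for (i in 1:n) {
  log.loop[i] = log(i)
}

# Vectorized: apply log() to the whole vector at once
log.vectorized = log(1:n)

all.equal(log.loop, log.vectorized)   # TRUE
```

The vectorized version is one line, and there is no counter or preallocated vector to get wrong.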

# Summary

• Three ways to index vectors, matrices, data frames, lists: integers, Booleans, names
• Boolean on-the-fly indexing can be very useful
• Named indexing will be especially useful for data frames
• Indexing lists can be a bit tricky (beware of the difference between [ ] and [[ ]])
• if(), else if(), else: standard conditionals
• ifelse(): shortcut for using if() and else in combination
• switch(): shortcut for using if(), else if(), and else in combination
• for(), while(), repeat: standard loop constructs
• Don’t overuse explicit for() loops; vectorization is your friend!
• apply() and **ply(): can also be very useful (we’ll see them later)