---
title: "Indexing and Iteration"
author: "Statistical Computing, 36-350"
date: "Tuesday September 7, 2021"
---

Last week: R basics
===

- We write programs by composing functions to manipulate data
- The basic data types let us represent Booleans, numbers, and characters
- Data structures let us group together related values
    - Vectors let us group values of the same type
    - Arrays add multi-dimensional structure to vectors
    - Matrices act like you'd hope they would
    - Lists let us combine different types of data
    - Data frames are hybrids of matrices and lists, allowing each column to have a different data type

Part I
===

*Indexing*

How R indexes vectors, matrices, lists
===

There are 3 ways to index a vector, matrix, data frame, or list in R:

1. Using explicit integer indices (or negative integers)
2. Using a Boolean vector (often created on-the-fly)
3. Using names

Note: in general, we have to set the names ourselves. Use `names()` for vectors and lists, and `rownames()`, `colnames()` for matrices and data frames

Indexing with integers
===

The most transparent way. Can index with an integer, or integer vector (or negative integer, or negative integer vector). Examples for vectors:

```{r}
set.seed(33) # For reproducibility
x.vec = rnorm(6) # Generate a vector of 6 random standard normals
x.vec
x.vec[3] # Third element
x.vec[c(3,4,5)] # Third through fifth elements
x.vec[3:5] # Same, but written more succinctly
x.vec[c(3,5,4)] # Third, fifth, then fourth element
```

---

```{r}
x.vec[-3] # All but third element
x.vec[c(-3,-4,-5)] # All but third through fifth element
x.vec[-c(3,4,5)] # Same
x.vec[-(3:5)] # Same, more succinct (note the parentheses!)
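# Sketch, not from the original slides: why the parentheses matter. Without
# them, -3:5 is the sequence -3, -2, ..., 5, which mixes negative and positive
# indices, so R signals an error; try() lets the rest of the chunk keep running
bad.index = try(x.vec[-3:5], silent=TRUE) # bad.index is a hypothetical name
class(bad.index) # "try-error"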
```

---

Examples for matrices:

```{r}
x.mat = matrix(x.vec, 3, 2) # Fill a 3 x 2 matrix with those same 6 normals,
                            # column major order
x.mat
x.mat[2,2] # Element in 2nd row, 2nd column
x.mat[5] # Same (note this is using column major order)
```

---

```{r}
x.mat[2,] # Second row
x.mat[1:2,] # First and second rows
x.mat[,1] # First column
x.mat[,-1] # All but first column
```

---

Examples for lists:

```{r}
x.list = list(x.vec, letters, sample(c(TRUE,FALSE),size=4,replace=TRUE))
x.list
x.list[[3]] # Third element of list
x.list[3] # Third element of list, kept as a list
```

---

```{r}
x.list[1:2] # First and second elements of list (note the single brackets!)
x.list[-1] # All but first element of list
```

Note: double brackets `[[ ]]` do not work as a substitute in either of the above commands: `x.list[[-1]]` throws an error, and `x.list[[1:2]]` indexes recursively (it returns the second entry of the first element) rather than returning a sublist

Indexing with Booleans
===

This might appear a bit trickier at first but is *very useful*, especially when we define a Boolean vector "on-the-fly". Examples for vectors:

```{r}
x.vec[c(F,F,T,F,F,F)] # Third element
x.vec[c(T,T,F,T,T,T)] # All but third element
pos.vec = x.vec > 0 # Boolean vector indicating whether each element is positive
pos.vec
x.vec[pos.vec] # Pull out only positive elements
x.vec[x.vec > 0] # Same, but more succinct (this is done "on-the-fly")
```

Works the same way for lists; in lab, we'll explore logical indexing for matrices

Indexing with names
===

Indexing with names can also be quite useful.
We must have names in the first place; with vectors or lists, use `names()` to set the names

```{r}
names(x.list) = c("normals", "letters", "bools")
x.list[["letters"]] # "letters" (second) element
x.list$letters # Same, just using different notation
x.list[c("normals","bools")]
```

- We will see indexing by names being especially useful when we talk more about data frames, shortly
- In lab, we'll practice using `rownames()` and `colnames()` and named indexing with matrices

Part II
===

*Control flow (if, else, etc.)*

Control flow
===

Summary of the control flow tools in R:

- `if()`, `else if()`, `else`: standard conditionals
- `ifelse()`: conditional function that vectorizes nicely
- `switch()`: handy for deciding between several options

`if()` and `else`
===

Use `if()` and `else` to decide whether to evaluate one block of code or another, depending on a condition

```{r}
x = 0.5

if (x >= 0) {
  x
} else {
  -x
}
```

- Condition in `if()` needs to give one `TRUE` or `FALSE` value
- Note that the `else` statement is optional
- Single line actions don't need braces, i.e., could shorten above to `if (x >= 0) x else -x`

`else if()`
===

We can use `else if()` arbitrarily many times following an `if()` statement

```{r}
x = -2

if (x^2 < 1) {
  x^2
} else if (x >= 1) {
  2*x-1
} else {
  -2*x+1
}
```

- Each `else if()` only gets considered if the conditions above it were not `TRUE`
- The `else` statement gets evaluated if none of the above conditions were `TRUE`
- Note again that the `else` statement is optional

Quick decision making
===

In the `ifelse()` function we specify a condition, then a value if the condition holds, and a value if the condition fails

```{r}
ifelse(x > 0, x, -x)
```

One advantage of `ifelse()` is that it vectorizes nicely; we'll see this in lab

Deciding between many options
===

Instead of an `if()` statement followed by `else if()` statements (and perhaps a final `else`), we can use `switch()`.
We pass a variable to select on, then a value for each option

```{r}
type.of.summary = "mode"

switch(type.of.summary,
       mean=mean(x.vec),
       median=median(x.vec),
       histogram=hist(x.vec),
       "I don't understand")
```

- Here we are expecting `type.of.summary` to be a string, either "mean", "median", or "histogram"; we specify what to do for each
- The last passed argument has no name, and it serves as the `else` clause
- Try changing `type.of.summary` above and see what happens

Reminder: Boolean operators
===

Remember our standard Boolean operators, `&` and `|`. These combine terms elementwise

```{r}
u.vec = runif(10, -1, 1)
u.vec
u.vec[-0.5 <= u.vec & u.vec <= 0.5] = 999
u.vec
```

Lazy Boolean operators
===

In contrast to the standard Boolean operators, `&&` and `||` give just a single Boolean, "lazily": meaning we terminate evaluating the expression ASAP

```{r}
(0 > 0) && all(matrix(0,2,2) == matrix(0,3,3))
(0 > 0) && (ThisVariableIsNotDefined == 0)
```

- Note R *never* evaluates the expression on the right in each line (each would throw an error)
- In control flow, we typically just want one Boolean
- Rule of thumb: use `&` and `|` for indexing or subsetting, and `&&` and `||` for conditionals

Part III
===

*Iteration*

Iteration
===

Computers: good at applying rigid rules over and over again. Humans: not so good at this. Iteration is at the heart of programming

Summary of the iteration methods in R:

- `for()`, `while()` loops: standard loop constructs
- Vectorization: use it whenever possible! Often faster and simpler
- The apply family of functions: alternative to `for()` loop, these are base R functions
- The map family of functions: another alternative, very useful, from the `purrr` package

`for()`
===

A `for()` loop increments a **counter** variable along a vector.
It repeatedly runs a code block, called the **body** of the loop, with the counter set at its current value, until it runs through the vector

```{r}
n = 10
log.vec = vector(length=n, mode="numeric")
for (i in 1:n) {
  log.vec[i] = log(i)
}
log.vec
```

Here `i` is the counter and the vector we are iterating over is `1:n`. The body is the code in between the braces

Breaking from the loop
===

We can **break** out of a `for()` loop early (before the counter has been iterated over the whole vector), using `break`

```{r}
n = 10
log.vec = vector(length=n, mode="numeric")
for (i in 1:n) {
  if (log(i) > 2) {
    cat("I'm outta here. I don't like numbers bigger than 2\n")
    break
  }
  log.vec[i] = log(i)
}
log.vec
```

Variations on standard `for()` loops
===

Many different variations on standard `for()` are possible. Two common ones:

- Nonnumeric counters: counter variable always gets iterated over a vector, but it doesn't have to be numeric
- Nested loops: body of the `for()` loop can contain another `for()` loop (or several others)

```{r}
for (str in c("Prof", "Ryan", "Tibs")) {
  cat(paste(str, "declined to comment\n"))
}

for (i in 1:4) {
  for (j in 1:i^2) {
    cat(paste(j,""))
  }
  cat("\n")
}
```

`while()`
===

A `while()` loop repeatedly runs a code block, again called the **body**, until some condition is no longer true

```{r}
i = 1
log.vec = c()
while (log(i) <= 2) {
  log.vec = c(log.vec, log(i))
  i = i+1
}
log.vec
```

`for()` versus `while()`
===

- `for()` is better when the number of times to repeat (values to iterate over) is clear in advance
- `while()` is better when you can recognize when to stop once you're there, even if you can't guess it to begin with
- `while()` is more general, in that every `for()` could be replaced with a `while()` (but not vice versa)

`while(TRUE)` or `repeat`
===

`while(TRUE)` and `repeat`: both do the same thing, just repeat the body indefinitely, until something causes the flow to break.
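
A non-interactive sketch of the same pattern (not from the original slides; `val` and `n.halvings` are hypothetical names chosen for illustration): `repeat` keeps halving a number, and `break` stops the loop once the value drops below 1.

```{r}
val = 100       # Hypothetical starting value
n.halvings = 0  # Counts how many times the body has run
repeat {
  val = val/2
  n.halvings = n.halvings + 1
  if (val < 1) break # Without this, the loop would never stop
}
n.halvings # 7
```
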
Example (try running in your console):

```{r, eval=FALSE}
repeat {
  ans = readline("Who is the best Professor of Statistics at CMU? ")
  if (ans == "Tibs" || ans == "Tibshirani" || ans == "Ryan") {
    cat("Yes! You get an 'A'.")
    break
  }
  else {
    cat("Wrong answer!\n")
  }
}
```

Avoiding explicit iteration
===

- Warning: some people have a tendency to **overuse** `for()` and `while()` loops in R
- They aren't always needed. Remember vectorization should be used whenever possible
- We'll emphasize this in lab, and try to hit upon it throughout the course

Summary
===

- Three ways to index vectors, matrices, data frames, lists: integers, Booleans, names
- Boolean on-the-fly indexing can be very useful
- Named indexing will be especially useful for data frames
- Indexing lists can be a bit tricky (beware of the difference between `[ ]` and `[[ ]]`)
- `if()`, `else if()`, `else`: standard conditionals
- `ifelse()`: shortcut for using `if()` and `else` in combination
- `switch()`: shortcut for using `if()`, `else if()`, and `else` in combination
- `for()`, `while()`, `repeat`: standard loop constructs
- Don't overuse explicit `for()` loops; vectorization is your friend!
- `apply()` and `**ply()`: can also be very useful (we'll see them later)
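
To make the vectorization advice concrete, here is a small sketch (not from the original slides). It recomputes the logs from the `for()` slide in a single vectorized call, then applies `ifelse()` to a whole vector at once. The names `log.vec.loop`, `log.vec.vectorized`, and `z.vec` are made up for this illustration.

```{r}
n = 10

# Explicit loop, as on the for() slide
log.vec.loop = vector(length=n, mode="numeric")
for (i in 1:n) {
  log.vec.loop[i] = log(i)
}

# Vectorized version: log() acts on every element of 1:n at once
log.vec.vectorized = log(1:n)
all.equal(log.vec.loop, log.vec.vectorized) # TRUE

# ifelse() is vectorized too: elementwise absolute value
z.vec = c(-2, 0.5, 3) # Hypothetical input vector
ifelse(z.vec > 0, z.vec, -z.vec) # 2.0 0.5 3.0
```

The vectorized line does the same work as the four-line loop and needs no pre-allocated result vector.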