Statistical Computing, 36-350

Wednesday September 5, 2018

- We write programs by composing functions to manipulate data
- The basic data types let us represent Booleans, numbers, and characters
- Data structures let us group together related values
- Vectors let us group values of the same type
- Arrays add multi-dimensional structure to vectors
- Matrices act like you’d hope they would
- Lists let us combine different types of data
- Data frames are hybrids of matrices and lists, allowing each column to have a different data type

*Indexing*

There are 3 ways to index a vector, matrix, data frame, or list in R:

- Using explicit integer indices (or negative integers)
- Using a Boolean vector (often created on-the-fly)
- Using names

Note: in general, we have to set the names ourselves. Use `names()` for vectors and lists, and `rownames()`, `colnames()` for matrices and data frames

Indexing with integers is the most transparent way. We can index with a single integer or an integer vector; a negative integer (or negative integer vector) excludes the corresponding elements. Examples for vectors:

```
set.seed(33) # For reproducibility
x.vec = rnorm(6) # Generate a vector of 6 random standard normals
x.vec
```

`## [1] -0.13592452 -0.04079697 1.01053901 -0.15826244 -2.15663750 0.49864683`

`x.vec[3] # Third element`

`## [1] 1.010539`

`x.vec[c(3,4,5)] # Third through fifth elements`

`## [1] 1.0105390 -0.1582624 -2.1566375`

`x.vec[3:5] # Same, but written more succinctly`

`## [1] 1.0105390 -0.1582624 -2.1566375`

`x.vec[c(3,5,4)] # Third, fifth, then fourth element`

`## [1] 1.0105390 -2.1566375 -0.1582624`

`x.vec[-3] # All but third element`

`## [1] -0.13592452 -0.04079697 -0.15826244 -2.15663750 0.49864683`

`x.vec[c(-3,-4,-5)] # All but third through fifth elements`

`## [1] -0.13592452 -0.04079697 0.49864683`

`x.vec[-c(3,4,5)] # Same`

`## [1] -0.13592452 -0.04079697 0.49864683`

`x.vec[-(3:5)] # Same, more succinct (note the parentheses!)`

`## [1] -0.13592452 -0.04079697 0.49864683`

Examples for matrices:

```
x.mat = matrix(x.vec, 3, 2) # Fill a 3 x 2 matrix with those same 6 normals,
# column major order
x.mat
```

```
## [,1] [,2]
## [1,] -0.13592452 -0.1582624
## [2,] -0.04079697 -2.1566375
## [3,] 1.01053901 0.4986468
```

`x.mat[2,2] # Element in 2nd row, 2nd column`

`## [1] -2.156638`

`x.mat[5] # Same (note this is using column major order)`

`## [1] -2.156638`

`x.mat[2,] # Second row`

`## [1] -0.04079697 -2.15663750`

`x.mat[1:2,] # First and second rows`

```
## [,1] [,2]
## [1,] -0.13592452 -0.1582624
## [2,] -0.04079697 -2.1566375
```

`x.mat[,1] # First column`

`## [1] -0.13592452 -0.04079697 1.01053901`

`x.mat[,-1] # All but first column `

`## [1] -0.1582624 -2.1566375 0.4986468`

Examples for lists:

```
x.list = list(x.vec, letters, sample(c(TRUE,FALSE),size=4,replace=TRUE))
x.list
```

```
## [[1]]
## [1] -0.13592452 -0.04079697 1.01053901 -0.15826244 -2.15663750 0.49864683
##
## [[2]]
## [1] "a" "b" "c" "d" "e" "f" "g" "h" "i" "j" "k" "l" "m" "n" "o" "p" "q"
## [18] "r" "s" "t" "u" "v" "w" "x" "y" "z"
##
## [[3]]
## [1] TRUE TRUE FALSE FALSE
```

`x.list[[3]] # Third element of list`

`## [1] TRUE TRUE FALSE FALSE`

`x.list[3] # Third element of list, kept as a list`

```
## [[1]]
## [1] TRUE TRUE FALSE FALSE
```

`x.list[1:2] # First and second elements of list (note the single brackets!)`

```
## [[1]]
## [1] -0.13592452 -0.04079697 1.01053901 -0.15826244 -2.15663750 0.49864683
##
## [[2]]
## [1] "a" "b" "c" "d" "e" "f" "g" "h" "i" "j" "k" "l" "m" "n" "o" "p" "q"
## [18] "r" "s" "t" "u" "v" "w" "x" "y" "z"
```

`x.list[-1] # All but first element of list`

```
## [[1]]
## [1] "a" "b" "c" "d" "e" "f" "g" "h" "i" "j" "k" "l" "m" "n" "o" "p" "q"
## [18] "r" "s" "t" "u" "v" "w" "x" "y" "z"
##
## [[2]]
## [1] TRUE TRUE FALSE FALSE
```

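Note: double brackets `[[ ]]` select a single element, so they cannot stand in for single brackets in either of the above commands: `x.list[[-1]]` throws an error, and `x.list[[1:2]]` is treated as recursive indexing (the second entry of the first element) rather than returning the first two elements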
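To make the difference between single and double brackets concrete, here is a minimal sketch. The list `demo.list` and its contents are made up for illustration; it is not the `x.list` from above.

```r
# A small list with made-up contents, for illustration only
demo.list = list(c(1.5, -2, 3), c("a", "b"), c(TRUE, FALSE))

single = demo.list[2]    # Single brackets: a list of length 1
double = demo.list[[2]]  # Double brackets: the element itself

class(single)            # A list
class(double)            # A character vector
rest = demo.list[-1]     # All but first element, still a list
length(rest)
```

Single brackets always hand back a list (possibly of length 1), while double brackets pull out what is stored inside.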

Indexing with Booleans might appear a bit more tricky at first but is *very useful*, especially when we define a Boolean vector “on-the-fly”. Examples for vectors:

`x.vec[c(F,F,T,F,F,F)] # Third element`

`## [1] 1.010539`

`x.vec[c(T,T,F,T,T,T)] # All but third element`

`## [1] -0.13592452 -0.04079697 -0.15826244 -2.15663750 0.49864683`

```
pos.vec = x.vec > 0 # Boolean vector indicating whether each element is positive
pos.vec
```

`## [1] FALSE FALSE TRUE FALSE FALSE TRUE`

`x.vec[pos.vec] # Pull out only positive elements`

`## [1] 1.0105390 0.4986468`

`x.vec[x.vec > 0] # Same, but more succinct (this is done "on-the-fly")`

`## [1] 1.0105390 0.4986468`

Works the same way for lists; in lab, we’ll explore logical indexing for matrices
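As a preview, logical indexing for a matrix might look like the sketch below. The matrix `demo.mat` holds made-up values (not the random normals from above), so the results are easy to check by hand.

```r
# Made-up 3 x 2 matrix, filled in column major order
demo.mat = matrix(c(-1, 2, -3, 4, 5, -6), 3, 2)

neg.mat = demo.mat < 0            # Boolean matrix of the same shape
neg.vals = demo.mat[neg.mat]      # Negative entries, in column major order
neg.vals

# Boolean vector in the row position: keep rows whose first entry is negative
neg.rows = demo.mat[demo.mat[,1] < 0, ]
neg.rows
```

Indexing by a Boolean matrix returns a plain vector, while a Boolean vector in the row (or column) position keeps the matrix structure.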

Indexing with names can also be quite useful. We must have names in the first place; with vectors or lists, use `names()` to set the names

```
names(x.list) = c("normals", "letters", "bools")
x.list[["letters"]] # "letters" (second) element
```

```
## [1] "a" "b" "c" "d" "e" "f" "g" "h" "i" "j" "k" "l" "m" "n" "o" "p" "q"
## [18] "r" "s" "t" "u" "v" "w" "x" "y" "z"
```

`x.list$letters # Same, just using different notation`

```
## [1] "a" "b" "c" "d" "e" "f" "g" "h" "i" "j" "k" "l" "m" "n" "o" "p" "q"
## [18] "r" "s" "t" "u" "v" "w" "x" "y" "z"
```

`x.list[c("normals","bools")]`

```
## $normals
## [1] -0.13592452 -0.04079697 1.01053901 -0.15826244 -2.15663750 0.49864683
##
## $bools
## [1] TRUE TRUE FALSE FALSE
```

- We will see indexing by names being especially useful when we talk more about data frames, shortly

- In lab, we’ll practice using `rownames()` and `colnames()` and named indexing with matrices
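A minimal sketch of named indexing with a matrix follows. The matrix `demo.mat` and the row and column names are made up for illustration.

```r
demo.mat = matrix(1:6, 3, 2)                # Made-up values, column major order
rownames(demo.mat) = c("a", "b", "c")       # Set the row names ourselves
colnames(demo.mat) = c("first", "second")   # Set the column names ourselves

one.entry = demo.mat["b", "second"]         # Row "b", column "second"
first.col = demo.mat[, "first"]             # Whole column, by name
two.rows = demo.mat[c("a", "c"), ]          # Rows "a" and "c"
two.rows
```

Names can be mixed with the other two styles, e.g., a name in the row position and an integer in the column position.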

*Control flow (if, else, etc.)*

Summary of the control flow tools in R:

- `if()`, `else if()`, `else`: standard conditionals
- `ifelse()`: conditional function that vectorizes nicely
- `switch()`: handy for deciding between several options

`if()` and `else`

Use `if()` and `else` to decide whether to evaluate one block of code or another, depending on a condition

```
x = 0.5
if (x >= 0) {
  x
} else {
  -x
}
```

`## [1] 0.5`

- Condition in `if()` needs to give one `TRUE` or `FALSE` value
- Note that the `else` statement is optional
- Single line actions don’t need braces, i.e., could shorten above to `if (x >= 0) x else -x`
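Because the condition must be a single `TRUE` or `FALSE`, a comparison on a whole vector has to be reduced first. One way to do this, sketched here with made-up values, is `any()` or `all()`. Recent versions of R throw an error if the condition has length greater than one; older versions only gave a warning.

```r
vals = c(0.5, -1, 2)   # Made-up values, for illustration only

# vals < 0 has three elements; any() reduces it to one Boolean
if (any(vals < 0)) {
  msg = "has a negative value"
} else {
  msg = "all nonnegative"
}
msg
```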

`else if()`

We can use `else if()` arbitrarily many times following an `if()` statement

```
x = -2
if (x^2 < 1) {
  x^2
} else if (x >= 1) {
  2*x-1
} else {
  -2*x-1
}
```

`## [1] 3`

- Each `else if()` only gets considered if the conditions above it were not `TRUE`
- The `else` statement gets evaluated if none of the above conditions were `TRUE`
- Note again that the `else` statement is optional

In the `ifelse()` function we specify a condition, then a value if the condition holds, and a value if the condition fails

`ifelse(x > 0, x, -x)`

`## [1] 2`

One advantage of `ifelse()` is that it vectorizes nicely; we’ll see this in lab
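A small sketch of what “vectorizes nicely” means: the same `ifelse()` call as above, applied to a whole vector at once. The vector `vals` is made up for illustration.

```r
vals = c(-2, 0.5, -1, 3)                  # Made-up values
abs.vals = ifelse(vals > 0, vals, -vals)  # Condition is checked elementwise
abs.vals
```

Each element of the result is taken from the second or third argument according to the matching element of the condition, so here the result agrees with `abs(vals)`.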

Instead of an `if()` statement followed by `else if()` statements (and perhaps a final `else`), we can use `switch()`. We pass a variable to select on, then a value for each option

```
type.of.summary = "mode"
switch(type.of.summary,
       mean=mean(x.vec),
       median=median(x.vec),
       histogram=hist(x.vec),
       "I don't understand")
```

`## [1] "I don't understand"`

- Here we are expecting `type.of.summary` to be a string, either “mean”, “median”, or “histogram”; we specify what to do for each
- The last passed argument has no name, and it serves as the `else` clause
- Try changing `type.of.summary` above and see what happens
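To try several options without editing the code each time, one possibility is to wrap the `switch()` in a small function. This is only a sketch: the function name `summarize.vec` and the vector `demo.vec` are hypothetical, and the histogram option is left out to keep the example free of plotting.

```r
# Hypothetical helper: select a summary of v by name
summarize.vec = function(v, type.of.summary) {
  switch(type.of.summary,
         mean=mean(v),
         median=median(v),
         "I don't understand")   # Unnamed last argument: the default
}

demo.vec = c(1, 2, 6)               # Made-up values
summarize.vec(demo.vec, "mean")     # Mean of 1, 2, 6
summarize.vec(demo.vec, "median")   # Median of 1, 2, 6
summarize.vec(demo.vec, "mode")     # No match, so the default is returned
```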

Remember our standard Boolean operators, `&` and `|`. These combine terms elementwise

```
u.vec = runif(10, -1, 1)
u.vec
```

```
## [1] 0.54949775 -0.22561403 -0.72846986 0.80071515 0.13290531
## [6] -0.91453168 -0.02336149 -0.29755356 0.93932343 0.57915778
```

```
u.vec[-0.5 <= u.vec & u.vec <= 0.5] = 999
u.vec
```

```
## [1] 0.5494977 999.0000000 -0.7284699 0.8007152 999.0000000
## [6] -0.9145317 999.0000000 999.0000000 0.9393234 0.5791578
```

In contrast to the standard Boolean operators, `&&` and `||` give just a single Boolean, “lazily”: meaning we stop evaluating the expression as soon as the result is determined

`(0 > 0) && all(matrix(0,2,2) == matrix(0,3,3)) `

`## [1] FALSE`

`(0 > 0) && (ThisVariableIsNotDefined == 0) `

`## [1] FALSE`

- Note R *never* evaluates the expression on the right in each line (each would throw an error)
- In control flow, we typically just want one Boolean
- Rule of thumb: use `&` and `|` for indexing or subsetting, and `&&` and `||` for conditionals
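The rule of thumb can be sketched side by side as follows. The vectors `vals` and `empty.vec` are made up for illustration.

```r
vals = c(-0.7, 0.2, 0.9)   # Made-up values

# Subsetting: & works elementwise and gives one Boolean per element
in.range = -0.5 <= vals & vals <= 0.5
vals[in.range]

# Conditional: && gives one Boolean and skips the right side when the
# left side already settles the answer
empty.vec = numeric(0)
ok = length(empty.vec) > 0 && empty.vec[1] > 0
if (ok) {
  status = "first element is positive"
} else {
  status = "empty, or first element not positive"
}
status
```

Here the length check on the left guards the comparison on the right, which is a common use of lazy evaluation in conditions.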

*Iteration*

Computers: good at applying rigid rules over and over again. Humans: not so good at this. Iteration is at the heart of programming

Summary of the iteration methods in R:

- `for()`, `while()` loops: standard loop constructs
- Vectorization: use it whenever possible! Often faster and simpler
- `apply()` family of functions: alternative to `for()` loop, these are built-in R functions
- `**ply()` family of functions: another alternative, very useful, from the `plyr` package

`for()`

A `for()` loop increments a **counter** variable along a vector. It repeatedly runs a code block, called the **body** of the loop, with the counter set at its current value, until it runs through the vector

```
n = 10
log.vec = vector(length=n, mode="numeric")
for (i in 1:n) {
  log.vec[i] = log(i)
}
log.vec
```

```
## [1] 0.0000000 0.6931472 1.0986123 1.3862944 1.6094379 1.7917595 1.9459101
## [8] 2.0794415 2.1972246 2.3025851
```

Here `i` is the counter and the vector we are iterating over is `1:n`. The body is the code in between the braces

We can **break** out of a `for()` loop early (before the counter has been iterated over the whole vector), using `break`

```
n = 10
log.vec = vector(length=n, mode="numeric")
for (i in 1:n) {
  if (log(i) > 2) {
    cat("I'm outta here. I don't like numbers bigger than 2\n")
    break
  }
  log.vec[i] = log(i)
}
```

`## I'm outta here. I don't like numbers bigger than 2`

`log.vec`

```
## [1] 0.0000000 0.6931472 1.0986123 1.3862944 1.6094379 1.7917595 1.9459101
## [8] 0.0000000 0.0000000 0.0000000
```

Variations on standard `for()` loops

Many different variations on standard `for()` are possible. Two common ones:

- Nonnumeric counters: counter variable always gets iterated over a vector, but it doesn’t have to be numeric
- Nested loops: body of the `for()` loop can contain another `for()` loop (or several others)

```
for (str in c("Prof", "Ryan", "Tibs")) {
  cat(paste(str, "declined to comment\n"))
}
```

```
## Prof declined to comment
## Ryan declined to comment
## Tibs declined to comment
```

```
for (i in 1:4) {
  for (j in 1:i^2) {
    cat(paste(j,""))
  }
  cat("\n")
}
```

```
## 1
## 1 2 3 4
## 1 2 3 4 5 6 7 8 9
## 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16
```

`while()`

A `while()` loop repeatedly runs a code block, again called the **body**, until some condition is no longer true

```
i = 1
log.vec = c()
while (log(i) <= 2) {
  log.vec = c(log.vec, log(i))
  i = i+1
}
log.vec
```

`## [1] 0.0000000 0.6931472 1.0986123 1.3862944 1.6094379 1.7917595 1.9459101`

`for()` versus `while()`

- `for()` is better when the number of times to repeat (values to iterate over) is clear in advance
- `while()` is better when you can recognize when to stop once you’re there, even if you can’t guess it to begin with
- `while()` is more general, in that every `for()` could be replaced with a `while()` (but not vice versa)
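As a sketch of the last point, here is one `for()` loop and a `while()` loop that does the same job. The variable names `sq.for` and `sq.while` are made up for illustration. In the `while()` version, we have to set up and increment the counter ourselves.

```r
n = 5

# for() version: the counter i runs along 1:n automatically
sq.for = vector(length=n, mode="numeric")
for (i in 1:n) {
  sq.for[i] = i^2
}

# while() version: we manage the counter by hand
sq.while = vector(length=n, mode="numeric")
i = 1
while (i <= n) {
  sq.while[i] = i^2
  i = i+1
}

identical(sq.for, sq.while)   # Both loops fill in the same values
```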

`while(TRUE)` or `repeat`

`while(TRUE)` and `repeat`: both do the same thing, just repeat the body indefinitely, until something causes the flow to break. Example (try running in your console):

```
repeat {
  ans = readline("Who is the best Professor of Statistics at CMU? ")
  if (ans == "Tibs" || ans == "Tibshirani" || ans == "Ryan") {
    cat("Yes! You get an 'A'.")
    break
  } else {
    cat("Wrong answer!\n")
  }
}
```

- Warning: some people have a tendency to **overuse** `for()` and `while()` loops in R
- They aren’t always needed. Remember vectorization should be used whenever possible
- We’ll emphasize this on the lab, and try to hit upon it throughout the course
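For instance, a loop that fills in logs one element at a time can be replaced by a single vectorized call. The sketch below (with made-up variable names `log.loop` and `log.vectorized`) computes the same values both ways.

```r
n = 10

# Explicit loop: fill in one element at a time
log.loop = vector(length=n, mode="numeric")
for (i in 1:n) {
  log.loop[i] = log(i)
}

# Vectorized: log() is applied to every element of 1:n in one call
log.vectorized = log(1:n)

all.equal(log.loop, log.vectorized)   # Same values either way
```

The vectorized version is shorter and has no counter or preallocated vector to get wrong.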

- Three ways to index vectors, matrices, data frames, lists: integers, Booleans, names
- Boolean on-the-fly indexing can be very useful
- Named indexing will be especially useful for data frames
- Indexing lists can be a bit tricky (beware of the difference between `[ ]` and `[[ ]]`)
- `if()`, `else if()`, `else`: standard conditionals
- `ifelse()`: shortcut for using `if()` and `else` in combination
- `switch()`: shortcut for using `if()`, `else if()`, and `else` in combination
- `for()`, `while()`, `repeat`: standard loop constructs
- Don’t overuse explicit `for()` loops, vectorization is your friend!
- `apply()` and `**ply()`: can also be very useful (we’ll see them later)