Previously…

Topics we got to earlier in the course:

  • How to make draws from, and use random distributions
  • Writing functions with known run-times and verifiable results
  • Why not to use for loops

Today…

  • Writing simulations using random number generation
  • From the jackknife to the bootstrap
  • Processes that have memory
  • Finally, for loops!

[dpqr]unif, etc.

Recall how we drew from various distributions: runif(): draws from the uniform distribution (others follow)

Home-baked discrete distributions: Use sample()

population.values <- 1:3
probs <- c(5,2,3)
my.draw <- sample (population.values, 100000, probs, replace=TRUE)
table(my.draw)
## my.draw
##     1     2     3 
## 50227 20027 29746

Permutations with sample()

sample() is powerful – it works on any object that has a defined length(). Permutations:

sample(5)
## [1] 4 5 1 3 2
sample(1:6)
## [1] 1 2 4 6 5 3
replicate(3,sample(c("Curly","Larry","Moe","Shemp")))
##      [,1]    [,2]    [,3]   
## [1,] "Larry" "Curly" "Larry"
## [2,] "Curly" "Moe"   "Curly"
## [3,] "Shemp" "Larry" "Shemp"
## [4,] "Moe"   "Shemp" "Moe"
sample(list("A",3,sum))
## [[1]]
## [1] "A"
## 
## [[2]]
## [1] 3
## 
## [[3]]
## function (..., na.rm = FALSE)  .Primitive("sum")

Resampling with sample()

Resampling from any existing distribution gets us the bootstrap family of estimators:

bootstrap.resample <- function (object) sample (object, length(object), replace=TRUE)
replicate(5, bootstrap.resample (6:10))
##      [,1] [,2] [,3] [,4] [,5]
## [1,]    8    8    6    6    6
## [2,]    9    6   10    7    7
## [3,]    7    7   10   10    8
## [4,]    9   10    6    6    7
## [5,]    8    7    9   10    6

Recall: the jackknife worked by removing one point from the sample and recalculating the statistic of interest. Here we resample the same length with replacement.

Bootstrap test

The 2-sample t-test checks for differences in means according to a known null distribution. Let's resample and generate the sampling distribution under the bootstrap assumption:

library(MASS)
diff.in.means <- function(df) {
  mean(df[df$Sex=="M","Hwt"]) - mean(df[df$Sex=="F","Hwt"])
}
resample.diffs <- replicate(1000, diff.in.means(cats[bootstrap.resample(1:nrow(cats)),]))
hist(resample.diffs); abline(v=diff.in.means(cats), col=2, lwd=3)

Dependence

Most interesting examples in probability have a little dependence added in: "If it rained yesterday, what is the probability it rains today?"

Use this to generate weather patterns and probabilities for some time in the future. Almost always occurs with time series; can occur with other dependence (spatial – if it's sunny in Dallas today, will it also be sunny in Fort Worth?)

Markov Dependence

Suppose we have a sequence of observations that are dependent. In a time series, what happens next depends on what happened before:

\[ p(X_1, X_2, ..., X_n) = p(X_1)p(X_2|X_1)...p(X_n|X_{n-1},...,X_1) \]

(Note: you could, of course, condition on the future to predict the past, if you had a time machine.)

Markov dependence: each outcome only depends on the one that came before.

\[ p(X_1, X_2, ..., X_n) = p(X_1)\prod_{i=2}^np(X_i|X_{i-1}) \]

Generating a Markov Chain

  1. Set up the conditional distribution.

  2. Draw the initial state of the chain.

  3. For every additional draw, use the previous draw to inform the new one.

And for loops are back in play!

Alluded to but not taught explicitly because we had better ways of doing things in R – until now.

Simple weather model: - if it was sunny yesterday, today's chance of sun is 80%. - if it wasn't sunny yesterday, today's chance of sun is 20%.

Simulate for one year (at the equator?)

sunny.year <- rep(NA, 365)
sunny.year[1] <- 1
for (day in 2:365) sunny.year[day] <- rbinom(1,1,0.2 + 0.6*sunny.year[day-1])

And for loops are back in play!

plot(sunny.year, main="Sunny Days in An Equatorial City", xlab="Day", ylab="Sunshine?", ty="l")

Different From Independence

boring.year <- rbinom(365, 1, 0.5)
plot(boring.year, main="Sunny Days in A Coin Flip City", xlab="Day", ylab="Sunshine?", ty="l")

The above chain

Transitions are represented as a matrix: \(Q_{ij}\) is \(P(X_t = j|X_{t-1} = i)\).

##      [,1] [,2]
## [1,]  0.8  0.2
## [2,]  0.2  0.8

Flippy chain

weird.year <- rep(NA, 365)
weird.year[1] <- 1
transition.matrix <- matrix (c(0.2, 0.8, 0.8, 0.2), nrow=2)
for (day in 2:365) weird.year[day] <- sample(1:2, 1, prob=transition.matrix[weird.year[day-1],])

Flippy chain

plot(weird.year, main="Sunny Days in Al Gore's Nightmare", xlab="Day", ylab="Sunshine?", ty="l")