Topics we got to earlier in the course:

- How to make draws from, and use, random distributions
- Writing functions with known run-times and verifiable results
- Why not to use `for` loops
Recall how we drew from various distributions: `runif()` draws from the uniform distribution, and the other distributions follow the same `[dpqr]unif` naming pattern (`dunif()` for the density, `punif()` for the CDF, `qunif()` for quantiles), etc.
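The `[dpqr]` prefix convention is worth a quick sketch with the uniform distribution:

```r
dunif(0.3)        # density of Unif(0,1) at 0.3: returns 1
punif(0.3)        # CDF: P(X <= 0.3) = 0.3
qunif(0.3)        # quantile function (inverse CDF): returns 0.3
runif(5)          # 5 random draws from Unif(0,1)
```

The same four prefixes work for every distribution in base R (`norm`, `binom`, `pois`, and so on).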
Home-baked discrete distributions: use `sample()`:

```r
population.values <- 1:3
probs <- c(5, 2, 3)   # weights; sample() normalizes them to probabilities
my.draw <- sample(population.values, 100000, prob=probs, replace=TRUE)
table(my.draw)
## my.draw
##     1     2     3
## 50227 20027 29746
```
`sample()` is powerful – it works on any object that has a defined `length()`. Permutations:
```r
sample(5)
## [1] 4 5 1 3 2
sample(1:6)
## [1] 1 2 4 6 5 3
replicate(3, sample(c("Curly","Larry","Moe","Shemp")))
##      [,1]    [,2]    [,3]
## [1,] "Larry" "Curly" "Larry"
## [2,] "Curly" "Moe"   "Curly"
## [3,] "Shemp" "Larry" "Shemp"
## [4,] "Moe"   "Shemp" "Moe"
sample(list("A", 3, sum))
## [[1]]
## [1] "A"
##
## [[2]]
## [1] 3
##
## [[3]]
## function (..., na.rm = FALSE)  .Primitive("sum")
```
Resampling from any existing distribution gets us the bootstrap family of estimators:
```r
bootstrap.resample <- function (object)
  sample(object, length(object), replace=TRUE)
replicate(5, bootstrap.resample(6:10))
##      [,1] [,2] [,3] [,4] [,5]
## [1,]    8    8    6    6    6
## [2,]    9    6   10    7    7
## [3,]    7    7   10   10    8
## [4,]    9   10    6    6    7
## [5,]    8    7    9   10    6
```
Recall: the jackknife worked by removing one point from the sample and recalculating the statistic of interest. Here we resample the same length with replacement.
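For contrast, a minimal jackknife sketch (the helper name `jackknife.resamples` is ours, not from the lecture): instead of random resamples, it produces the n deterministic leave-one-out samples:

```r
# Each column j is the original sample with point j removed
jackknife.resamples <- function(object) {
  sapply(seq_along(object), function(i) object[-i])
}
jackknife.resamples(6:10)   # a 4x5 matrix: five leave-one-out samples of size 4
```

Note the resulting samples have length n-1, while each bootstrap resample has the full length n.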
The two-sample *t*-test checks for differences in means according to a known null distribution. Let's resample and generate the sampling distribution under the bootstrap assumption:
```r
library(MASS)   # for the cats data (sex, body and heart weights of 144 cats)
diff.in.means <- function(df) {
  mean(df[df$Sex=="M","Hwt"]) - mean(df[df$Sex=="F","Hwt"])
}
resample.diffs <- replicate(1000,
  diff.in.means(cats[bootstrap.resample(1:nrow(cats)),]))
hist(resample.diffs)
abline(v=diff.in.means(cats), col=2, lwd=3)   # mark the observed difference
```
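One simple use of the resampled differences is a percentile confidence interval, which we can compare with the classical *t* interval. This is a sketch we're adding (the percentile interval is only one of several bootstrap CI constructions), with the setup repeated so the chunk stands alone:

```r
library(MASS)   # for the cats data

bootstrap.resample <- function(object) sample(object, length(object), replace=TRUE)
diff.in.means <- function(df) {
  mean(df[df$Sex=="M","Hwt"]) - mean(df[df$Sex=="F","Hwt"])
}
set.seed(1)
resample.diffs <- replicate(1000,
  diff.in.means(cats[bootstrap.resample(1:nrow(cats)),]))

# Percentile bootstrap 95% CI for the M-minus-F difference in mean heart weight
quantile(resample.diffs, c(0.025, 0.975))
# Classical two-sample t interval (note: t.test reports F minus M, so signs flip)
t.test(Hwt ~ Sex, data=cats)$conf.int
```

The two intervals should roughly agree here; the bootstrap's advantage is that it needs no normality assumption.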
Most interesting examples in probability have a little dependence added in: "If it rained yesterday, what is the probability it rains today?"
Use this to generate weather patterns and probabilities for some time in the future. This kind of dependence almost always occurs with time series, but it can arise in other settings too (spatial, for example: if it's sunny in Dallas today, will it also be sunny in Fort Worth?).
Suppose we have a sequence of observations that are dependent. In a time series, what happens next depends on what happened before:
\[ p(X_1, X_2, ..., X_n) = p(X_1)p(X_2|X_1)...p(X_n|X_{n-1},...,X_1) \]
(Note: you could, of course, condition on the future to predict the past, if you had a time machine.)
Markov dependence: each outcome only depends on the one that came before.
\[ p(X_1, X_2, ..., X_n) = p(X_1)\prod_{i=2}^np(X_i|X_{i-1}) \]
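The factorization turns a joint probability into a product of one-step transitions. A small numeric sketch (the initial distribution and transition matrix are made-up numbers, and `markov.prob` is our own helper name):

```r
# init[i] = P(X_1 = i); Q[i,j] = P(X_t = j | X_{t-1} = i)
init <- c(0.5, 0.5)
Q <- matrix(c(0.8, 0.2,
              0.2, 0.8), nrow=2, byrow=TRUE)

# Joint probability of an observed state sequence x under Markov dependence
markov.prob <- function(x, init, Q) {
  p <- init[x[1]]
  for (i in seq_along(x)[-1]) p <- p * Q[x[i-1], x[i]]
  p
}
markov.prob(c(1, 1, 2), init, Q)   # 0.5 * 0.8 * 0.2 = 0.08
```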
To simulate a Markov chain:

1. Set up the conditional distribution.
2. Draw the initial state of the chain.
3. For every additional draw, use the previous draw to inform the new one.
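The three steps above can be sketched as a generic simulator (the function name `rmarkov` is ours; any initial distribution and transition matrix will do):

```r
# Simulate n steps of a Markov chain on states 1..k
# init[i] = P(X_1 = i); Q[i,j] = P(X_t = j | X_{t-1} = i)
rmarkov <- function(n, init, Q) {
  x <- numeric(n)
  x[1] <- sample(seq_along(init), 1, prob=init)   # draw the initial state
  for (t in 2:n)                                  # each draw conditions on the last
    x[t] <- sample(ncol(Q), 1, prob=Q[x[t-1], ])
  x
}
set.seed(7)
rmarkov(10, c(0.5, 0.5), matrix(c(0.8, 0.2, 0.2, 0.8), nrow=2, byrow=TRUE))
```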
`for` loops are back in play! Alluded to, but not taught explicitly, because we had better ways of doing things in R – until now.
Simple weather model:

- if it was sunny yesterday, today's chance of sun is 80%.
- if it wasn't sunny yesterday, today's chance of sun is 20%.
Simulate for one year (at the equator?):

```r
sunny.year <- rep(NA, 365)
sunny.year[1] <- 1
for (day in 2:365)
  sunny.year[day] <- rbinom(1, 1, 0.2 + 0.6*sunny.year[day-1])
```
```r
plot(sunny.year, main="Sunny Days in An Equatorial City",
     xlab="Day", ylab="Sunshine?", type="l")
```
```r
boring.year <- rbinom(365, 1, 0.5)
plot(boring.year, main="Sunny Days in A Coin Flip City",
     xlab="Day", ylab="Sunshine?", type="l")
```
Transitions are represented as a matrix: \(Q_{ij}\) is \(P(X_t = j|X_{t-1} = i)\).
```r
##      [,1] [,2]
## [1,]  0.8  0.2
## [2,]  0.2  0.8
```
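Going the other way, we can estimate the transition matrix from an observed chain by tabulating consecutive pairs and normalizing each row (a sketch using the sunny-day model above; `prop.table` with `margin=1` does the row normalization):

```r
set.seed(3)
sunny.year <- rep(NA, 365)
sunny.year[1] <- 1
for (day in 2:365)
  sunny.year[day] <- rbinom(1, 1, 0.2 + 0.6*sunny.year[day-1])

# Count observed (yesterday, today) transitions, then normalize each row
Q.hat <- prop.table(table(sunny.year[-365], sunny.year[-1]), margin=1)
Q.hat   # rows should be close to (0.8, 0.2) and (0.2, 0.8)
```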
```r
weird.year <- rep(NA, 365)
weird.year[1] <- 1
# Note the reversed diagonal: each state tends to flip, P(stay) = 0.2
transition.matrix <- matrix(c(0.2, 0.8, 0.8, 0.2), nrow=2)
for (day in 2:365)
  weird.year[day] <- sample(1:2, 1, prob=transition.matrix[weird.year[day-1],])
plot(weird.year, main="Sunny Days in Al Gore's Nightmare",
     xlab="Day", ylab="Sunshine?", type="l")
```