Topics we got to earlier in the course:

- How to make draws from, and use random distributions
- Writing functions with known run-times and verifiable results
- Why not to use
`for`

loops

Topics we got to earlier in the course:

- How to make draws from, and use random distributions
- Writing functions with known run-times and verifiable results
- Why not to use
`for`

loops

- Writing simulations using random number generation
- From the jackknife to the bootstrap
- Processes that have memory
- Finally,
`for`

loops!

`[dpqr]unif`

, etc.Recall how we drew from various distributions: `runif()`

: draws from the uniform distribution (others follow)

Home-baked discrete distributions: Use `sample()`

population.values <- 1:3 probs <- c(5,2,3) my.draw <- sample (population.values, 100000, probs, replace=TRUE) table(my.draw)

## my.draw ## 1 2 3 ## 50227 20027 29746

`sample()`

`sample()`

is powerful – it works on any object that has a defined `length()`

. Permutations:

sample(5)

## [1] 4 5 1 3 2

sample(1:6)

## [1] 1 2 4 6 5 3

replicate(3,sample(c("Curly","Larry","Moe","Shemp")))

## [,1] [,2] [,3] ## [1,] "Larry" "Curly" "Larry" ## [2,] "Curly" "Moe" "Curly" ## [3,] "Shemp" "Larry" "Shemp" ## [4,] "Moe" "Shemp" "Moe"

sample(list("A",3,sum))

## [[1]] ## [1] "A" ## ## [[2]] ## [1] 3 ## ## [[3]] ## function (..., na.rm = FALSE) .Primitive("sum")

`sample()`

Resampling from any existing distribution gets us the **bootstrap** family of estimators:

bootstrap.resample <- function (object) sample (object, length(object), replace=TRUE) replicate(5, bootstrap.resample (6:10))

## [,1] [,2] [,3] [,4] [,5] ## [1,] 8 8 6 6 6 ## [2,] 9 6 10 7 7 ## [3,] 7 7 10 10 8 ## [4,] 9 10 6 6 7 ## [5,] 8 7 9 10 6

Recall: the *jackknife* worked by removing one point from the sample and recalculating the statistic of interest. Here we resample the same length with replacement.

The 2-sample `t`

-test checks for differences in means according to a known null distribution. Let's resample and generate the sampling distribution under the bootstrap assumption:

library(MASS) diff.in.means <- function(df) { mean(df[df$Sex=="M","Hwt"]) - mean(df[df$Sex=="F","Hwt"]) } resample.diffs <- replicate(1000, diff.in.means(cats[bootstrap.resample(1:nrow(cats)),])) hist(resample.diffs); abline(v=diff.in.means(cats), col=2, lwd=3)

Most interesting examples in probability have a little dependence added in: "If it rained yesterday, what is the probability it rains today?"

Use this to generate weather patterns and probabilities for some time in the future. Almost always occurs with time series; can occur with other dependence (spatial – if it's sunny in Dallas today, will it also be sunny in Fort Worth?)

Suppose we have a sequence of observations that are dependent. In a time series, what happens next depends on what happened before:

\[ p(X_1, X_2, ..., X_n) = p(X_1)p(X_2|X_1)...p(X_n|X_{n-1},...,X_1) \]

(Note: you could, of course, condition on the future to predict the past, if you had a time machine.)

Markov dependence: each outcome only depends on the one that came before.

\[ p(X_1, X_2, ..., X_n) = p(X_1)\prod_{i=2}^np(X_i|X_{i-1}) \]

Set up the conditional distribution.

Draw the initial state of the chain.

For every additional draw, use the previous draw to inform the new one.

`for`

loops are back in play!Alluded to but not taught explicitly because we had better ways of doing things in R – until now.

Simple weather model: - if it was sunny yesterday, today's chance of sun is 80%. - if it wasn't sunny yesterday, today's chance of sun is 20%.

Simulate for one year (at the equator?)

sunny.year <- rep(NA, 365) sunny.year[1] <- 1 for (day in 2:365) sunny.year[day] <- rbinom(1,1,0.2 + 0.6*sunny.year[day-1])

`for`

loops are back in play!plot(sunny.year, main="Sunny Days in An Equatorial City", xlab="Day", ylab="Sunshine?", ty="l")

boring.year <- rbinom(365, 1, 0.5) plot(boring.year, main="Sunny Days in A Coin Flip City", xlab="Day", ylab="Sunshine?", ty="l")

Transitions are represented as a matrix: \(Q_{ij}\) is \(P(X_t = j|X_{t-1} = i)\).

## [,1] [,2] ## [1,] 0.8 0.2 ## [2,] 0.2 0.8

weird.year <- rep(NA, 365) weird.year[1] <- 1 transition.matrix <- matrix (c(0.2, 0.8, 0.8, 0.2), nrow=2) for (day in 2:365) weird.year[day] <- sample(1:2, 1, prob=transition.matrix[weird.year[day-1],])

plot(weird.year, main="Sunny Days in Al Gore's Nightmare", xlab="Day", ylab="Sunshine?", ty="l")