Why simulate?

R gives us unique access to great simulation tools (unique compared to other languages). Why simulate? Welcome to the 21st century! Two reasons:

Often, simulations can be easier than hand calculations
Often, simulations can be made more realistic than hand calculations

Random number generation

Already, we’ve simulated random numbers in R according to various distributions. For the normal distribution, we have the utility functions:

rnorm(): generate normal random variables
pnorm(): normal distribution function, \(\Phi(x)=P(Z \leq x)\)
dnorm(): normal density function, \(\phi(x)= \Phi'(x)\)
qnorm(): normal quantile function, \(q(y)=\Phi^{-1}(y)\), i.e., \(\Phi(q(y))=y\)

Replace “norm” with the name of another distribution, all the same functions apply. E.g., “t”, “exp”, “gamma”, “chisq”, “binom”, “pois”, etc.

Random number examples

Standard normal random variables (mean 0 and variance 1)

n = 1000
z = rnorm(n, mean=0, sd=1) # These are the defaults for mean, sd
mean(z)  # Check: sample mean is approximately 0

## [1] -0.03675064

var(z)   # Check: sample variance is approximately 1

## [1] 1.066317

(Continued)

Normal distribution and density functions

x = seq(-3,3,length=100)
plot(ecdf(z), ylab="Distribution", main="Empirical distribution",
     lwd=2, col="red")
lines(x, pnorm(x), lwd=2)
legend("topleft", legend=c("Empirical distribution", "Actual distribution"),
       lwd=2, col=c("red","black"))

hist(z, breaks=30, main="Histogram", col="pink", 
     prob=TRUE)
lines(x, dnorm(x), lwd=2)
legend("topleft", legend=c("Histogram", "Actual density"),
       lwd=2, col=c("pink","black"))

(Interesting statistical fact: in general—not just for the normal distribution—the empirical distribution function is pretty much always quite close to the actual distribution function. This is not true on the density scale, i.e., the histogram typically converges much more slowly)

Same function call, different results

Not surprisingly, we get different draws each time we call rnorm()

mean(rnorm(n))

## [1] -0.00452

mean(rnorm(n))

## [1] 8.546379e-05

mean(rnorm(n))

## [1] 0.001750174

mean(rnorm(n))

## [1] 0.003738059

Is it really random?

The number generated in R (in any language) are not “truly” random; they are what is called pseudorandom

These are numbers generated by computer algorithms that very closely mimick “truly” random numbers
The study of such algorithms is an interesting research area in its own right!
The default algorithm in R (and in nearly all software languages) is called the “Mersenne Twister”
Type ?Random in your R console to read more about this (and to read how to change the algorithm used for pseudorandom number generation, which you should never really have to do, by the way)

Setting the random seed

All pseudorandom number generators depend on what is called a seed value

This puts the random number generator in a well-defined “state”, so that the numbers it generates, from then on, will be reproducible
The seed is just an integer, and can be set with set.seed()
The reason we set it: so that when someone else runs our simulation code, they can see the same—albeit, still random—results that we do

Seed examples

# Getting the same 5 random normals over and over
set.seed(0); rnorm(5)

## [1]  1.2629543 -0.3262334  1.3297993  1.2724293  0.4146414

set.seed(0); rnorm(5)

## [1]  1.2629543 -0.3262334  1.3297993  1.2724293  0.4146414

set.seed(0); rnorm(5)

## [1]  1.2629543 -0.3262334  1.3297993  1.2724293  0.4146414

(Continued)

# Different seeds, different numbers
set.seed(1); rnorm(5)

## [1] -0.6264538  0.1836433 -0.8356286  1.5952808  0.3295078

set.seed(2); rnorm(5)

## [1] -0.89691455  0.18484918  1.58784533 -1.13037567 -0.08025176

set.seed(3); rnorm(5)

## [1] -0.9619334 -0.2925257  0.2587882 -1.1521319  0.1957828

(Continued)

# Each time the seed is set, the same sequence follows (indefinitely)
set.seed(0); rnorm(3); rnorm(2); rnorm(1)

## [1]  1.2629543 -0.3262334  1.3297993

## [1] 1.2724293 0.4146414

## [1] -1.53995

set.seed(0); rnorm(3); rnorm(2); rnorm(1)

## [1]  1.2629543 -0.3262334  1.3297993

## [1] 1.2724293 0.4146414

## [1] -1.53995

set.seed(0); rnorm(3); rnorm(2); rnorm(1)

## [1]  1.2629543 -0.3262334  1.3297993

## [1] 1.2724293 0.4146414

## [1] -1.53995

Simulation Basics

Statistical Computing, 36-350

Wednesday October 5, 2016

Why simulate?

Random number generation

Random number examples

(Continued)

Same function call, different results

Is it really random?

Setting the random seed

Seed examples

(Continued)

(Continued)