29 September 2014

In Previous Episodes

  • We made reference to random number generation without going under the hood.

Today

  • How does R get "random" numbers, anyway?
  • It doesn't, really – it uses a trick that should be indistinguishable from the real McCoy

How Do We Get "Real" Randomness?

Introduce real-world noise:

  • Thermal detection – trailing decimal points on a thermometer
  • From Space! – cosmic ray/radioactive decay arrival timings, or any homogeneous Poisson process
  • From Earth! – http://www.random.org/

These Cost Money and I'm Cheap

Pseudorandom generators produce a deterministic sequence that is indistiguishable from a true random sequence if you don't know how it started.

Example: runif, where we know where it started

runif(1:10)
##  [1] 0.20972 0.65726 0.70948 0.70757 0.56600 0.54488 0.70300 0.09676
##  [9] 0.63422 0.21623
set.seed(10)
runif(1:10)
##  [1] 0.50748 0.30677 0.42691 0.69310 0.08514 0.22544 0.27453 0.27231
##  [9] 0.61583 0.42967
set.seed(10)
runif(1:10)
##  [1] 0.50748 0.30677 0.42691 0.69310 0.08514 0.22544 0.27453 0.27231
##  [9] 0.61583 0.42967

Basic version: Linear Congruential Generator

seed <- 10
new.random <- function (a=5, c=12, m=16) {
  out <- (a*seed + c) %% m  
  seed <<- out
  return(out)
}
out.length <- 20
variates <- rep (NA, out.length)
for (kk in 1:out.length) variates[kk] <- new.random()
variates
##  [1] 14  2  6 10 14  2  6 10 14  2  6 10 14  2  6 10 14  2  6 10

Try again

Period 8:

variates <- rep (NA, out.length)
for (kk in 1:out.length) variates[kk] <- new.random(a=131, c=7, m=16)
variates
##  [1]  5  6  9  2 13 14  1 10  5  6  9  2 13 14  1 10  5  6  9  2

Try again, again

Period 16:

variates <- rep (NA, out.length)
for (kk in 1:out.length) variates[kk] <- new.random(a=129, c=7, m=16)
variates
##  [1]  9  0  7 14  5 12  3 10  1  8 15  6 13  4 11  2  9  0  7 14

Try again, at last

Numerical Recipes uses

variates <- rep (NA, out.length)
for (kk in 1:out.length) variates[kk] <- new.random(a=1664545, c=1013904223, m=2^32)
variates
##  [1] 1.037e+09 2.091e+09 4.106e+09 7.684e+08 3.836e+09 1.329e+09 2.125e+09
##  [8] 2.669e+09 3.582e+09 2.079e+09 2.067e+09 2.197e+09 3.749e+09 2.914e+09
## [15] 7.588e+08 4.029e+09 2.837e+09 1.458e+09 2.399e+09 2.767e+09

How To Distinguish Non-Randomness

  • We've covered period: if it's missing some values, it's distinguishable
  • Uniformity of distribution in the limit
  • Autocorrelation
  • Dimensional distribution – not a problem for 1-D distributions, but can be for 2+-D

How does R get everything we need?

A few distributions of interest:

  • Uniform(0,1)
  • Bernoulli(p)
  • Binomial(n,p)
  • Gaussian(0,1)
  • Exponential(1)
  • Gamma(a)

In R: everything we need

Suppose we were working with the Exponential distribution.

  • rexp() generates variates from the distribution.
  • dexp() gives the probability density function.
  • pexp() gives the cumulative distribution function.
  • qexp() gives the quantiles.

dexp()

dexp(0:5)
## [1] 1.000000 0.367879 0.135335 0.049787 0.018316 0.006738
this.range <- 0:50/5
plot (this.range, dexp(this.range), ty="l")
lines (this.range, dexp(this.range, rate=0.5), col="red")
lines (this.range, dexp(this.range, rate=0.2), col="blue")

plot of chunk unnamed-chunk-6

pexp()

pexp(0:5)
## [1] 0.0000 0.6321 0.8647 0.9502 0.9817 0.9933
this.range <- 0:50/5
plot (this.range, pexp(this.range), ty="l")
lines (this.range, pexp(this.range, rate=0.5), col="red")
lines (this.range, pexp(this.range, rate=0.2), col="blue")

plot of chunk unnamed-chunk-7

qexp()

qexp(0:5)
## Warning: NaNs produced
## [1]   0 Inf NaN NaN NaN NaN
this.range <- seq(0,1,by=0.1)
qexp(this.range)
##  [1] 0.0000 0.1054 0.2231 0.3567 0.5108 0.6931 0.9163 1.2040 1.6094 2.3026
## [11]    Inf
plot (this.range, qexp(this.range), ty="l")
lines (this.range, qexp(this.range, rate=0.5), col="red")
lines (this.range, qexp(this.range, rate=0.2), col="blue")

plot of chunk unnamed-chunk-8