Statistical Computing, 36-350
Wednesday October 21, 2015
Basic debugging tricks:
Better success through design!
Our two goals:
Both are important (don’t forget about the second!)
Statistical hypothesis testing: risk of false alarm (size) and probability of detection (power). This balances type I and type II errors
Software testing: no false alarms allowed. This is going to reduce our power to detect errors; code can pass all our tests and still be wrong
But! we can direct the power to detect certain errors, including where the error lies, if we test small pieces
When we have a version of the code which we are confident gets some cases right, we should keep it around (under a separate name)
Step 1: thnk about the big picture
Step 2: divide-and-conquer mentality
Step 3: assuming you’ve written each part, how would you put them together?
# Not actual code
big.job = function(lots.of.arguments) {
  intermediate.result = first.step(some.of.the.args)
  final.result = second.step(intermediate.result,rest.of.the.args)
  return(final.result)
}Step 4+: you don’t actually have a working program yet, but you have a good setup
Even if we didn’t start in top-down design mode, once we have some code, and it (more or less) works, re-write it to emphasize commonalities:
Re-factoring tends to make code look more like the result of top-down design. This is no accident!
mean.jackknife = function(vec) {
  n = length(vec)
  jackknife.ests = vector(length=n)
  for (i in 1:n) {
    jackknife.ests[i] = mean(vec[-i])
  }
  variance.of.ests = var(jackknife.ests)
  jackknife.var = ((n-1)^2/n)*variance.of.ests
  jackknife.stderr = sqrt(jackknife.var)
  return(jackknife.stderr)
}some.normals = rnorm(100,mean=7,sd=5)
mean(some.normals)## [1] 6.237172se.formula = sd(some.normals)/sqrt(length(some.normals))
se.jackknife = mean.jackknife(some.normals)
max(abs(se.formula-se.jackknife))## [1] 3.330669e-16Recall our friend the method of moments estimator:
gamma.est = function(the.data) {
  m = mean(the.data)
  v = var(the.data)
  a = m^2/v
  s = v/m
  return(c(a=a,s=s))
}gamma.jackknife = function(vec) {
  n = length(vec)
  jackknife.ests = matrix(NA,nrow=2,ncol=n)
  rownames(jackknife.ests) = c("a","s")
  for (i in 1:n) {
    fit = gamma.est(vec[-i])
    jackknife.ests["a",i] = fit["a"]
    jackknife.ests["s",i] = fit["s"]
  }
  variance.of.ests = apply(jackknife.ests,1,var)
  jackknife.vars = ((n-1)^2/n)*variance.of.ests
  jackknife.stderrs = sqrt(jackknife.vars)
  return(jackknife.stderrs)
}data("cats",package="MASS")
gamma.est(cats$Hwt)##          a          s 
## 19.0653121  0.5575862gamma.jackknife(cats$Hwt)##          a          s 
## 2.74062279 0.07829436jackknife.lm = function(df,formula,p) {
  n = nrow(df)
  jackknife.ests = matrix(0,nrow=p,ncol=n)
  for (i in 1:n) {
    new.coefs = lm(as.formula(formula),data=df[-i,])$coefficients
    jackknife.ests[,i] = new.coefs
  }
  variance.of.ests = apply(jackknife.ests,1,var)
  jackknife.var = ((n-1)^2/n)*variance.of.ests
  jackknife.stderr = sqrt(jackknife.var)
  return(jackknife.stderr)
}cats.lm = lm(Hwt~Bwt,data=cats)
coefficients(cats.lm)## (Intercept)         Bwt 
##  -0.3566624   4.0340627# "Official" standard errors
sqrt(diag(vcov(cats.lm)))## (Intercept)         Bwt 
##   0.6922770   0.2502615jackknife.lm(df=cats,formula="Hwt~Bwt",p=2)## [1] 0.8314142 0.3166847figure out the size of the data
for each case
   repeat some estimation, after omittingthat case
   collect all estimates in a vector as you go
take variances across cases
scale up variances
take the square rootsLet’s define a function for the common operation of omitting one case
omit.case = function(dat,i) {
  dims = dim(dat)
  if (is.null(dims) || (length(dims)==1)) {
    return(dat[-i])
  } else {
    return(dat[-i,])
  }
}(Exercise: modify so it also handles higher-dimensional arrays)
Let’s write a function for the general jackknife workflow
jackknife = function(estimator,dat) {
  n = ifelse(is.null(dim(dat)),length(dat),nrow(dat))
  omit.and.est = function(i) { estimator(omit.case(dat,i)) }
  jackknife.ests = matrix(sapply(1:n, omit.and.est), ncol=n)
  variance.of.ests = apply(jackknife.ests,1,var)
  jackknife.var = ((n-1)^2/n)*variance.of.ests
  jackknife.stderr = sqrt(jackknife.var)
  return(jackknife.stderr)
}Could allow other arguments to estimator, spin off finding n as its own function, etc.
jackknife(estimator=mean,dat=some.normals)## [1] 0.5764657max(abs(jackknife(estimator=mean,dat=some.normals) -
          mean.jackknife(some.normals)))## [1] 0max(abs(jackknife(estimator=gamma.est,dat=cats$Hwt) -
          gamma.jackknife(cats$Hwt)))## [1] 0est.coefs = function(dat) {
  return(lm(Hwt~Bwt,data=dat)$coefficients)
}
est.coefs(cats)## (Intercept)         Bwt 
##  -0.3566624   4.0340627max(abs(est.coefs(cats) - coefficients(cats.lm)))## [1] 0jackknife(estimator=est.coefs,dat=cats)## [1] 0.8314142 0.3166847max(abs((jackknife(estimator=est.coefs,dat=cats) -
          jackknife.lm(df=cats,formula="Hwt~Bwt",p=2))))## [1] 0Note: we’ve been just cross-checking our code the whole time against our old code, to make sure it works