36-350
8 October 2014
We find errors in code all the time:
-0.5^0.5
[1] -0.7071
(-0.5)^0.5
[1] NaN
We find errors in code all the time:
mean(NA)
[1] NA
We find errors in code all the time:
The original name for glitches and unexpected defects: dates back to at least 1876 (Edison), but better story from Grace Hopper, PhD, 1946:
Debugging is (largely) Differential Diagnosis:
First step: Reproduce the Error
Second step: Bound the error
Third step: Get more information
message()
or print()
Worst worst case: the problem is a diffuse, all-pervading wrongness and you should curse the names of your professors and the patron saints of computing (dang it, Tukey!)
Typical case: the problem is a far more localized issue or set of issues that can be pinned down.
Tools:
traceback()
: where did an error message come from?message(), print()
: present intermediate outputs.warning()
: message from the code when it's finished runningstop(), stopifnot()
: terminate the run if something's wrong[[
… ]]
vs. [
… ]
==
vs. =
identical
, all.equal
)Recall the Gamma distribution estimator from a previous lab session. What if this was the code:
gamma.est <- function(input) {
meaninput <- mean(input)
varinput <- var(input)
scale <- varinput/meaninput
shape <- meaninput/scale
output <- list(shape=shape,scale=scale)
return(output)
}
First, make sure it works! Test this function on its own. A simple unit test:
input <- rgamma(10000, shape=0.1, scale=10)
gamma.est(input)
$shape
[1] 0.09881
$scale
[1] 10.24
Literally: traces back through all the function calls leading to the last error.
Start your attention at the first of these functions which you wrote (not that it can't be someone else at fault).
Often the most useful bit is somewhere in the middle (there may be many low-level functions called, like mean()
)
Now, the jackknife estimator for the parameter standard error:
#datavector = cats$Hwt[1:3]
gamma.jackknife <- function(datavector) {
datalength <- length(datavector)
jack.estimates <- sapply(1:length(datavector),
function (omitted.point) unlist(gamma.est(datavector[-omitted.point])))
var.of.ests <- apply(jack.estimates,1,var)
jack.var <- ((datalength-1)^2/datalength)*var.of.ests
return(sqrt(jack.var))
}
What happens?
library(MASS)
> gamma.jackknife(cats$Hwt[1:3])
Error: is.atomic(x) is not TRUE
> traceback()
5: stop(sprintf(ngettext(length(r), "%s is not TRUE", "%s are not all TRUE"),
ch), call. = FALSE, domain = NA)
4: stopifnot(is.atomic(x))
3: FUN(newX[, i], ...)
2: apply(jack.estimates, 1, var) at #4
1: gamma.jackknife(cats$Hwt[1:3])
print()
forces values to the screen; stick it before the problematic part to see if values look funny
print(paste("x is now",x))
y <- a.tricky.function(x)
print(paste("y has become",y"))
Add print(str(jack.estimates))
before the apply()
and run again:
gamma.jackknife.2 <- function(datavector) {
datalength <- length(datavector)
jack.estimates <- sapply(1:length(datavector),
function (omitted.point) gamma.est(datavector[-omitted.point]))
print(str(jack.estimates))
var.of.ests <- apply(jack.estimates,1,var)
jack.var <- ((datalength-1)^2/datalength)*var.of.ests
return(sqrt(jack.var))
}
> gamma.jackknife.2(cats$Hwt[1:3])
List of 6
$ : num 32.4
$ : num 0.261
$ : num 21.8
$ : num 0.379
$ : num 648
$ : num 0.0111
- attr(*, "dim")= int [1:2] 2 3
- attr(*, "dimnames")=List of 2
..$ : chr [1:2] "shape" "scale"
..$ : NULL
NULL
Show Traceback
Rerun with Debug
Error: is.atomic(x) is not TRUE
The problem is that gamma.est
gives a list, and so we get a weird list structure, instead of a plain array
Solution: re-write gamma.est
to give a vector (as in the lab instructions), or wrap unlist
around its output
gamma.est <- function(input) {
meaninput <- mean(input)
varinput <- var(input)
scale <- varinput/meaninput
shape <- meaninput/scale
return(c(shape=shape,scale=scale))
}
Print warning messages along with the call that initiated the weirdness
quadratic.solver <- function(a,b,c) {
determinant <- b^2 - 4*a*c
if (determinant < 0) {
warning("Equation has complex roots")
determinant <- as.complex(determinant)
}
return(c((-b+sqrt(determinant))/2*a, (-b-sqrt(determinant))/2*a))
}
quadratic.solver(1,0,-1)
[1] 1 -1
quadratic.solver(1,0,1)
[1] 0+1i 0-1i
Halt when results aren't as we expect, and say why. Recall: we've seen this one before
N.B., once you have found the bug, it's generally good to turn lots of these off!
Localize error by using inputs where you know the answer.
If you suspect myfunction()
is buggy, give it a simple case where the proper output is easy for you to calculate “by hand” (i.e., not using your implementation)
If myfunction()
works on a bunch of cases, well and good; if not, you need to fix it (and possibly other things).
If inputs come from other functions, write functions, with the right names, to generated fixed, simple values of the right format and content (save the real functions somewhere else)
To make sure the dummy is working, make its output as simple as you can
The browser
, recover
and debug
functions modify how R executes other functions
Let you view and modify the environment of the target function, and step through it
You do not need to master them for this class, though they can be very helpful
See chapter 13 of Matloff, and \( \S\S \) 3.5–3.6 of Chambers
After diagnosis, treatment: once the error is characterized and localized, guess at what's wrong with the code and how to fix it
Try the fix: does it work? Have you broken something else?
Try small cases first!
set.seed()
Help answer “How do I know I've fixed this bug?”
Help answer “How do I know I haven't broken something that was working?”
Much of what you did to characterize and localize the bug can be turned into tests
Ordinarily, errors just lead to crashing or the like
R has an error handling system which allows your function to catch, and recover from, errors in functions they call (functions: try
, tryCatch
)
Can also recover from not-really-errors (like optimizations that don't converge)
This system is very flexible, but rather complicated
Next time: more about testing