# Lecture 14: Testing Strategies for Code Debugging

36-350
13 October 2014

### Last Time: Basic Debugging

Basic tricks for debugging:

• Localizing issues and changing input parameters
• Precomputed results

### Today: Intermediate Debugging

Better success through design!

• Trusting our results through modular design
• Building tests: functional tests (top-level), unit tests (bottom-level)

### Procedure versus Substance

Our two competing goals:

• Do we get the right answer (substance)?
• Do we get an answer in the right way (procedure)?

An important distinction, as these these go back and forth with each other:

• We trust a procedure because it gives the right answer.
• We trust the answer because it came from a good procedure.

Since programming means making a procedure, we check the substance primarily.

### Testing for particular cases

add <- function (part1, part2) { part1 + part2 }
a <- runif(1)

 TRUE

add(a,0) == a

 TRUE

add(a,-a) == 0

 TRUE


### Testing for particular cases

Real numbers and floating-point precision

cor(c(1,-1,1,1),c(-1,1,-1,1))

 -0.5774

-1/sqrt(3)

 -0.5774

cor(c(1,-1,1,1),c(-1,1,-1,1)) == -1/sqrt(3)

 FALSE


### Testing by cross-checking

Compare alternate routes to the same answer:

test.unif <- runif(n=3,min=-10,max=10)

 TRUE

add(add(test.unif,test.unif),test.unif) ==

 TRUE

add(test.unif*test.unif,test.unif*test.unif) ==

 FALSE


### Testing by cross-checking

Test function: numerical derivative

x <- runif(10,-10,10)
f <- function(x) {x^2*exp(-x^2)}
g <- function(x) {2*x*exp(-x^2) -2* x^3*exp(-x^2)}
isTRUE(all.equal(derivative(f,x), g(x)))


### Testing by cross-checking

If this seems too unstatistical…

xx <- runif(10)
aa <- runif(1)
cor(xx,xx) == 1

 TRUE

cor(xx,-xx) == -1

 TRUE

cor(xx,aa*xx) == 1

 FALSE


### Testing by cross-checking

pp <- runif(10); mean=0; sd=xx
all(pnorm(0,mean=mean,sd=sd) == 0.5)

 TRUE

pnorm(xx,mean,sd) == pnorm((xx-mean)/sd,0,1)

  TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE


### Testing by cross-checking

all(pnorm(xx,0,1) == 1-pnorm(-xx,0,1))

 TRUE

pnorm(qnorm(pp)) == pp

   TRUE FALSE  TRUE  TRUE  TRUE  TRUE FALSE  TRUE  TRUE  TRUE

qnorm(pnorm(xx)) == xx

  FALSE FALSE  TRUE FALSE FALSE FALSE FALSE FALSE FALSE  TRUE


With finite precision we don't really want to insist that these be exact!

### Software Testings vs. Hypothesis Testing

Statistical hypothesis testing: risk of false alarm (size) vs. probability of detection (power) – this balances type I vs. type II errors

In software testing: no false alarms allowed (false alarm rate $$=0$$). This must reduce our power to detect errors; code can pass all our tests and still be wrong.

But! we can direct the power to detect certain errors, including where the error lies, if we test small pieces.

### Combining Testing and Coding

The idea behind unit testing:

• A variety of tests gives us more power to detect errors, more confidence when tests are passed.
• By breaking code into self-enclosed functions, we can better identify problems.
• Therefore: for each function, we build a battery of tests that are easy to step through and identify problems.
• This makes it easier to add new tests to a function as well.
• By bundling these tests into their own function, we keep program flow clean and remind ourselves later why this mattered!

### The Great Testing Cycle

After making changes to a function, re-run its tests, and those of functions that depend on it.

• If anything's (still) broken, fix it; if not, continue.
• When you meet a new error, write a new test.
• When you add a new capacity, write a new test.

### A Ratchet Approach: "Regression Testing"

When we have a version of the code which we are confident gets some cases right, keep it around (under a separate name).

• Now compare new versions to the old, on those cases

• Keep debugging until the new version is at least as good as the old

### Test-Driven Development

General strategy for development.

1. Have an idea about what the program should do.

• Idea is vague and unhelpful
• Make it clear and useful by writing tests for success
• Tests come first, then the program
2. Modify code until it passes all the tests

3. When you find a new error, write a new test

4. When you add a new capacity, write a new test

5. When you change your mind about the goal, change the tests

6. By the end, the tests specify what the program should do, and the program does it

### Awkward Cases

Boundary cases, “at the edge” of something, or non-standard inputs. Such as:

add(5,NA)    # NA, presumably

 NA

try(add("a","b")) # NA, or error message?
divide <- function (top, bottom) top/bottom
divide(10,0) # Inf, presumably

 Inf

divide(0,0)  # NA?

 NaN


### Awkward Cases

Pinning down awkward cases helps specify function

var(1)       # NA? error?

 NA

cor(c(1,-1,1,-1),c(-1,1,NA,1))   # NA? -1? -1 with a warning?

 NA

try(cor(c(1,-1,1,-1),c(-1,1,"z",1)))  # NA? -1? -1 with a warning?
try(cor(c(1,-1),c(-1,1,-1,1)))        # NA? 0? -1?


### Pitfalls

• Writing tests takes time
• Running tests takes time
• Tests have to be debugged themselves
• Tests can provide a false sense of security
• There are costs to knowing about problems (people get upset, responsibility to fix things, etc.)

### Summary

• Trusting software means testing it for correctness, both of substance and of procedure
• Software testing is an extreme form of hypothesis testing: no false positives allowed, so any power to detect errors has to be very focused
• $$\therefore$$ Write and use lots of tests; add to them as we find new errors
• Cycle between writing code and testing it