This data set is analyzed by Venables and Ripley also, pp. 230 ff.
The data is the result of an experiment on the toxicity of the tobacco budworm to doses of particular drug, that the moths are beginning to show resistance to. Batches of 20 moths of each sex were exposed for 3 days to the chemical and the number of dead (or disabled) moths in each group were recorded. The data is,
Dose Response Gender 1 1 M 2 4 M 4 9 M 8 13 M 16 18 M 32 20 M 1 0 F 2 2 F 4 6 F 8 10 F 16 12 F 32 16 F
The doses were in micro-grams, and the dose level when up by powers of
two. It is a judgement call as to whether you will try to model with
dose as the response or
. We will try both.
402 > bud <- read.table("budworm.dat", head=T)
402 > bud
Dose Response Gender
1 1 1 M
2 2 4 M
3 4 9 M
4 8 13 M
5 16 18 M
6 32 20 M
7 1 0 F
8 2 2 F
9 4 6 F
10 8 10 F
11 16 12 F
12 32 16 F
402 > attach(bud,pos=1)
402 > N <- 20
First, lets plot the data, and see if anything obvious shows up.
402 > plot( Dose, Response/N) 402 > points( Dose[Gender=="F"], (Response/N)[Gender=="F"], pch="F") 402 > points( Dose[Gender=="M"], (Response/N)[Gender=="M"], pch="M")
The same plot on the log(Dose) scale, is
402 > plot( Dose, Response/N, log="x") 402 > points( Dose[Gender=="F"], (Response/N)[Gender=="F"], pch="F") 402 > points( Dose[Gender=="M"], (Response/N)[Gender=="M"], pch="M")
Given that we have more than 0-1 data, it makes a lot of sense to at least try to plot the (observed) logits,
402 > par(mfrow=c(2,1))
402 > plot( Dose, logit(Response/N), log="x")
402 > title("Logit vs Log(Dose)")
402 > plot( Dose, logit(Response/N))
402 > title("Logit vs Dose")
This plot makes it quite clear, that the log scale has a much better chance of working.
So, proceed to fit the model. We surely want a model with different slopes for the two sexes
402 > ldose <- log(Dose)
402 > SF <- cbind(Response, N - Response)
402 > mymod <- glm(SF ~ Gender * ldose, family=binomial)
402 > summary(mymod)
..........
Coefficients:
Value Std. Error t value
(Intercept) -2.9935414 0.5525295 -5.4178852
Gender 0.1749865 0.7781556 0.2248733
ldose 1.3071341 0.2410133 5.4234939
Gender:ldose 0.5091459 0.3894474 1.3073547
Null Deviance: 124.8756 on 11 degrees of freedom
Residual Deviance: 4.993727 on 8 degrees of freedom
402 > anova(mymod,test="Chisq")
Analysis of Deviance Table
Binomial model
Response: SF
Terms added sequentially (first to last)
Df Deviance Resid. Df Resid. Dev Pr(Chi)
NULL 11 124.8756
Gender 1 6.0770 10 118.7986 0.0136955
ldose 1 112.0415 9 6.7571 0.0000000
Gender:ldose 1 1.7633 8 4.9937 0.1842088
402 >
Now, what is going on here? Gender seems to have no effect
(based on the t-value), but the deviance is quite large? Even more,
our plots seem to imply what looks like a gender effect. The best way
to find out is to distrust the parameter stuff (it can get easily
confused), and try fitting a model without the interaction.
This makes sense anyway, because there is little/no evidence of an
interaction.
402 > anova(mymod <- glm(SF ~ Gender + ldose, family=binomial))
Terms added sequentially (first to last)
Df Deviance Resid. Df Resid. Dev
NULL 11 124.8756
Gender 1 6.0770 10 118.7986
ldose 1 112.0415 9 6.7571
402 > summary(mymod)
Coefficients:
Value Std. Error t value
(Intercept) -3.473154 0.4682939 -7.416612
Gender 1.100743 0.3557226 3.094385
ldose 1.535336 0.1890119 8.122959
Null Deviance: 124.8756 on 11 degrees of freedom
Residual Deviance: 6.757064 on 9 degrees of freedom
That is a little more like it, and seems to be a VERY good fitting model. We should at least look at some diagnostics, and some other plots, but unless they show something very odd, we are done.
402 > plot(mymod)
That looks very good, we should finish by plotting the data and fitted curves on the same graph, and that is left as an exercise for you (or look it up in the book). The book also goes on to talk about LD50, and how to estimate it.