[R] Confidence intervals for relative risk

Terry Therneau therneau at mayo.edu
Mon Nov 13 16:54:04 CET 2006


Wolfgang,
  It is common to handle relative risk problems using Poisson regression.
  In your example you have 8 events out of 508 tries, and 0/500 in the second
data set.

> tdata <- data.frame(y=c(8,0), n=c(508,500), group=1:0)
> fit <- glm(y ~ group + offset(log(n)), data=tdata, family=poisson)

  Because of the zero, the standard beta/se(beta) confidence intervals don't
work.  In fact, the actual MLE for the ratio is infinity, which the above
glm decides is exp(33.85) = 5 x 10^14 --- close enough to infinity
for me.  What you want is the lower limit of the interval, which you can
find with a profile likelihood.  That is, look at the curve of deviance vs
beta, draw a horizontal line 3.84 units above its minimum (the 0.95 quantile
of a chi-square on 1 df), and where it intersects the curve is your
confidence limit.
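The 3.84 cut-off is nothing mysterious; it is just the chi-square quantile:

```r
# 95% profile-likelihood cut-off: 0.95 quantile of a chi-square on 1 df
qchisq(0.95, 1)    # 3.841459
```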

  For the above, since there are 2 groups and 2 coefficients, the deviance of
the full model happens to be 0.  Your approximation gave a lower limit of
0.97, which is about exp(0), so I'll guess a solution between -2 and 2.

> xx <- seq(-2, 2, length=21)
> for (i in 1:21) {
+     fit <- glm(y ~ offset(group*xx[i] + log(n)), poisson, tdata)
+     print(c(xx[i], fit$deviance))
+ }
[1]  -2.00000  33.80736
[1]  -1.80000  31.02994
[1]  -1.60000  28.33138
[1]  -1.40000  25.72327
[1]  -1.20000  23.21769
[1]  -1.00000  20.82691
[1]  -0.80000  18.56281
[1]  -0.60000  16.43629
[1]  -0.40000  14.45668
[1]  -0.20000  12.63108
[1]  0.00000 10.96387
[1] 0.20000 9.45639
[1] 0.400000 8.106805
[1] 0.600000 6.910274
[1] 0.800000 5.859303
[1] 1.000000 4.944278
[1] 1.200000 4.154088
[1] 1.400000 3.476757
[1] 1.60000 2.90003
[1] 1.80000 2.41186
[1] 2.000000 2.000785

 So the "exact" lower limit is somewhere between exp(1.2) and exp(1.4), which
is around 3.6.  One can easily refine this by tucking the fit into an
iterative search (e.g. with uniroot), or just resetting xx to a finer range.
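One version of that iterative search, a sketch using uniroot on the same
profile deviance (re-creating tdata from above):

```r
# Data from the start of the message
tdata <- data.frame(y = c(8, 0), n = c(508, 500), group = 1:0)

# Profile deviance minus the 95% chi-square cut-off; the full-model
# deviance is 0 here, so the confidence limit is where this crosses 0
profdev <- function(b) {
    fit <- glm(y ~ offset(b * group + log(n)), family = poisson, data = tdata)
    fit$deviance - qchisq(0.95, 1)
}

lower <- uniroot(profdev, c(1.2, 1.4))$root   # bracket taken from the scan above
exp(lower)                                    # profile lower limit, about 3.6
```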

  The offset(log(n)) part of the model, BTW, is a well-known trick in Poisson
models.  It has to do with the fact that the likelihood is in terms of the
number of events, but we want coefficients in terms of rates, and E(y) = rate*n,
so log E(y) = log(rate) + log(n).  Adding an offset of xx[i] * group fits a
model with the group effect fixed at xx[i], but allows the intercept to vary.
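To see the offset at work, here is a hypothetical variant of the data with a
nonzero count in both groups; because the two-group model is saturated,
exp(coef) reproduces the observed rate ratio exactly:

```r
# Hypothetical counts: 8 events in 508 vs 4 events in 500
tdata2 <- data.frame(y = c(8, 4), n = c(508, 500), group = 1:0)
fit2 <- glm(y ~ group + offset(log(n)), family = poisson, data = tdata2)

exp(coef(fit2)[["group"]])   # fitted rate ratio
(8/508) / (4/500)            # observed rate ratio: the same, about 1.97
```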
  For more conceptual depth, you could look up 'rate regression' or
'standardized mortality ratio' in the Encyclopedia of Biostatistics --- it is
in the latter computation that this approach is quite common.  The idea of
adding a fraction is found in Anscombe (1948), The transformation of Poisson,
binomial and negative-binomial data, Biometrika, vol 35, p246-254, but for
each single estimate, not for the ratio.

	Terry Therneau
	Mayo Clinic


