[R] poisson regression with robust error variance ('eyestudy

Thu May 8 23:28:39 CEST 2008

Once again, Paul, many thanks for your thorough examination
of this question! And for spelling out your approach!!!

It certainly looks as though you're very close to target
(or even spot-on).

I've only one comment -- see at end.

On 08-May-08 20:35:38, Paul Johnson wrote:
> Ted Harding said:
>> I can get the estimated RRs from
> 
>> RRs <- exp(summary(GLM)$coef[,1])
> 
>> but do not see how to implement confidence intervals based
>> on "robust error variances" using the output in GLM.
> 
> Thanks for the link to the data. Here's my best guess. If you
> use the following approach, with the HC0 type of robust standard
> errors in the "sandwich" package (thanks to Achim Zeileis),
> you get "almost" the same numbers as that Stata output gives.
> The estimated b's from the glm match exactly, but the robust
> standard errors are a bit off.
> 
>### Paul Johnson 2008-05-08
>### sandwichGLM.R
> system("wget http://www.ats.ucla.edu/stat/stata/faq/eyestudy.dta")
> library(foreign)
> 
> dat <- read.dta("eyestudy.dta")
>### Ach, stata codes factor contrasts backwards
> 
> dat$carrot0 <- ifelse(dat$carrot==0,1,0)
> dat$gender1 <- ifelse(dat$gender==1,1,0)
> 
> glm1 <- glm(lenses~carrot0,  data=dat, family=poisson(link=log))
> summary(glm1)
> 
> library(sandwich)
> vcovHC(glm1)
> sqrt(diag(vcovHC(glm1)))
> sqrt(diag(vcovHC(glm1, type="HC0")))
> 
>### Result:
># > summary(glm1)
># Call:
># glm(formula = lenses ~ carrot0, family = poisson(link = log),
>#    data = dat)
> 
># Deviance Residuals:
>#     Min       1Q   Median       3Q      Max
># -1.1429  -0.9075   0.3979   0.3979   0.7734
> 
># Coefficients:
>#            Estimate Std. Error z value Pr(>|z|)
># (Intercept)  -0.8873     0.2182  -4.066 4.78e-05 ***
># carrot0       0.4612     0.2808   1.642    0.101
># ---
># Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
> 
># (Dispersion parameter for poisson family taken to be 1)
> 
>#     Null deviance: 67.297  on 99  degrees of freedom
># Residual deviance: 64.536  on 98  degrees of freedom
># AIC: 174.54
> 
># Number of Fisher Scoring iterations: 5
> 
># > sqrt(diag(vcovHC(glm1, type="HC0")))
># (Intercept)     carrot0
>#  0.1673655   0.1971117
> 
>### Compare against
>## http://www.ats.ucla.edu/stat/stata/faq/relative_risk.htm
>## robust standard errors are:
>## .1682086  .1981048
> 
> glm2 <- glm(lenses~carrot0 +gender1 +latitude, data=dat,
> family=poisson(link=log))
> 
> sqrt(diag(vcovHC(glm2, type="HC0")))
>### Result:
># > summary(glm2)
> 
>#Call:
># glm(formula = lenses ~ carrot0 + gender1 + latitude, family =
> poisson(link = log),
>#     data = dat)
> 
># Deviance Residuals:
>#     Min       1Q   Median       3Q      Max
># -1.2332  -0.9316   0.2437   0.5470   0.9466
> 
># Coefficients:
>#            Estimate Std. Error z value Pr(>|z|)
># (Intercept) -0.65212    0.69820  -0.934   0.3503
># carrot0      0.48322    0.28310   1.707   0.0878 .
># gender1      0.20520    0.27807   0.738   0.4605
># latitude    -0.01001    0.01898  -0.527   0.5980
># ---
># Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
> 
># (Dispersion parameter for poisson family taken to be 1)
> 
>#    Null deviance: 67.297  on 99  degrees of freedom
># Residual deviance: 63.762  on 96  degrees of freedom
># AIC: 177.76
> 
># Number of Fisher Scoring iterations: 5
> 
># > sqrt(diag(vcovHC(glm2, type="HC0")))
># (Intercept)     carrot0     gender1    latitude
># 0.49044963  0.19537743  0.18481067  0.01275001
> 
>### UCLA site has:
> 
>## .4929193   .1963616  .1857414  .0128142
> 
> 
> So, either there is some "rounding error" or Stata is not using
> exactly the same algorithm as vcovHC in sandwich.

The above differences look somewhat systematic (though very
small). The percentage differences (vcovHC relative to STATA)
for the two cases you analyse above are

vcovHC "HC0":  0.1673655   0.1971117
STATA       :  0.1682086   0.1981048
-------------------------------------
% Difference: -0.5012229  -0.5013003

vcovHC "HC0":  0.49044963  0.19537743  0.18481067  0.01275001
STATA:         0.4929193   0.1963616   0.1857414    .0128142
-------------------------------------------------------------
% Difference: -0.5010293  -0.5012029  -0.5010891  -0.5009287

so, in all cases, very close to -0.501%, despite varying absolute
values. This suggests a very subtle difference in algorithm,
rather than rounding error.
Though there is conceivably a convergence issue in the background.

Does ANYONE here know exactly how STATA does it?

Best wishes,
Ted.

--------------------------------------------------------------------
E-Mail: (Ted Harding) <Ted.Harding at manchester.ac.uk>
Fax-to-email: +44 (0)870 094 0861
Date: 08-May-08                                       Time: 22:28:36
------------------------------ XFMail ------------------------------