[R] Discrepant lm() and survreg() standard errors with weighted fits

Wed Feb 26 20:49:17 CET 2014

I understand that the robust variances may lead to a different
standard error.  I want the standard error valid for heteroscedastic
data, ultimately, because I have very good estimates of the
measurement variances (why I'm doing weighted fits in the first
place).

For the simple example here, the White estimate is somewhere between
the standard error for lm() and survreg():

> library(sandwich)
> test <- data.frame(x=1:6, y=c(1,3,2,4,6,5))
> vcovHC(lm(y~x, data=test), type=c('HC'))
            (Intercept)           x
(Intercept)  0.34636190 -0.08372245
x           -0.08372245  0.02811895
> vcovHC(lm(y~x, data=test), type=c('HC'))[1,1]^0.5
[1] 0.5885252
> vcovHC(lm(y~x, data=test), type=c('HC'))[2,2]^0.5
[1] 0.1676871

(The HC3 method gives SEs which are consistent with those from lm().)

I don't understand how survreg() can do better than White?

Thanks again for your help,

Kyle

On Wed, Feb 26, 2014 at 7:21 AM, Therneau, Terry M., Ph.D.
<therneau at mayo.edu> wrote:
>  The robust variances are a completely different estimate of standard error.
> For linear models the robust variance has been rediscovered many times and
> so has lots of names: the White estimate in economics, the Horvitz-Thompson
> in surveys, working independence esitmate in GEE models, infinitesimal
> jackknife in stat literature, ....  But it is not the MLE estimate.
>   When the robust estimate was added to survreg (and coxph) I made the
> decision that IF someone was invoking the robust variance, AND they were
> using weights, that simple case weights were unlikely to be what they had.
> So I chose to make treat the weights as sampling or precision weights, in
> contraindication to the longer standing behavior of coxph/survreg without a
> robust argument.  Looking back, I probably should have taken one step
> further and changed the routines' behavior globally on the presumption that
> true case weights are vanishingly rare.  They were not uncommon in my
> computing youth, when computer memory of < 64KB was usual (max possible on
> the PDP-11 architecture).  But one is always cautious about non-backwards
> compatable changes.
>
> ----begin included message ---------
>
>
> When I use robust=T, I do not understand how survreg treats the
> weights as sampling weights and arrives at a different standard error
> from lm:
>
>> summary(survreg(Surv(y)~x, dist='gaussian', data=test, weights=rep(2,6),
>> robust=T))$table
>
>                  Value   Std. Err (Naive SE)        z            p
> (Intercept)  0.4000000 0.29426260  0.5219013  1.35933 1.740420e-01
> x            0.8857143 0.08384353  0.1340119 10.56390 4.380958e-26
> Log(scale)  -0.2321528 0.08117684  0.2041241 -2.85984 4.238543e-03
>