[R] Discrepant lm() and survreg() standard errors with weighted fits

Therneau, Terry M., Ph.D. therneau at mayo.edu
Tue Feb 25 17:50:27 CET 2014



On 02/25/2014 05:00 AM, r-help-request at r-project.org wrote:
> Hi,
>
> I have some measurements and their uncertainties.  I'm using an
> uncensored subset of the data for a weighted fit (for now---I'll do a
> fit to the full, censored, dataset when I understand the results).
>
> survreg() reports a much smaller standard error for the model
> parameter than lm(), but only when I use weights.  Am I missing
> something?  Here is what I'm doing:

Survreg treats weights as case weights, and lm treats them as sampling weights.
Here is a simple example.  Data set test2 has two copies of every obs in data set test.

> test <- data.frame(x=1:6, y=c(1,3,2,4,6,5))
> test2 <- test[c(1:6, 1:6),]

> summary(lm( y ~ x, data=test))$coef
              Estimate Std. Error   t value   Pr(>|t|)
(Intercept) 0.4000000  0.9039595 0.4424977 0.68100354
x           0.8857143  0.2321154 3.8158362 0.01884548

> summary(lm( y~x, data=test2))$coef
              Estimate Std. Error   t value    Pr(>|t|)
(Intercept) 0.4000000  0.5717142 0.6996503 0.500096805
x           0.8857143  0.1468027 6.0333668 0.000126369

As expected, the standard error has decreased by a factor of sqrt(2)
Now fit the model using case weights:

> summary(lm( y~x, data=test, weight= rep(2,6)))$coef
              Estimate Std. Error   t value   Pr(>|t|)
(Intercept) 0.4000000  0.9039595 0.4424977 0.68100354

  Notice that the answer matches the first run with data set test.  Repeat this experiment 
with survreg, and you will find that the weighted run matches data test2.  When using the 
robust variance, survreg treats weights as sampling weights, not case weights.

  What is the "right" behavior?  Neither or both: the writer of the routine simply makes a 
choice and sticks with it.  If you really care about this read up on the survey package 
which cares about this type type of issue, in detail, and does it right.  An intermediate 
step is to use a software system (stata for example) that explicitly supports more than 
one kind of weight.




More information about the R-help mailing list