[R] Variance estimates for survreg vs. lm

Mon Jul 6 14:30:54 CEST 2015

The difference is that survreg is using a maximum likelihood estimate (MLE) of the 
variance and that lm is using the unbiased (MVUE) estimate of variance.  For simple linear 
regression, the former divides by "n" and the latter by "n-p".  The difference in your 
variances is exactly n/(n-p) = 10/8.

With censored data the MLE estimate is still clear, but what exactly "n" is is not so 
obvious (does a censored datum count as a whole observation?), so a simple (n-p)/n 
variance correction is also not obvious.  I would not be surprised if someone, somewhere 
has hammered out a correction for "unbiased" variance; I've never looked for it.  I'm not 
convinced that it is worthwhile though.

Terry Therneau

On 07/04/2015 05:00 AM, r-help-request at r-project.org wrote:
> I would like help understanding why a survival regression with no censored
> data-points does not give the same variance estimates as a linear model
> (see code below).
>
> I think it must be something to do with the fact that the variance is an
> actual parameter in the survival version via the log(scale), and possibly
> that different assumptions are made about the distribution of the variance.
> But I really don't know, I'm just guessing.
>
> The reason I ask is because I am moving a process, that has always been
> modelled using a linear model, to a survival model (because there are
> sometimes a few censored data points). In the past, the censored data
> points have been treated as missing which imparts bias. The variance of the
> estimates in this process is key, so I need to know why they are changing
> in this systematic way?!
>
>
>
> library(survival)
>
> ctl <- c(4.17,5.58,5.18,6.11,4.50,4.61,5.17,4.53,5.33,5.14)
> ctl.surv <- Surv(ctl)
>
> trt <- c(4.81,4.17,4.41,3.59,5.87,3.83,6.03,4.89,4.32,4.69)
>
> lmod <- lm     (ctl      ~ trt                )
> smod <- survreg(ctl.surv ~ trt,dist="gaussian")
>
> coef(lmod)
> coef(smod) # same
>
> vcov(lmod)
> vcov(smod) # smod is smaller
>
> diag(vcov(lmod))     /
> diag(vcov(smod))[1:2]  # 1.25 == 0.5*(n/(n-1))
>
> ( summary(lmod)$coef [   ,"Std. Error"] /
>    summary(smod)$table[1:2,"Std. Error"]   )^2    # 1.25 = 0.5*(n/(n-1))