[R] Fitting weibull and exponential distributions to left censoring data

Christian Ritz ritz at life.ku.dk
Sun Nov 2 11:22:32 CET 2008

```Hi Göran,

the R package NADA is specifically designed for left-censored data:

http://www.statistik.uni-dortmund.de/useR-2008/slides/Helsel+Lee.pdf

The function cenreg() in NADA is a front end to survreg().

Christian

Göran Broström wrote:
> On Fri, Oct 31, 2008 at 2:27 PM, Terry Therneau <therneau at mayo.edu> wrote:
>>  Use the survreg function.
>
> The survreg function cannot fit left censored data (correct me if I am
> wrong!), neither can phreg or aftreg (package eha). On the other hand,
> if Borja instead wanted to fit left truncated data (it is a common
> mistake to confuse left truncation with left censoring), it is
> possible to use phreg or aftreg, but still not survreg (again, correct
> me if I am wrong).
>
> If instead Borja really wants to fit left censored data, it is fairly
> simple with the aid of the function optim, for instance by calling
> this function:
>
> left <- function(x, d){
>     ## d[i] = FALSE: x[i] is left censored
>     ## d[i] = TRUE: x[i] is observed exactly
>     loglik <- function(param){# The loglihood function
>         lambda <- exp(param[2])
>         p <- exp(param[1])
>         sum(ifelse(d,
>                         dweibull(x, p, lambda, log = TRUE),
>                         pweibull(x, p, lambda, log.p = TRUE)
>                    )
>             )
>     }
>     par <- c(0, 0)
>     res <- optim(par, loglik, control = list(fnscale = -1), hessian = TRUE)
>     list(log.shape = res\$par[1],
>          log.scale = res\$par[2],
>          shape = exp(res\$par[1]),
>          scale = exp(res\$par[2]),
>          var.log = solve(-res\$hessian)
>          )
> }
>
> Use like this:
>
>> x <- rweibull(500, shape = 2, scale = 1)
>> d <- x > median(x) # 50% left censoring, Type II.
>> y <- ifelse(d, x, median(x))
>> left(y, d)
>
> \$log.shape
> [1] 0.707023
>
> \$log.scale
> [1] -0.007239744
>
> \$shape
> [1] 2.027945
>
> \$scale
> [1] 0.9927864
>
> \$var.log
>              [,1]         [,2]
> [1,] 0.0022849526 0.0005949114
> [2,] 0.0005949114 0.0006508635
>
>
>>  There are many different ways to parameterize a Weibull.  The survreg function
>> imbeds it a general location-scale familiy, which is a different
>> parameterization than the rweibull function.
>>
>>> y <- rweibull(1000, shape=2, scale=5)
>>> survreg(Surv(y)~1, dist="weibull")
>> Coefficients:
>> (Intercept)
>>   1.592543
>>
>> Scale= 0.5096278
>>
>> Loglik(model)= -2201.9   Loglik(intercept only)= -2201.9
>>
>> ----
>>
>>  survreg's scale  =    1/(rweibull shape)
>>  survreg's intercept = log(rweibull scale)
>>  For the log-likelihood all parameterizations lead to the same value.
>>
>>  There is not "right" or "wrong" parameterization for a Weibull (IMHO),
>
> Correct, but there are two points I would like to add to that:
> (i) It is a good idea to perform optimisation with a parametrization
> that give no range restrictions.
>
> (ii) It is a good idea to transform back the results to the
> parametrization that is standard in R, for comparative reasons.
>
> See for example the function 'left' above.
>
>> but
>> there certainly is a lot of room for confusion.  This comes up enough that I
>> have just added it as an example in the survreg help page, which will migrate to
>> the general R distribution in due course.
>>
>>  Terry Therneau
>>
>> ______________________________________________
>> R-help at r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help