[R] Interval censored Data in survreg() with zero values!

Geraldine Henningsen ghenningsen at email.uni-kiel.de
Fri Dec 26 20:38:32 CET 2008


Hello again,

thank you very much for your help so far.

To be more specific, I have generated a simplified data set that is
similar to my real-world data:

set.seed( 123 )
data <- data.frame( x = runif( 200 ), y = NA )
for( i in 1:200 ){
   data$y[ i ] <- rweibull( 1, 1, 70 + 10 * data$x[ i ] ) - 30
}
data$y[ data$y < 0 ] <- 0
data$y[ data$y > 100 ] <- 100

Applying a two-limit tobit model based on the normal distribution
works:
library( AER )  # assumption: tobit() here is the one from the AER package
estNorm <- tobit( y ~ x, left = 0, right = 100, data = data )

Since my data are obviously not normally distributed, I tried the
Weibull distribution, but this does not work (as I wrote before).
estWeibull <- tobit( y ~ x, left = 0, right = 100, dist = "weibull",
   data = data )

I have tried to implement Terry's suggestion.
>   [...]  Using Surv(t1, t2, type='interval2'), you can have
>     a left censored observation where time of event < t: represented as (NA, t)
>     a right censored observation where time of event > t: represented as (t, NA)
>     an interval censored observation t1 <= time <= t2: represented as (t1, t2)
>
estWeibull2 <- survreg(
   Surv( ifelse( y == 0, NA, y ), ifelse( y == 100, y, NA ),
      type = "interval2" ) ~ x,
   data = data )

Is this correct?
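
For reference, here is a minimal sketch of how I read the quoted
interval2 rules for the limits 0 and 100, using three made-up values
rather than my data. It only builds and prints the Surv object and does
not fit a model. The vector y and the names lower and upper are
hypothetical and chosen for illustration only.

```r
library( survival )

# Hypothetical toy values: one at the lower limit, one strictly inside
# the interval, one at the upper limit
y <- c( 0, 42.5, 100 )

# interval2 coding as quoted above:
#   (NA, t) = left censored at t, (t, NA) = right censored at t,
#   (t, t)  = value observed exactly
lower <- ifelse( y == 0, NA, y )
upper <- ifelse( y == 100, NA, y )

s <- Surv( lower, upper, type = "interval2" )
print( s )

# Third column of the Surv matrix holds the status code:
# 0 = right censored, 1 = exact, 2 = left censored, 3 = interval censored
print( unclass( s )[ , 3 ] )
```

Under this reading, a value at the upper limit gets NA as its second
element, and every uncensored value appears in both arguments.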

My endogenous variable is not a time-dependent variable but a
percentage, which is naturally censored to the interval [0, 100].
Unfortunately, many data points are exactly 0 or 100. The rest of the
data are asymmetrically distributed. So I would like to apply a
two-limit tobit, regressing the percentage (endogenous variable) on
several explanatory variables.

Best,
Geraldine


