[R] Interval censored Data in survreg() with zero values!

Geraldine Henningsen ghenningsen at email.uni-kiel.de
Tue Dec 30 18:39:41 CET 2008

> --begin included -----
> My endogenous variable is not a time-dependent variable but percentages
> which naturally are censored in the interval [0,100]. Unfortunately many
> data points are 0 or 100 exactly. The rest of the data is asymmetrically
> distributed. So I would like to apply a two-limit tobit, regressing the
> percentage (endogenous variable) on several explanatory variables.
> --- end included ----
>   Censoring is a limit in the observation process: right censored at 100 means 
> that "the true y value is > 100, but we did not observe the exact value".  You 
> have binomial data with 0 <= y <= 100, which is not a constraint on the 
> observation process.
>    You should be using glm with a binomial family.  
>    	Terry T

Sorry to be persistent, but I don't see why my data should be treated
as binomial rather than censored.
The classical Tobit example (left-censored at zero) is household
expenditure on durable goods, which naturally has a large spike at zero
because not every item is bought in every period. As observed
expenditure can't be negative, the variable is left-censored at zero.
In my case, the observed percentage of A must lie between 0 and 100. We
suppose that each individual has a specific unobservable tendency (y*)
to do A. If the tendency to do A is very low (y*<=0), we observe that
she does not do A (y=0); if the tendency is very high (y*>=100), we
observe that she does only A (y=100); if the tendency is intermediate
(0<y*<100), we observe that she does some A (y=y*, 0<y<100).
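
Written out, the model described above would look roughly as follows.
This is only a sketch: the linear index x_i'beta, the normally
distributed error, and the symbols x_i, beta and sigma are illustrative
assumptions that are not defined anywhere above.

```latex
% Two-limit tobit sketch. Assumed (not in the original post):
% linear index x_i'\beta and normal errors with standard deviation \sigma.
\begin{align*}
y_i^* &= x_i'\beta + \varepsilon_i, \qquad \varepsilon_i \sim N(0, \sigma^2) \\
y_i &= \begin{cases}
  0     & \text{if } y_i^* \le 0 \\
  y_i^* & \text{if } 0 < y_i^* < 100 \\
  100   & \text{if } y_i^* \ge 100
\end{cases} \\
\Pr(y_i = 0)   &= \Phi\!\left(\frac{0 - x_i'\beta}{\sigma}\right) \\
\Pr(y_i = 100) &= 1 - \Phi\!\left(\frac{100 - x_i'\beta}{\sigma}\right) \\
f(y_i) &= \frac{1}{\sigma}\,\phi\!\left(\frac{y_i - x_i'\beta}{\sigma}\right)
  \quad \text{for } 0 < y_i < 100
\end{align*}
```

Under these assumptions, the observations at exactly 0 and 100
contribute probabilities to the likelihood, while the observations
strictly between the limits contribute densities.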
I don't see a binomial distribution in the data. I don't see where there
is a Bernoulli trial, as y can take more than two values and is in fact
a continuous variable.
As I said before, I'm not much of a statistician, so I would be grateful
if you could enlighten me.

