[R] Interval censored Data in survreg() with zero values!

Geraldine Henningsen ghenningsen at email.uni-kiel.de
Mon Jan 12 17:29:24 CET 2009


Hello again,

I have studied your suggestion, but I still disagree. You wrote:

"From the way you wrote the problem I assumed 
that there is some number of n "looks" at the subject and then you count them 
up."

But this is not the case. My data are continuous quantities, not discrete choices. I know nothing about the underlying choice process; the only thing I know is the final share of one of three regimes. Sorry for the poor description of the problem.
So I will stick with my censored-data model. Still, your hint about the p-values is very helpful, because I actually ran into that problem. Thank you for it.
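For concreteness, here is a minimal sketch (Python, purely for illustration; the function and parameter names are hypothetical) of the log-likelihood that a two-limit censored Gaussian ("Tobit") model assigns to shares bounded in [0, 1]. A share at 0 contributes P(z <= 0), a share at 1 contributes P(z >= 1), and an interior share contributes the Gaussian density. It assumes the linear predictor `xb` and `sigma` are already given; fitting the model means maximizing this over the coefficients and sigma, which is conceptually what a censored Gaussian fit in survreg() does.

```python
import math


def norm_pdf(x):
    """Standard normal density."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def norm_cdf(x):
    """Standard normal CDF via erfc (more accurate in the lower tail)."""
    return 0.5 * math.erfc(-x / math.sqrt(2.0))


def norm_sf(x):
    """Standard normal upper tail, P(Z >= x)."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))


def two_limit_tobit_loglik(y, xb, sigma, lower=0.0, upper=1.0):
    """Log-likelihood of y = max(min(z, upper), lower) with z ~ N(xb, sigma).

    y, xb: equal-length sequences of observed shares and linear predictors.
    """
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    total = 0.0
    for yi, mu in zip(y, xb):
        if yi <= lower:
            # left-censored at the lower limit: P(z <= lower)
            total += math.log(norm_cdf((lower - mu) / sigma))
        elif yi >= upper:
            # right-censored at the upper limit: P(z >= upper)
            total += math.log(norm_sf((upper - mu) / sigma))
        else:
            # interior share: Gaussian density of y itself
            total += math.log(norm_pdf((yi - mu) / sigma) / sigma)
    return total


if __name__ == "__main__":
    y = [0.0, 0.25, 0.60, 1.0]   # toy shares, including both limits
    xb = [0.10, 0.30, 0.50, 0.90]  # toy linear predictor values
    print(two_limit_tobit_loglik(y, xb, sigma=0.2))
```

Zeros (and ones) are not a problem in this formulation: they simply switch an observation from the density term to a tail-probability term.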

Best, Geraldine 



Terry Therneau wrote:
> Apologies -- you are being more subtle than I thought.  Nevertheless, I think 
> that the censoring language isn't quite right.
>
>   You are thinking of a hierarchical model:
>   
>     z ~ N(Xb, sigma), where Xb is the linear predictor, whatever covariates you 
> think belong in the model.  Whether the distribution should be Gaussian or 
> something else depends not on the overall distribution of z but on the distribution 
> of (z | Xb).  We could have a skewed predictor leading to skewed z, even if the 
> distribution about any given expectation is symmetric.
>     
>     y = F(z) is what you observe.  The classic Tobit model is y = max(0, z), which 
> does lead to censored data. 
>     
>     In your case y_i = Binomial(n_i, p_i = H(z)).  Note a binomial is k heads 
> out of n tries with a coin of probability p; a "Bernoulli" is a binomial 
> restricted to a single coin flip.  From the way you wrote the problem I assumed 
> that there is some number of n "looks" at the subject and then you count them 
> up.  Note that var(y) = n p (1-p).
>     
>     H describes how the probability changes with z.  In biology we very rarely 
> use H(z)= max(min(z,1),0) because it gives a hard threshold, and the probability 
> of nearly anything doesn't go all the way to zero or one.  
>     
>     If H were as above and 
>     	var(y) = constant and
>     	n is sufficiently large so that Binomial dist is approx Gaussian and
>     	var(y |p) << var(z| Xb)
>
> then your y will fit a censored Gaussian.  Since at least the second is false, 
> it doesn't.  
>
>    A censored model may still be an ok first cut at fitting the data, but I 
> would be suspicious of variance estimates and particularly of any p-values.  The 
> bootstrap could help that.
>    
>    	Terry T.
>    	 
>
>
>
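To make the variance argument in the quoted reply concrete, here is a minimal Python sketch (all names are hypothetical; n = 20 and the toy shares are arbitrary illustrative values). It evaluates var(y) = n p (1-p) with p = H(z) under the hard-threshold link: the variance peaks at p = 0.5 and falls to zero as p reaches 0 or 1, so the "var(y) = constant" condition needed for a censored Gaussian cannot hold. The second helper is a generic case-resampling bootstrap standard error, the kind of check suggested for the variance estimates; `statistic` is assumed to stand in for whatever refits the model and returns the coefficient of interest.

```python
import random
import statistics


def hard_threshold(z):
    """H(z) = max(min(z, 1), 0), the hard-threshold link."""
    return max(min(z, 1.0), 0.0)


def binomial_count_variance(n, p):
    """var(y) = n p (1 - p) for y ~ Binomial(n, p)."""
    return n * p * (1.0 - p)


def bootstrap_se(rows, statistic, n_boot=2000, seed=1):
    """Case-resampling bootstrap standard error of statistic(rows).

    rows: list of observations (e.g. (x, y) tuples); each replicate
    resamples the rows with replacement and recomputes the statistic.
    """
    rng = random.Random(seed)
    replicates = []
    for _ in range(n_boot):
        resample = [rng.choice(rows) for _ in rows]
        replicates.append(statistic(resample))
    return statistics.stdev(replicates)


if __name__ == "__main__":
    n = 20  # illustrative number of binomial trials
    for z in (-0.3, 0.05, 0.5, 0.95, 1.4):
        p = hard_threshold(z)
        print(f"z={z:5.2f}  p={p:4.2f}  var(y)={binomial_count_variance(n, p):.3f}")

    # Toy example: bootstrap SE of a mean share, standing in for a model coefficient.
    shares = [0.0, 0.12, 0.25, 0.40, 0.55, 0.70, 1.0]
    print(bootstrap_se(shares, statistics.mean))
```

In a real analysis each bootstrap replicate would refit the censored model on the resampled rows, which is slower but gives variance estimates that do not rely on the Gaussian assumptions being right.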




More information about the R-help mailing list