[R] Error distribution question

Thu Mar 8 12:19:50 CET 2007

Cristina wrote:
>
> I was wondering if somebody could offer me some advice on which 
> error distribution would be appropriate for the type of data I have. 
> I'm studying what continuous predictor variables such as grooming 
> received, rank, etc. affect the amount of grooming given. This 
> response variable is continuous with many zeros, and so positively 
> skewed.
>
This kind of variable is very common in the prospecting (oil, mining)
industries, and also in medical research. It's neither purely
continuous nor purely discrete, because of the probability mass at
zero. Basically, it is a combination of _two_ variables:

X: a Bernoulli trial, such that P(X = 0) = 1 - p (failure) and
   P(X = 1) = p (success)

Y: the continuous (positive) variable that measures the size of the
   success

So the final variable is the product X * Y.
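
A rough sketch of that construction, in Python only for illustration.
The value p = 0.3, the exponential choice for Y with mean 5, and the
helper name draw_zero_inflated are all made-up assumptions, not
anything taken from the grooming data:

```python
import random

def draw_zero_inflated(p, draw_success, rng):
    """Return one draw of X * Y: 0 with probability 1 - p, else a draw of Y."""
    x = 1 if rng.random() < p else 0   # X: the Bernoulli trial
    if x == 0:
        return 0.0                     # failure: the response is exactly zero
    return draw_success(rng)           # success: the response is a draw of Y

rng = random.Random(42)                # fixed seed so the run is repeatable
p = 0.3                                # assumed probability of success
mean_y = 5.0                           # assumed mean of Y given success

# Y is taken to be exponential here purely as an example of a skewed,
# positive, continuous variable; any other positive distribution would do.
sample = [
    draw_zero_inflated(p, lambda r: r.expovariate(1.0 / mean_y), rng)
    for _ in range(100_000)
]

share_zero = sum(v == 0 for v in sample) / len(sample)
sample_mean = sum(sample) / len(sample)

print("share of zeros:", share_zero)   # should be close to 1 - p
print("sample mean   :", sample_mean)  # should be close to p * mean_y
```

The simulated values show the same shape as the data described in the
question: a spike at zero plus a right-skewed positive part.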

For example, if you are going to model the economic value of
a possible oil field, a potential gold mine, or a new experimental
drug, you must assign a non-zero probability (1 - p) to the
possibility that the oil field has no economic value, the mine has
no gold, or the new drug has so many side effects that no g*vernment
in the world (except maybe ... name here your favourite) will
allow it. Then you have to estimate the return in the case of
success.
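
One simple way to do that estimation, sketched in Python under the
assumption that you only want the two pieces without any predictors:
estimate p as the fraction of non-zero observations, and the return
given success as the mean of the non-zero values. The function name
two_part_summary and the data below are invented for the example.

```python
def two_part_summary(values):
    """Split a zero-inflated sample into its two parts.

    Returns (p_hat, mean_success, expected_value), where p_hat is the
    share of non-zero observations, mean_success is the mean of the
    non-zero observations, and expected_value is their product.
    """
    n = len(values)
    positives = [v for v in values if v > 0]
    p_hat = len(positives) / n
    # With no successes at all there is nothing to average, so use 0.0.
    mean_success = sum(positives) / len(positives) if positives else 0.0
    return p_hat, mean_success, p_hat * mean_success

# Made-up observations: mostly zeros, a few positive returns.
observed = [0, 0, 0, 2.5, 0, 4.0, 0, 0, 1.5, 0]

p_hat, mean_success, expected_value = two_part_summary(observed)
print("estimated p               :", p_hat)
print("mean return given success :", mean_success)
print("expected overall return   :", expected_value)
```

The product of the two estimates equals the plain mean of all the
observations, zeros included; the point of splitting is that each part
can then be modelled separately.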

Alberto Monteiro