[R] Error distribution question
    Peter Dunn 
    dunn at usq.edu.au
       
    Fri Mar  9 00:44:40 CET 2007
    
    
  
> > I was wondering if somebody could offer me some advice on which
> > error distribution would be appropriate for the type of data I have.
> > I'm studying what continuous predictor variables such as grooming
> > received, rank, etc. affect the amount of grooming given. This
> > response variable is continuous with many zeros, and so positively
> > skewed.
>
> This kind of variable is very common in prospecting (oil, mining)
> industries, and also in medical research. It's neither continuous
> nor discrete, because of the weight on zero. Basically, it is a
> combination of _two_ variables:
>
> X: a Bernoulli trial, such that p(X = 0) = 1 - p (failure) and
>    p(X = 1) = p (success)
>
> Y: the continuous variable that represents the amount when a success occurs
>
> So, we have the final variable as X * Y.
Indeed, the Tweedie distribution may be just what you are 
after.
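To make the X * Y description concrete, here is a minimal base-R sketch (no packages needed) that builds such a variable directly. The function name simulate_xy and the choice of a gamma distribution for Y are illustrative assumptions of mine, not part of the question.

```r
# Hypothetical illustration of the X * Y construction; the gamma choice
# for Y is an assumption made only for this sketch.
simulate_xy <- function(n, prob, shape, rate) {
  x <- rbinom(n, size = 1, prob = prob)       # X: 1 = success, 0 = failure
  y <- rgamma(n, shape = shape, rate = rate)  # Y: positive continuous amount
  x * y                                       # exact zero whenever X = 0
}

set.seed(1)
z <- simulate_xy(10000, prob = 0.4, shape = 2, rate = 1)
mean(z == 0)  # close to 1 - prob = 0.6
mean(z)       # close to prob * shape / rate = 0.8
summary(z)    # many zeros plus a right-skewed positive part
```

A Tweedie distribution with 1 < p < 2 produces data of exactly this shape, but as a single exponential-family distribution, so one model describes both the zeros and the positive amounts.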
> I noticed on the Tweedie help page that one can use a specific response
> distribution (Normal, Poisson, compound Poisson, etc.) by setting the
> variance power to a specific number. I'm a beginner, so I really don't
> follow, then,
This sounds like you have the  tweedie  package.
And yes, the variance power p tells you which distribution you have.
Tweedie distributions have variance of the form var[Y] = phi * mu^p,
where mu = E[Y] is the mean, phi > 0 is the dispersion parameter and
p is the variance power.  (Note that Tweedie distributions belong to the
exponential family, so they can be used within the generalized linear
model framework.)
The mixed distributions you describe (continuous, plus a positive
probability mass at zero) correspond to Tweedie distributions with
1 < p < 2; these are compound Poisson-gamma distributions.
(p=0 is the normal; p=1 with phi=1 is the Poisson; p=2 is the gamma;
p=3 is the inverse Gaussian.)
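To see why 1 < p < 2 gives exact zeros, here is a base-R sketch that simulates a Tweedie variable as a Poisson number of gamma-distributed amounts, using the standard compound Poisson-gamma parameterization: on average lambda = mu^(2-p) / (phi*(2-p)) events, each gamma with shape (2-p)/(p-1) and scale phi*(p-1)*mu^(p-1). The function name rtweedie_cpg and the parameter values are hypothetical; in practice the rtweedie function in package tweedie generates such data for you.

```r
# Sketch: Tweedie with 1 < p < 2 as a compound Poisson-gamma variable.
# rtweedie_cpg is a hypothetical helper written for illustration only.
rtweedie_cpg <- function(n, mu, phi, p) {
  stopifnot(p > 1, p < 2)
  lambda <- mu^(2 - p) / (phi * (2 - p))  # mean number of gamma "events"
  shape  <- (2 - p) / (p - 1)             # gamma shape of each event
  scale  <- phi * (p - 1) * mu^(p - 1)    # gamma scale of each event
  counts <- rpois(n, lambda)              # number of events per observation
  out <- numeric(n)                       # stays exactly 0 when counts == 0
  pos <- counts > 0
  # A sum of k iid gamma(shape, scale) draws is gamma(k * shape, scale)
  out[pos] <- rgamma(sum(pos), shape = counts[pos] * shape, scale = scale)
  out
}

set.seed(2)
mu <- 2; phi <- 1.5; p <- 1.6
y <- rtweedie_cpg(1e5, mu, phi, p)
c(mean = mean(y), target = mu)                     # both close to 2
c(var = var(y), target = phi * mu^p)               # both close to 4.55
c(zeros = mean(y == 0),
  target = exp(-mu^(2 - p) / (phi * (2 - p))))     # both about 0.11
```

Since P(Y = 0) = exp(-lambda), the proportion of zeros is governed jointly by mu, phi and p, which is why no separate model for the zeros is needed.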
> which response distribution to use (i.e. what variance power) that would 
> be appropriate for continuous response data with many zeros. 
If you want to use a Tweedie distribution in practice, you first need to
know *which* Tweedie distribution you need; that is, what value of p is
appropriate.  To find out, use the  tweedie.profile  function in
package  tweedie, which profiles the likelihood over a range of p
values and tells you which value is appropriate for your data.  For the
sake of an example, suppose you wish to fit a model like  Y ~ x1 + x2,
and  tweedie.profile  suggests p = 1.6:
library(tweedie)
prof <- tweedie.profile(Y ~ x1 + x2, p.vec=seq(1.1, 1.9, length=10),
	do.plot=TRUE)
prof$p.max   # the estimated variance power p
Then, if you wish, you can fit the corresponding generalized linear model
as follows, using package  statmod:
library(statmod)
glm(Y ~ x1 + x2, family=tweedie(var.power=1.6, link.power=0))
(link.power=0 means a log link, which is a commonly used choice.)
Hope that's of some help.
P.
-- 
Dr Peter Dunn  |  dunn <at> usq.edu.au
Faculty of Sciences, USQ; http://www.sci.usq.edu.au/staff/dunn
Aust. Centre for Sustainable Catchments: www.usq.edu.au/acsc