[R] finding an unknown distribution

Rubén Roa-Ureta rroa at udec.cl
Mon Apr 21 21:43:09 CEST 2008


andrea previtali wrote:
> Hi,
> I need to analyze the influences of several factors on a variable that
> is a measure of fecundity, consisting of 73 observations ranging from 0
> to 5. The variable is continuous and highly positively skewed; none of
> the typical transformations was able to normalize the data. Thus, I was
> thinking of analyzing these data using a generalized linear model where
> I can specify a distribution other than normal. I'm thinking it may fit
> a gamma or exponential distribution. But I'm not sure if the data meet
> the assumptions of those distributions because their definitions are
> too complex for my understanding!

Roughly, the exponential distribution is the model of a random variable
describing the time/distance between consecutive events that occur
independently at a constant rate. The gamma distribution is the model of
a random variable that can be thought of as the sum of independent
exponential random variables sharing the same rate. I don't think
fecundity data, the count of reproductive
cells, qualifies as a random variable to be modeled by either of these 
distributions. If the count of reproductive cells is very large, and you 
are modeling this count as a function of animal size, such as length, 
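you should consider the lognormal distribution, since the count of cells
grows multiplicatively (volumetrically) with the increase in length. In
that case you can fit a linear model to the log of the response, e.g.
lm(log(y) ~ x), which is the lognormal model proper. A glm with
family=gaussian(link="log") also models the mean as log-linear in the
predictors, but it assumes additive normal errors on the original scale
rather than lognormal ones.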
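
As a rough illustration of the two fits mentioned above, here is a
minimal sketch on simulated data. Everything in it is an assumption made
for the example: the variable names (len, eggs), the sample size borrowed
from the question, and the cubic relationship with multiplicative error.
It is not the poster's data and only shows how the calls are written.

```r
# Simulated data; all names and parameter values are hypothetical.
set.seed(1)
n    <- 73
len  <- runif(n, 10, 50)                               # animal length
eggs <- exp(-6 + 3 * log(len) + rnorm(n, sd = 0.3))    # multiplicative error

# Lognormal model: normal errors on the log scale.
fit_lnorm <- lm(log(eggs) ~ log(len))

# Gaussian GLM with log link: log-linear mean, additive normal errors
# on the original scale. Starting values are taken from the lm fit.
fit_glm <- glm(eggs ~ log(len), family = gaussian(link = "log"),
               start = coef(fit_lnorm))

# Compare the estimated coefficients of the two fits.
print(coef(fit_lnorm))
print(coef(fit_glm))
```

With data like these, plotting the residuals of each fit against the
fitted values is one way to judge which error structure is more
plausible.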
Rubén


