[R] Fitting 3 beta distributions

Sun Oct 2 14:44:41 CEST 2011

On Sat, 1 Oct 2011, Nitin Bhardwaj wrote:

> Hi,
> I want to fit 3 beta distributions to my data which ranges between 0 and 1.
> What are the functions that I can easily call and specify that 3 beta
> distributions should be fitted?
> I have already looked at normalmixEM and fitdistr but they don't seem to be
> applicable (normalmixEM is only for fitting normal dist and fitdistr will
> only fit 1 distribution, not 3). Is that right?

From your description above, I guess that (a) you want to fit a _mixture_
of 3 beta distributions, and (b) have tried to use "mixtools" and "MASS" 
so far.

Based on these assumptions: fitdistr() does not fit mixture models. 
"mixtools" does fit mixtures and the accompanying paper has an example 
where a nonparametric model is applied to mixtures of beta distributions. 
Furthermore, the "betareg" package has a function betamix() which can fit 
mixtures of beta regression models (including the special case of no 
covariates).

Both "mixtools" and "betareg" have been published in JSS, as indicated 
when calling citation("mixtools") and citation("betareg"):
http://www.jstatsoft.org/v32/i06/
http://www.jstatsoft.org/v34/i02/

The latter (the betareg paper) does not yet cover the betamix() function.
As an example, one can use the artificial data generated in Section 5.2:
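   library("betareg")   # provides betamix()
   set.seed(123)
   ## y1: two components with different means (0.3 and 0.5), same precision 4
   y1 <- c(rbeta(150, 0.3 * 4, 0.7 * 4), rbeta(50, 0.5 * 4, 0.5 * 4))
   ## y2: two components with the same mean 0.3, different precisions (4 and 8)
   y2 <- c(rbeta(100, 0.3 * 4, 0.7 * 4), rbeta(100, 0.3 * 8, 0.7 * 8))
   d <- data.frame(y1, y2)
   ## intercept-only mean and precision submodels; k is the number of components
   bm1 <- betamix(y1 ~ 1 | 1, data = d, k = 2)
   bm2 <- betamix(y2 ~ 1 | 1, data = d, k = 2)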
where one should note that, compared to R's parametrization of the beta
distribution, two transformations are employed: first from shape1/shape2 to
mu = shape1/(shape1 + shape2) and phi = shape1 + shape2, and then logit and
log link functions are applied to mu and phi, respectively.
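
To make the two transformations concrete, here is a minimal base-R sketch
that maps shape1/shape2 to mu/phi, moves to the link scale, and maps back.
The helper names shape_to_muphi() and muphi_to_shape() are hypothetical and
not part of "betareg"; the numbers are those of the first component of y1.

```r
# Hypothetical helpers for illustration only (not part of betareg)
shape_to_muphi <- function(shape1, shape2) {
  c(mu = shape1 / (shape1 + shape2), phi = shape1 + shape2)
}
muphi_to_shape <- function(mu, phi) {
  c(shape1 = mu * phi, shape2 = (1 - mu) * phi)
}

# First component of y1: rbeta(150, 0.3 * 4, 0.7 * 4)
p <- shape_to_muphi(0.3 * 4, 0.7 * 4)   # mean 0.3, precision 4

# Link scale: logit for the mean, log for the precision
eta_mu  <- qlogis(p[["mu"]])            # log(mu / (1 - mu))
eta_phi <- log(p[["phi"]])

# Back-transform from the link scale to R's shape1/shape2
s <- muphi_to_shape(plogis(eta_mu), exp(eta_phi))
```

Coefficients reported on the link scale would be back-transformed in the
same way (plogis() for the mean, exp() for the precision) before being
compared with the shape1/shape2 values used in rbeta().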

> Also, my data has 26 million data points. What can I do to reduce the
> computation time with the suggested function?

I think all functions above will have problems with 26 million
observations directly. One alternative would be to use a representative
sample. Another, if the fitting function takes weights, would be to
compute weights on a possibly coarsened grid.
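
A rough base-R sketch of the coarsened-grid idea follows: bin the
observations on a regular grid over [0, 1] and keep one row per non-empty
bin, with the bin count as weight. The simulated y, the number of bins, and
the names grid, y and w are assumptions for illustration, not taken from the
original data.

```r
set.seed(1)
y <- rbeta(1e5, 2, 5)            # stand-in for the real data (assumption)

n_bins <- 200                    # grid resolution, chosen arbitrarily
breaks <- seq(0, 1, length.out = n_bins + 1)
mids   <- (breaks[-1] + breaks[-(n_bins + 1)]) / 2   # bin midpoints

# Bin index per observation; include.lowest puts an exact 0 in the first bin
bin    <- cut(y, breaks = breaks, include.lowest = TRUE, labels = FALSE)
counts <- tabulate(bin, nbins = n_bins)

# One row per non-empty bin: midpoint as response, count as case weight
keep <- counts > 0
grid <- data.frame(y = mids[keep], w = counts[keep])
```

The reduced data frame has at most n_bins rows regardless of the original
sample size. Whether and how a given fitting function accepts such counts
(for example through a weights argument) should be checked in its help page
first. A plain random subsample, e.g. sample(y, 1e5), is the simpler
alternative when weights are not supported.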

hth,
Z

> thanks a lot in advance,
> eagerly waiting for any input.
> Best
> Nitin