[R] fitting mixture of gaussians using emclust() of mclust package

Christian Hennig fm3a004 at math.uni-hamburg.de
Wed Aug 1 11:16:28 CEST 2001


On Tue, 31 Jul 2001, Jonathan Qiang Li wrote:

> Hi,
> 
> Has someone tried to use mclust package function emclust() to fit a
> mixture of gaussian model for a relatively large dataset?
> By "large", I specifically have in mind a data set with 50,000
> observations and 23 dimensions. My machine has 750M memory and 500M swap
> space. When I tried to use emclust on the dataset, I consistently get
> messages such as "Error: cannot allocate vector of size 1991669 Kb". In
> other words does this mean that R is trying to allocate almost 2000Mb
> space? Should this be considered abnormal?

No. I recently talked to A. E. Raftery, one of the designers of the S-PLUS
original, and he said that there are indeed problems with datasets of more
than, say, 10000 observations. According to him, it is the number of
observations that matters, not the dimension. The main problem is the
hierarchical clustering routine that produces the initial partition. He
suggests taking a random subsample of size 100-1000 and generating the
starting parameters from that subsample. I cannot give you the details,
because I have not tried this myself yet. But the principle is that you can
tell emclust/mclust how the starting values are to be generated, and the
default, the memory-intensive hierarchical clustering of the full dataset,
must be replaced by a fixed starting configuration obtained from a
subsample.
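
A rough sketch of why the number of observations dominates, and why a
subsample helps. It assumes, purely for illustration, that the hierarchical
step keeps one 8-byte number for every pair of observations; that storage
scheme and the function names are hypothetical and are not taken from
mclust, so only the scaling with n should be read from it.

```python
# Back-of-envelope sketch: pairwise storage as a function of sample size.
# ASSUMPTION (not from mclust): one 8-byte double per pair of observations.

BYTES_PER_VALUE = 8  # assumed size of one stored number


def pair_count(n):
    """Number of unordered pairs among n observations: n * (n - 1) / 2."""
    return n * (n - 1) // 2


def pairwise_megabytes(n):
    """Storage in MB (1 MB = 1024 * 1024 bytes) under the assumption above."""
    return pair_count(n) * BYTES_PER_VALUE / (1024 * 1024)


if __name__ == "__main__":
    # Full dataset from the question, then the suggested subsample sizes.
    for n in (50000, 1000, 100):
        print(f"n = {n:6d}: {pair_count(n):13d} pairs, "
              f"{pairwise_megabytes(n):10.2f} MB")
```

The dimension (23 in the question) does not enter this count at all, while
going from 50000 observations to a subsample of 1000 reduces the number of
pairs from 1249975000 to 499500.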

Another hint: in high dimensions it is not advisable to fit the "VVV" model
(unconstrained covariance matrices), because there is a high probability of
spurious local maxima of the likelihood.
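
To get a feeling for how flexible the unconstrained model is in 23
dimensions, one can count its free parameters. The sketch below uses the
standard counts for a Gaussian mixture with G components in d dimensions and
compares the unconstrained case with a diagonal-covariance one. The function
names are hypothetical, and the count only illustrates the size of the
model; it does not by itself demonstrate the local-maxima problem.

```python
# Free-parameter counts for a Gaussian mixture with G components in d
# dimensions. Function names are hypothetical, chosen for this sketch.


def n_params_full(d, G):
    """Unconstrained covariance matrix per component (the "VVV" case)."""
    means = G * d
    covariances = G * d * (d + 1) // 2  # symmetric d x d matrix per component
    weights = G - 1                     # mixing proportions sum to one
    return means + covariances + weights


def n_params_diagonal(d, G):
    """Diagonal covariance matrix per component, shown for comparison."""
    means = G * d
    variances = G * d                   # one variance per coordinate
    weights = G - 1
    return means + variances + weights


if __name__ == "__main__":
    d = 23  # dimension from the question
    for G in (1, 2, 3, 5):
        print(f"G = {G}: full = {n_params_full(d, G)}, "
              f"diagonal = {n_params_diagonal(d, G)}")
```

With d = 23, each additional component adds 300 parameters to the
unconstrained model, against 47 for the diagonal one.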
 
Hope that helps,
Christian


***********************************************************************
Christian Hennig
University of Hamburg, Faculty of Mathematics - SPST/ZMS
 (Schwerpunkt Mathematische Statistik und Stochastische Prozesse,
 Zentrum fuer Modellierung und Simulation)
Bundesstrasse 55, D-20146 Hamburg, Germany
Tel: x40/42838 4907, private x40/631 62 79
hennig at math.uni-hamburg.de, http://www.math.uni-hamburg.de/home/hennig/
#######################################################################
I recommend www.boag.de




