[R] memory problem with package mix

(Ted Harding) Ted.Harding at nessie.mcc.ac.uk
Tue Feb 15 17:24:18 CET 2005


On 15-Feb-05 Delphine.Gille at eleves.polytech-lille.fr wrote:
> Hello,
> I think we have a memory problem with  em.mix.
> 
> We have done:
> 
>>library(mix)
>>Manq <- read.table("C:/.../file.txt")
>>attach(Manq)
>>Manq
>>    V1 V2 V3 V4 .............V27
>> 1  1  1  1  1...........
>> 2  1 NA  3  6
>> 3  1  2  6  2
>> ...
>> ...
>> 300  2  NA  6  2...........
> 
>> Essaimanq <-prelim.mix(as.matrix(Manq),5)
>> test <- em.mix(Essaimanq)
>     error: cannot allocate vector of size 535808 KB
>     in addition: warning message: reached total allocation of 509MB

Hmm.

According to the above, it seems you might have 5 categorical
variables V1...V5 with at least 6 levels each. Since your call to
em.mix does not specify any model restriction (for which you
need to call ecm.mix instead), you may have at least 6^5 = 7776
"cells" for the different combinations of levels. This will
require 7775 parameters for the probabilities of these cells
(one fewer than the number of cells, since they must sum to 1).

For each cell, you have a separate vector of means for the
multivariate normal distribution to be fitted to the (27-5)=22
continuous variables. This requires 22*7776 = 171072 parameters.

Sub-total: 178847

Then, as a bit of icing on the cake, you have the 22*23/2 = 253
parameters for the (symmetric) covariance matrix.

Sub-total: 179100

Since em.mix does quite complicated things, it is perhaps
not surprising that it demands more than 509MB (corresponding
to roughly 3000 bytes per parameter or, with 8 bytes per number,
roughly 370 numbers per parameter), and if any of V1...V5 has
more than 6 levels the parameter count is correspondingly larger.
Not to mention the 8100 numbers (about 65000 bytes) required for
each working copy of the representation of the data.
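The count can be checked with a few lines of R. This is only a sketch, not part of package mix: count_unrestricted_params() is a hypothetical helper, and the five 6-level categorical variables with 22 continuous ones are the same assumption as in the text.

```r
# Free parameters of the unrestricted general location model that
# em.mix fits: cell probabilities (which sum to 1), one mean vector
# per cell, and a single covariance matrix shared by all cells.
count_unrestricted_params <- function(nlev, q) {
  D <- prod(nlev)              # number of cells (level combinations)
  probs <- D - 1               # probabilities, minus one for sum-to-1
  means <- q * D               # a q-vector of means in every cell
  covar <- q * (q + 1) / 2     # free entries of a symmetric q x q matrix
  c(cells = D, probs = probs, means = means, covar = covar,
    total = probs + means + covar)
}

# Assumed setting: five categorical variables with 6 levels each,
# and 27 - 5 = 22 continuous variables.
p <- count_unrestricted_params(nlev = rep(6, 5), q = 22)
p                           # total is 179100
p[["total"]] / (300 * 27)   # roughly 22 parameters per data value
```

Replacing rep(6, 5) by the actual numbers of levels of V1...V5 gives the real count; any variable with more levels makes it larger.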

In any case, apparently you have only 300*27 = 8100 data
values (fewer still, given the NAs), quite inadequate for this
unrestricted model!
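Whether the guess about the categorical variables is right can be checked on the data themselves. The sketch below is a hypothetical base-R helper, not part of mix; it assumes the categorical variables occupy the first p columns and are coded 1, 2, ..., d, which is the layout prelim.mix expects.

```r
# Summarise the categorical part of a data set laid out as prelim.mix
# expects: the first p columns hold categorical variables coded
# 1, 2, ..., d (possibly with NAs), the remaining columns are continuous.
cell_summary <- function(df, p) {
  cats <- df[, seq_len(p), drop = FALSE]
  nlev <- vapply(cats, function(v) as.numeric(max(v, na.rm = TRUE)),
                 numeric(1))                      # highest code = levels
  complete <- cats[complete.cases(cats), , drop = FALSE]
  list(nlev = nlev,
       cells = prod(nlev),                        # all level combinations
       occupied_cells = nrow(unique(complete)),   # combinations actually seen
       observed_values = sum(!is.na(df)))         # non-missing numbers
}

# Small made-up example: two categorical columns, one continuous column.
toy <- data.frame(V1 = c(1, 2, 1, NA), V2 = c(1, 3, 1, 2),
                  V3 = c(0.5, 1.2, NA, 3.4))
cell_summary(toy, p = 2)   # 6 cells, only 2 of them occupied
```

On the data in the original post this would be cell_summary(Manq, 5). Since 300 rows can fill at most 300 cells, comparing cells with occupied_cells shows directly how thinly the data are spread.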

Even if you had been able to allocate enough memory, you would
then have found that the EM fit would not get anywhere: with
only 300 rows, most cells would contain no data at all.

Suggested solution: think about restricting the number of
parameters in the model, using the parameter "margins" to
ecm.mix to restrict the number of independent combinations
of categorical levels, and also "design" to specify a simpler
model for the dependence of the continuous variables
on the categoricals (e.g. the matrix corresponding to the
model "~ V1+V2+V3+V4+V5" only introduces about 5*6*22 = 660
new parameters, strictly 22*(1 + 5*5) = 572 once each 6-level
factor is reduced to 5 contrasts plus a common intercept,
namely a simple additive effect of the level of each Vi on the
mean of each of the 22 continuous variables).

Hoping this helps,
Ted.


--------------------------------------------------------------------
E-Mail: (Ted Harding) <Ted.Harding at nessie.mcc.ac.uk>
Fax-to-email: +44 (0)870 094 0861
Date: 15-Feb-05                                       Time: 16:24:18
------------------------------ XFMail ------------------------------




More information about the R-help mailing list