[R] problem with da.mix

(Ted Harding) Ted.Harding at nessie.mcc.ac.uk
Wed Feb 16 21:37:15 CET 2005


On 16-Feb-05 Stephanie.Tomczak at eleves.polytech-lille.fr wrote:
> We use the mix package and we have a problem with the DA
> function. We aren't sure, but it's maybe a memory problem.
> 
> We have done:
>> Ent <- read.table("C:/.../File.txt")
>> attach(Ent)
>> Ent 
>     V1  V2   V3  V4 ... V16  V17
> 1    1   1   2   6      18   18 
> 2    1   1   1   NA     14   17
> 3    1   1   2   1      16   14
> ....
> 199  2   1   NA  7      19   18
> 200  2   1   3   2      14   17
> 
>> EntPrelim<-prelim.mix(as.matrix(Ent),9)
>> EntEM<-em.mix(EntPrelim,maxits=500)
>> rngseed(1234567)
>> EntDA<-da.mix(EntPrelim, EntEM, steps=100, showits=TRUE)
>  Steps of data Augmentation:
> 1... Error in da.mix(EntPrelim, EntEM, steps=100, showits=TRUE):
>     Improper posterior--empty cells

Dear Stéphanie,

This problem is closely related to the one reported yesterday
by Delphine Gille, from the same institution as you:

>> From: Delphine.Gille at eleves.polytech-lille.fr
>> To: r-help at stat.math.ethz.ch
>> Subject: [R] memory problem with package mix
>> Date: Tue, 15 Feb 2005 15:23:08 +0100
>> 
>> Hello,
>> 
>> I think we have a memory problem with  em.mix.
>> 
>> We have done:
>>
>> >library(mix)
>> >Manq <- read.table("C:/.../file.txt")
>> >attach(Manq)
>> >Manq
>> >    V1 V2 V3 V4 .............V27
>> > 1  1  1  1  1...........
>> > 2  1 NA  3  6
>> > 3  1  2  6  2
>> > ...
>> > ...
>> > 300  2  NA  6  2...........
>> 
>> > Essaimanq <-prelim.mix(as.matrix(Manq),5)
>> > test <- em.mix(Essaimanq)
>>     error cannot allocated vector of size 535808 KB
>>     in addition : warning message
>>                   reached total allocation of 509MB


The reason is almost certainly the one I pointed out in my
reply to Delphine: you have 9 categorical variables, each
necessarily with at least 2 levels (and in your case at
least one has >=3 levels and at least one has >=6 levels),
so you have at least (2^7)*3*6 = 2304 cells (possibly many
more, depending on the numbers of levels in the variables)
in your unrestricted model for the categorical variables
(as implied by your usage of em.mix and da.mix).

With only 200 rows of data, at least 2104 of these cells will
be empty (i.e. with no data falling in them), even if there
are only 2304 of them. Therefore, given the improper Dirichlet
prior which da.mix uses by default, you will almost certainly
end up with an improper posterior distribution as a result of
your many empty cells, which is just what your error message
is telling you.
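
The arithmetic above can be checked with a few lines of base R.
This is only a sketch: levels_min holds the minimum numbers of
levels I have assumed from the data excerpt, and count_levels is
a hypothetical helper (not part of mix) for reading the real
numbers of levels off the first p columns of your own data.

```r
# Minimum levels per categorical variable implied by the excerpt
# (assumed: seven variables with 2 levels, one with 3, one with 6).
levels_min <- c(rep(2, 7), 3, 6)
n_rows <- 200

n_cells   <- prod(levels_min)   # cells in the full cross-classification
min_empty <- n_cells - n_rows   # each row can occupy at most one cell

cat("cells:", n_cells, "\n")
cat("empty cells (at least):", min_empty, "\n")

# Hypothetical helper: observed number of distinct non-missing values
# in each of the first p columns of a data frame or matrix.
count_levels <- function(x, p) {
  sapply(as.data.frame(x)[, seq_len(p), drop = FALSE],
         function(v) length(unique(v[!is.na(v)])))
}
```

With your data loaded, something like prod(count_levels(Ent, 9))
would give the actual number of cells, which can only be larger
than the lower bound computed here.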

With so few data, you need to severely restrict the level of
interaction allowed for the categorical variables (and use
ecm.mix instead of em.mix, dabipf.mix instead of da.mix).
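
As a rough sketch of how such a restriction might be written
down: as far as I recall from the package documentation, ecm.mix
and dabipf.mix take the loglinear model as a "margins" vector in
which variables are numbered by column and margins are separated
by zeros. Please check that against "?ecm.mix" before relying on
it. The helper make_margins below is hypothetical (it is not part
of mix), and the "design" argument which those functions also
require is not covered here.

```r
# Build a zero-separated margins vector for p categorical variables,
# containing every margin of the given order:
#   order = 1  -> main effects only (complete independence)
#   order = 2  -> all two-way margins (no three-factor interaction)
# Hypothetical helper; the zero-separated format is an assumption
# to be verified against the mix documentation.
make_margins <- function(p, order = 1) {
  sets <- combn(p, order, simplify = FALSE)
  # append a 0 after each margin, then drop the trailing 0
  out <- unlist(lapply(sets, function(s) c(s, 0)))
  out[-length(out)]
}

margins_indep <- make_margins(9, 1)  # 1 0 2 0 ... 0 9
margins_2way  <- make_margins(9, 2)  # 1 2 0 1 3 0 ... 0 8 9

print(make_margins(3, 2))
```

The model with all two-way margins is hierarchical, so it already
contains the main effects; there is no need to list them again.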

In the best possible case (7 variables at 2 levels, one at 3,
one at 6) implied by your data excerpt above, you need
7 + 2 + 5 = 14 parameters at a minimum (no-interaction or
complete-independence model). If you admit the first-order
(2-factor) interactions as well, you need a further
21 + 14 + 35 + 10 = 80, i.e. 94 parameters in all. Going to
2nd-order (3-factor) will surely take you over your data size
of 200 (I haven't worked this one out: maybe there's a snappy
R function for this sort of thing!). But if your variables
have more levels than the minimum I have assumed (based
on your data excerpt) then the situation will rapidly
get much worse.
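
A small function along these lines does the counting. It is a
sketch in base R only: n_params is a hypothetical name, and it
assumes a hierarchical loglinear model containing every
interaction up to a given number of factors, where a term
involving a set of variables contributes the product of
(levels - 1) over that set.

```r
# Free parameters in a hierarchical loglinear model that includes
# all interactions among up to max_order factors.
#   levels:    number of levels of each categorical variable
#   max_order: 1 = main effects, 2 = plus 2-factor terms, ...
n_params <- function(levels, max_order) {
  d <- levels - 1   # free parameters per main effect
  sum(sapply(seq_len(max_order), function(k) {
    # sum, over all sets of k variables, of the product of their d's
    sum(combn(length(d), k, FUN = function(idx) prod(d[idx])))
  }))
}

# Minimum levels assumed from the data excerpt
levels_min <- c(rep(2, 7), 3, 6)

print(n_params(levels_min, 1))  # main effects only
print(n_params(levels_min, 2))  # plus 2-factor interactions
print(n_params(levels_min, 3))  # plus 3-factor interactions
```

For the assumed levels this gives 14, 94 (14 main effects plus
80 two-factor terms) and 346, so the 3-factor model does indeed
need more parameters than there are rows of data.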

Another approach might be to consider using an informative
("proper") prior distribution for the Dirichlet probabilities,
but unless you are very careful you risk adopting something
which is not realistic for your problem. You can do this
either with em.mix and da.mix (provided em.mix works with your
sparse data, which it didn't for Delphine) or with ecm.mix and
dabipf.mix.
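
To see why a proper prior changes matters, here is a toy
illustration in base R. It uses only the standard fact that a
Dirichlet(alpha) prior combined with multinomial counts gives a
Dirichlet(alpha + counts) posterior, which is proper only if
every parameter is strictly positive. The counts and the values
of alpha are made up for illustration and are not the settings
used by mix; how a prior is passed to the mix functions should
be checked in their help pages.

```r
# Made-up cell counts for a small table with two empty cells
counts <- c(5, 0, 3, 0, 2)

# Posterior under Dirichlet(alpha) is Dirichlet(alpha + counts);
# it is a proper distribution only if all parameters are > 0.
posterior_is_proper <- function(counts, alpha) {
  all(counts + alpha > 0)
}

# Improper limiting prior (all hyperparameters 0): empty cells
# leave posterior parameters at 0, so the posterior is improper.
print(posterior_is_proper(counts, alpha = 0))

# Any strictly positive hyperparameter repairs this, at the price
# of putting prior "pseudo-counts" into every cell.
print(posterior_is_proper(counts, alpha = 0.5))
```

With thousands of cells and 200 rows, those pseudo-counts would
easily outweigh the data, which is the risk mentioned above.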

See also the explanations in "?da.mix" and "?dabipf.mix",
section "Details", which refer to just the kind of problem
you are having.

Hoping this helps,
Ted.


--------------------------------------------------------------------
E-Mail: (Ted Harding) <Ted.Harding at nessie.mcc.ac.uk>
Fax-to-email: +44 (0)870 094 0861
Date: 16-Feb-05                                       Time: 20:37:15
------------------------------ XFMail ------------------------------



