[R] superfluous distribution found with mclust

Denis Chabot chabotd at globetrotter.net
Mon Mar 22 18:10:26 CET 2010


Dear R users,

I use mclust to fit a mixture of normal distributions to many datasets. Usually the Mclust function finds one or two normal distributions and, rarely, three.

But I hit a strange case today.

my.data <- c(57.96920, 51.79415, 51.20538, 55.53637, 51.64291, 56.61476, 51.28855, 55.56169, 51.85113, 54.03330, 51.37370, 49.48561, 52.41580, 53.51176, 60.49293, 55.77012, 51.59270, 56.29660, 55.90048, 53.05432, 50.87498, 58.47613, 54.60827, 54.16143, 52.94914, 58.89408, 51.17116, 54.16909, 51.94852, 53.29897, 57.21962, 66.94420, 56.65536, 53.38147, 52.79163, 52.55879, 55.54395, 54.33984, 51.79235, 52.93464, 50.03343, 59.04797, 51.85276, 53.16419, 53.27404, 60.08775, 52.96493, 54.15129, 58.53050, 51.74431, 50.67817, 51.22570, 57.60541, 51.32998, 56.73625, 55.99371, 50.41035, 52.79797, 59.75973, 52.03613, 56.59133, 51.66319, 51.06316, 55.57699, 50.12779, 56.04503, 55.75857, 57.55347, 51.48167, 52.22395, 54.96204, 59.58895, 55.49020, 50.50893, 49.97572, 53.26222, 57.10047, 51.25523, 52.38768, 56.42965, 51.83258, 55.40537, 51.60564, 54.68883, 53.48098, 58.47231, 70.15088, 51.68805, 52.82636, 52.97804, 51.90228, 53.49184, 52.24366, 52.36895, 53.26520, 52.27327, 50.85403)

library(mclust)
cl <- mclustBIC(my.data)
myModel <- summary(cl, my.data)

Warning message:
In map(out$z) : no assignment to 1

I do not know why this happens, but the classification confirms that a first distribution was found and that no data were assigned to it:

myModel$classification
 [1] 3 2 2 3 2 3 2 3 2 2 2 2 2 2 3 3 2 3 3 2 2 3 2 2 2 3 2 2 2 2 3 4 3 2 2 2 3 2 2 2
[41] 2 3 2 2 2 3 2 2 3 2 2 2 3 2 3 3 2 2 3 2 3 2 2 3 2 3 3 3 2 2 3 3 3 2 2 2 3 2 2 3
[81] 2 3 2 2 2 3 4 2 2 2 2 2 2 2 2 2 2
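
A quick way to confirm which components are empty is to tabulate the labels against the full set of component numbers, so that components with no members show up with a count of zero. The sketch below uses only base R; `count_members` is a hypothetical helper name, not an mclust function, and the short label vector is just the first ten values printed above.

```r
# Count members per component, keeping components that received no observations.
# `count_members` is a hypothetical helper, not part of mclust.
count_members <- function(classification, G) {
  table(factor(classification, levels = seq_len(G)))
}

# First ten labels from the classification shown above, with G = 4 components.
labels <- c(3, 2, 2, 3, 2, 3, 2, 3, 2, 2)
counts <- count_members(labels, G = 4)
print(counts)
```

On the fitted object, the same check would presumably be `count_members(myModel$classification, length(myModel$parameters$mean))`.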


Furthermore, the first and second distributions have almost the same mean:

myModel$parameters$mean
       1        2        3        4 
52.33903 52.33948 57.14263 68.54754 
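
Since the warning comes from map(out$z), which labels each observation by its largest posterior probability, one possible mechanism is that component 1 carries some weight but is never the largest for any observation. The base-R sketch below illustrates that mechanism only. The means are rounded from the output above, but the standard deviations and mixing proportions are assumptions made up for illustration, not values from this fit.

```r
# Means rounded from the output above; sds and proportions are HYPOTHETICAL.
mu  <- c(52.339, 52.339, 57.143, 68.548)
sdv <- c(0.5, 1.2, 2.0, 1.5)      # assumed, not taken from the fit
pro <- c(0.05, 0.55, 0.38, 0.02)  # assumed mixing proportions (sum to 1)

# Grid covering roughly the range of my.data.
x <- seq(49, 71, by = 0.01)

# Weighted component densities: one row per x, one column per component.
w <- sapply(seq_along(mu), function(k) pro[k] * dnorm(x, mu[k], sdv[k]))

# Posterior membership probabilities, then the label with the largest one.
z <- w / rowSums(w)
map_label <- max.col(z, ties.method = "first")

# Tabulate against all components so that empty ones appear with a zero count.
counts <- table(factor(map_label, levels = seq_along(mu)))
print(counts)
```

With these assumed values, component 1 shares its mean with component 2 but is narrower and much lighter, so its weighted density stays below that of component 2 everywhere and it never receives a label. Whether this is what happens in the real fit can be checked by looking at the estimated proportions and variances, which I believe are stored in `myModel$parameters$pro` and `myModel$parameters$variance$sigmasq`.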



Graphically, I don't see a reason for the distribution with mean=52.33903 to be there:


hist(my.data, breaks = 99, freq = FALSE, main = "", border = grey(0.5))
rug(my.data, ticksize = 0.01, quiet = TRUE)

newx <- seq(from = min(my.data), to = max(my.data), length.out = 500)
Dens <- dens(modelName = myModel$modelName, data = newx,
             parameters = myModel$parameters)
lines(newx, Dens, col = "blue")


Do you know why I get this first distribution with no members?

Thanks in advance,

Denis Chabot


