[R] A Few MCLUST Questions

Murray Jorgensen maj at stats.waikato.ac.nz
Mon Jun 14 04:49:15 CEST 2004


I can't answer for MCLUST specifically, but in general mixture-modelling 
terms it is easier to think of a reasonable initial clustering of the 
data, from which the M-step will quickly produce initial parameter 
estimates, than to pick a large number of initial parameter values out 
of the air. Usually, if you try to pick the parameters directly, you will 
choose values that make some data values very improbable, leading to 
numerical difficulties in the M-step. (You could use a random grouping 
to start things off if nothing else comes to mind.)
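
As an illustration only (this is not MCLUST code), here is a minimal 
base-R sketch for a univariate two-component normal mixture. The 
simulated data, the helper names m_step and e_step, and the 
"out of the air" parameter values are all assumptions made up for 
the example.

```r
set.seed(1)
x <- c(rnorm(100, mean = 0, sd = 1), rnorm(100, mean = 5, sd = 1))
G <- 2

# Initial clustering: a random grouping, coded as a 0/1 membership matrix
# (rows = observations, columns = components).
labels <- sample(rep_len(seq_len(G), length(x)))
z <- outer(labels, seq_len(G), "==") * 1

# M-step: weighted proportions, means and standard deviations given z.
m_step <- function(x, z) {
  n_k <- colSums(z)
  mu <- colSums(z * x) / n_k
  list(pro = n_k / length(x), mu = mu,
       sd = sqrt(colSums(z * outer(x, mu, "-")^2) / n_k))
}

# E-step: posterior membership probabilities and the log-likelihood.
e_step <- function(x, par) {
  dens <- sapply(seq_along(par$mu), function(k)
    par$pro[k] * dnorm(x, par$mu[k], par$sd[k]))
  list(z = dens / rowSums(dens), loglik = sum(log(rowSums(dens))))
}

# Start at the M-step, from the grouping, and then alternate.
trace <- numeric(0)
for (iter in 1:200) {
  par <- m_step(x, z)
  fit <- e_step(x, par)
  z <- fit$z
  trace <- c(trace, fit$loglik)
}

# Parameters picked out of the air (hypothetical values): every density
# underflows to zero, so the memberships come out as 0/0 = NaN.
bad <- e_step(x, list(pro = c(0.5, 0.5), mu = c(100, 200), sd = c(0.1, 0.1)))
anyNA(bad$z)
```

Started from the grouping, the log-likelihood in trace never decreases. 
Started from the arbitrary parameters, the memberships are NaN and the 
following M-step has nothing usable to work with.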

On the other hand, you may have a good set of parameter values from a 
previously fitted data set and a new but similar set of data, perhaps 
from a different time period or location. Then it makes sense to start 
from the parameter values that you have.
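
A rough base-R sketch of that situation (again not MCLUST code) might 
look like the following. The "earlier" parameter values and the 
simulated new data are hypothetical, chosen only to show a run that 
begins with the E-step.

```r
set.seed(42)
# New data, similar to the earlier data set but slightly shifted.
x_new <- c(rnorm(200, mean = 0.5, sd = 1), rnorm(200, mean = 5.5, sd = 1))

# Hypothetical estimates carried over from an earlier fit (made-up values).
par <- list(pro = c(0.5, 0.5), mu = c(0, 5), sd = c(1, 1))

loglik <- numeric(0)
for (iter in 1:50) {
  # E-step first: membership probabilities under the current parameters.
  dens <- sapply(1:2, function(k)
    par$pro[k] * dnorm(x_new, par$mu[k], par$sd[k]))
  z <- dens / rowSums(dens)
  loglik <- c(loglik, sum(log(rowSums(dens))))

  # M-step: update proportions, means and standard deviations.
  n_k <- colSums(z)
  mu <- colSums(z * x_new) / n_k
  par <- list(pro = n_k / length(x_new), mu = mu,
              sd = sqrt(colSums(z * outer(x_new, mu, "-")^2) / n_k))
}
par$mu
```

Because the starting values are already close to sensible ones for the 
new data, no observation is wildly improbable and the first E-step is 
well behaved.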

Don't worry about the software: it should be just as easy for it to 
begin at either the E-step or the M-step. It is your own intentions and 
convenience that matter.

Murray Jorgensen


KKThird at Yahoo.Com wrote:
> Hello everyone. I have a few MCLUST questions and I was hoping someone could help me out. If you’re an MCLUST user, they will likely be pretty easy to answer. Thanks in advance for any help.
> 
> Ken 
> 
>    What are the pros/cons of starting a finite mixture model at the "m" step versus the "e" step (where "m" is the maximization step and "e" is the expectation step of the EM algorithm)? In particular, are there any reasons for using em(modelName=XXX) versus me(modelName=XXX)? Other than MCLUST, I’ve not seen a finite mixture model "program" give such an option. Would it make sense to fit both models and take the one with the largest log likelihood?
> 
>    Rather than the hc() function performing cluster analysis for all of G possible clusters, can it be set to only perform a specified number (e.g., set so G=2 only)? Although a minimum number of clusters can be specified, there doesn’t seem to be any way to limit the number of clusters. I want to do a simulation for a fixed number of components, and thus I would like to avoid the unnecessary computations.
> 
>    Is there any difference between hc(modelName=VVV) and hcVVV or hc(modelName=EEE) and hcEEE, etc.? Likewise, are there any differences between mstep(modelName=VVV) and mstepVVV or mstep(modelName=EEE) and mstepEEE, etc.? If not, why do the same functions have different names?
> 

-- 
Dr Murray Jorgensen      http://www.stats.waikato.ac.nz/Staff/maj.html
Department of Statistics, University of Waikato, Hamilton, New Zealand
Email: maj at waikato.ac.nz                                Fax 7 838 4155
Phone  +64 7 838 4773 wk    +64 7 849 6486 home    Mobile 021 1395 862



