[R] normalmixEM gives widely divergent results.

Wed Jan 27 19:07:01 CET 2016

On Wed, 27 Jan 2016 11:51:07 -0500 John Sorkin <JSorkin at grecc.umaryland.edu> wrote:

> I am running normalmixEM:
> mixmdlscaled <- normalmixEM(data$FCWg)
> summary(mixmdlscaled)
> plot(mixmdlscaled,which=2)
>  
> If I run the program multiple times, I get widely different results:
>  
> > mixmdlscaled <- normalmixEM(data$FCWg)
> number of iterations= 41 
> > summary(mixmdlscaled)
> summary of normalmixEM object:
>           comp 1   comp 2
> lambda 0.0818928 0.918107
> mu     0.6575938 0.740870
> sigma  0.0070562 0.178410
> loglik at estimate:  56.87445 
> > plot(mixmdlscaled,which=2)
> > mixmdlscaled <- normalmixEM(data$FCWg)
> number of iterations= 357 
> > summary(mixmdlscaled)
> summary of normalmixEM object:
>          comp 1    comp 2
> lambda 0.959912 0.0400879
> mu     0.722022 1.0220719
> sigma  0.165454 0.0131391
> loglik at estimate:  53.66051 
> > plot(mixmdlscaled,which=2)
> 
>  
>  
> I understand that when run without specifying various parameters (e.g. mu, or sigma) values are chosen randomly from a normal distribution with center(s) determined from binning the data. 

I don't know what this means or what the mechanics are.

> Despite this, would not one expect the results to be similar? If one is not to expect similar results, how can I get a solution in which I can have confidence? Should I run the program multiple times and take the average of the results? Should I look for the solution with the best log likelihood?  

But if a likelihood has several local maxima with respect to its parameters, isn't this what you would expect? I don't know how familiar you are with statistics, so maybe I am repeating something that you already know, but an MLE (note the indefinite article) is what EM, or any iterative/root-finding method, finds in the vicinity of its initialization. Different starting values can therefore converge to different local maxima, each with its own parameter estimates and log-likelihood.
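
To illustrate the dependence on initialization, here is a minimal sketch, assuming the mixtools package is installed. The data are simulated stand-ins (not your FCWg values), and both sets of starting values are hypothetical choices made only for the comparison.

```r
library(mixtools)

set.seed(42)
# Simulated stand-in for data$FCWg (hypothetical; not the poster's data)
x <- c(rnorm(180, mean = 0.72, sd = 0.17),
       rnorm(20,  mean = 1.02, sd = 0.05))

# Same two-component model, two different (hypothetical) starting points
fit_a <- normalmixEM(x, lambda = c(0.5, 0.5), mu = c(0.6, 0.8),
                     sigma = c(0.1, 0.1))
fit_b <- normalmixEM(x, lambda = c(0.9, 0.1), mu = c(0.7, 1.0),
                     sigma = c(0.2, 0.05))

# Put the estimated means and log-likelihoods side by side
comparison <- rbind(a = c(mu = fit_a$mu, loglik = fit_a$loglik),
                    b = c(mu = fit_b$mu, loglik = fit_b$loglik))
print(comparison)
```

Whether the two runs agree depends on the data and on the starting values. If they disagree, each is still a local maximum of the same likelihood.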

Your best bet is to use a package such as EMCluster. If you want to keep using the package above, you should make several runs and then choose the one that gives a stable solution and the highest log-likelihood value. EMCluster does this for you.
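
The several-runs approach could be sketched roughly as follows, again assuming mixtools and simulated stand-in data. The number of restarts (n_starts) is an arbitrary choice, and this is only one way to organize the runs, not what EMCluster does internally.

```r
library(mixtools)

set.seed(1)
# Simulated stand-in for data$FCWg (hypothetical; not the poster's data)
x <- c(rnorm(180, mean = 0.72, sd = 0.17),
       rnorm(20,  mean = 1.02, sd = 0.05))

n_starts <- 20  # arbitrary number of random restarts

# Each call draws its own random starting values; drop runs that fail
fits <- lapply(seq_len(n_starts), function(i) {
  tryCatch(normalmixEM(x, k = 2), error = function(e) NULL)
})
fits <- Filter(Negate(is.null), fits)

# Collect the log-likelihood reached by each run
logliks <- vapply(fits, function(f) f$loglik, numeric(1))

# How many runs end at (nearly) the same log-likelihood?
print(table(round(logliks, 2)))

# Keep the run with the highest log-likelihood
best <- fits[[which.max(logliks)]]
summary(best)
```

If most runs land on the same log-likelihood, that solution is stable in the sense above. Before trusting the top run, look at its sigma values: with unequal variances, a component with a very small sigma and a small lambda can have a high log-likelihood and still be a spurious fit.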

HTH!

Best wishes,
Ranjan

> Thank you,
> John
> John David Sorkin M.D., Ph.D.
> Professor of Medicine
> Chief, Biostatistics and Informatics
> University of Maryland School of Medicine Division of Gerontology and Geriatric Medicine
> Baltimore VA Medical Center
> 10 North Greene Street
> GRECC (BT/18/GR)
> Baltimore, MD 21201-1524
> (Phone) 410-605-7119
> (Fax) 410-605-7913 (Please call phone number above prior to faxing) 