[R] Compare two normal to one normal

Rolf Turner r.turner at auckland.ac.nz
Wed Sep 23 04:45:47 CEST 2015


On 23/09/15 13:39, John Sorkin wrote:

> Charles, I am not sure the answer to my question (given a dataset,
> how can one compare the fit of a model that describes the data as a
> mixture of two normal distributions with the fit of a model that
> uses a single normal distribution?) can be based on the glm model
> you suggest.
>
> I have used normalmixEM to fit the data to a mixture of two normal
> curves. The model estimates four (perhaps five) parameters: mu1,
> sd1^2, mu2, sd2^2 (and perhaps lambda, the mixing proportion. The
> mixing proportion may not need to be estimated; it may be determined
> once one specifies mu1, sd1^2, mu2, and sd2^2.) Your model fits the
> data to a model that contains only the mean, and estimates 2
> parameters, mu0 and sd0^2.  I am not sure that your model and mine
> can be considered to be nested. If I am correct, I can't compare the
> log likelihood values from the two models. I may be wrong. If I am,
> I should be able to perform a log likelihood test with 2 (or 3, I am
> not sure which) DFs. Are you suggesting the models are nested? If
> so, should I use 3 or 2 DFs?

You are quite correct; there are subtleties involved here.

The one-component model *is* nested in the two-component model, but is 
nested "ambiguously".

(1) The null (single component) model for a mixture distribution is 
ill-defined.  Note that a single component could be achieved either by 
setting the mixing probabilities equal to (1,0) or (0,1) or by setting
mu_1 = mu_2 and sigma_1 = sigma_2.
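
For instance, a quick numerical check (with arbitrary, purely
illustrative parameter values) shows that both routes collapse to
exactly the same single-normal density:

x <- seq(-3, 3, length.out = 7)
f.single <- dnorm(x, mean = 0, sd = 1)
f.lambda <- 1 * dnorm(x, 0, 1) + 0 * dnorm(x, 2, 0.5)    # mixing probs (1, 0)
f.equal  <- 0.3 * dnorm(x, 0, 1) + 0.7 * dnorm(x, 0, 1)  # mu_1 = mu_2, sigma_1 = sigma_2
all.equal(f.single, f.lambda)   # TRUE
all.equal(f.single, f.equal)    # TRUE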


(2) However you slice it, the parameter values corresponding to the null 
model fall on the *boundary* of the parameter space.

(3) Consequently the asymptotics go to hell in a handcart and the 
likelihood ratio statistic, however you specify the null model, does not 
have an asymptotic chi-squared distribution.

(4) I have a vague idea that there are ways of obtaining a valid 
asymptotic null distribution for the LRT but I am not sufficiently 
knowledgeable to provide any guidance here.

(5) You might be able to gain some insight from delving into the 
literature --- a reasonable place to start would be with "Finite Mixture 
Models" by McLachlan and Peel:

@book{mclachlan2000finite,
   title     = {Finite Mixture Models},
   series    = {Wiley Series in Probability and Statistics},
   author    = {McLachlan, G. and Peel, D.},
   year      = {2000},
   publisher = {John Wiley \& Sons},
   address   = {New York}
}

(6) My own approach would be to do "parametric bootstrapping":

* fit (to the real data) the null model and calculate
   the log-likelihood L1, any way you like
* fit the full model and determine the log-likelihood L2
* form the test statistic LRT = 2*(L2 - L1)
* simulate N data sets from the parameters fitted under the null model
* for each such simulated data set calculate a test statistic in the
   foregoing manner, obtaining LRT^*_1, ..., LRT^*_N
* the p-value for your test is then

   p = (m+1)/(N+1)

   where m = the number of LRT^*_i values that are greater than LRT

The factor of 2 is of course completely unnecessary.  I just put it in 
"by analogy" with the "real", usual, likelihood ratio statistic.

Note that this p-value is *exact* (not an approximation!) --- for any 
value of N --- when interpreted with respect to the "total observation
procedure" of observing both the real and simulated data.  (But see 
below.) That is, the probability, under the null hypothesis, of 
observing a test statistic "as extreme as" what you actually observed is 
*exactly* (m+1)/(N+1).  See e.g.:

@article{Barnard1963,
   author  = {Barnard, G. A.},
   title   = {Discussion of ``{T}he spectral analysis of point
              processes'' by {M}. {S}. {B}artlett},
   journal = {Journal of the Royal Statistical Society, Series {B}},
   volume  = {25},
   year    = {1963},
   pages   = {294}
}

or

@article{Hope1968,
   author  = {Hope, A. C. A.},
   title   = {A simplified {M}onte {C}arlo significance test procedure},
   journal = {Journal of the Royal Statistical Society, Series {B}},
   year    = {1968},
   volume  = {30},
   pages   = {582--598}
}

Taking N=99 (or 999) is arithmetically convenient.

However I exaggerate when I say that the p-value is exact.  It would be 
exact if you *knew* the parameters of the null model.  Since you have to 
estimate these parameters the test is (a bit?) conservative.  Note that 
the conservatism would be present even if you eschewed the "exact" test 
and an "approximate" test using a (very) large value of N.

Generally conservatism (in this context! :-) ) is deemed to be no bad thing.

cheers,

Rolf Turner

P. S.  I think that the mixing parameter must *always* be estimated. 
I.e. even if you knew mu_1, mu_2, sigma_1 and sigma_2 you would still 
have to estimate "lambda".  So you have 5 parameters in your full model. 
  Not that this is particularly relevant.
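
If you want to see this numerically: with mu_1, mu_2, sigma_1 and
sigma_2 treated as known, lambda can still be estimated by a
one-dimensional likelihood maximisation.  The data and parameter values
below are toy values, not anything from your analysis.

negll <- function(lambda, x, mu, sigma)
    -sum(log(lambda * dnorm(x, mu[1], sigma[1]) +
             (1 - lambda) * dnorm(x, mu[2], sigma[2])))
x <- c(rnorm(60, 0, 1), rnorm(40, 3, 1))    # toy data; true lambda = 0.6
optimize(negll, interval = c(0, 1), x = x,
         mu = c(0, 3), sigma = c(1, 1))$minimum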

R. T.

-- 
Technical Editor ANZJS
Department of Statistics
University of Auckland
Phone: +64-9-373-7599 ext. 88276


