[R] Homogeneity of variance tests between more than 2 samples

Peter Dalgaard p.dalgaard at biostat.ku.dk
Mon Dec 20 01:11:20 CET 2004


(Ted Harding) <Ted.Harding at nessie.mcc.ac.uk> writes:

> For non-normal data, there's something of a question as to
> what is meant (or, perhaps more accurately, what is intended
> to be meant) by homogeneity of variance, as a test preliminary
> to an analysis of variance.

Yes... If you use the test as a preliminary to an ANOVA, which largely
depends on second order properties, I think it is reasonable to assume
that you really mean to compare the variances. It's always been a
mystery to me why SPSS prefers the Levene test, which tests whether
the mean absolute deviation is identical, which is pretty obviously
not the same thing, unless you assume something like the distributions
being scaled versions of each other.
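
To see that equal variance does not imply equal mean absolute
deviation, take a normal and a uniform distribution, both centred at 0
and scaled to variance 1. A minimal Python sketch, using only the
closed-form expressions (the variable names are just for illustration):

```python
import math

# Both distributions are centred at 0 and scaled to have variance 1.

# Normal(0, 1): E|X| = sqrt(2 / pi).
mad_normal = math.sqrt(2.0 / math.pi)

# Uniform(-a, a) has variance a^2 / 3, so a = sqrt(3) gives variance 1;
# its mean absolute deviation is a / 2.
a = math.sqrt(3.0)
mad_uniform = a / 2.0

print(f"normal : variance 1, mean abs. deviation {mad_normal:.4f}")
print(f"uniform: variance 1, mean abs. deviation {mad_uniform:.4f}")
```

The two values come out at roughly 0.80 and 0.87, so a test that
compares mean absolute deviations can see a difference between two
groups whose variances are exactly equal.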

The Tukey procedure that you outline below would seem to have
something of the same issue: If two distributions have the same
variance but different kurtosis, you'll get the heavy-tailed one
occurring before the light-tailed one in that scheme, then a region
where the light-tailed distribution dominates and finally a region
where the heavy-tailed distribution dominates again (think of a
uniform distribution and a normal distribution with the same
variance). It is hard to tell whether the M-W test is biased one way
or the other, but the statistic probably will not have the M-W
null distribution.
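
For the uniform/normal case the three regions can be located by
comparing the two densities. The sketch below (Python, standard
library only; `region` is a helper made up for this illustration)
scales both distributions to variance 1 and finds the point where the
normal density crosses the flat uniform density:

```python
import math

a = math.sqrt(3.0)              # Uniform(-a, a) has variance 1
unif_density = 1.0 / (2.0 * a)  # constant density on (-a, a)

def normal_density(x):
    """Density of the standard normal distribution at x."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

# Solve normal_density(c) == unif_density for c > 0.
c = math.sqrt(-2.0 * math.log(unif_density * math.sqrt(2.0 * math.pi)))

def region(x):
    """Name the distribution with the higher density at distance |x| from 0."""
    x = abs(x)
    if x > a:
        return "normal"   # the uniform has no mass this far out
    if x > c:
        return "uniform"  # flat density exceeds the normal's shoulder
    return "normal"       # near the centre the normal peak is higher

print(f"crossing point c = {c:.3f}, uniform endpoint a = {a:.3f}")
for x in (2.0, 1.0, 0.1):
    print(x, region(x))
```

Working inwards from the ends, one therefore meets mostly normal
observations beyond about 1.73, mostly uniform ones between about 0.80
and 1.73, and mostly normal ones again closer to the centre.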

Notice also, btw, that R has several dispersion tests in the standard
package "stats", including fligner.test() and ansari.test().
 
> It is possible to consider distribution-free approaches to
> this kind of question.
> 
> One of Tukey's sneakiest inventions was the application of
> the Mann-Whitney test (usually seen as a test of identity
> of distribution against location-shift types of alternative,
> more accurately against alternatives like "P(X<u) > P(Y<u)")
> to test similarity of dispersion.
> 
> The trick: given X1 , ... , Xm and Y1 , ... , Yn, pool them
> and sort the result as Z1 < Z2 < ... < ZN where N = m + n.
> 
> Now take the Z's in the order
> 
>   Z[1] , Z[N] , Z[2] , Z[N-1] , Z[3] , Z[N-2] , ....
> 
> i.e. work inwards from the ends, alternately from each end.
> 
> Note, as you proceed, whether each Z is an X or a Y.
> You thus get a sequence of Xs and Ys. Then sum the number
> of pairs (X,Y) in this sequence where the X occurs earlier
> than the Y.
> 
> This sum, under the null hypothesis of identity of distribution,
> has the Mann-Whitney distribution (just like its usual version),
> and it is sensitive to differences of dispersion (e.g. if the
> distribution of X is more dispersed than the distribution of Y,
> then the Xs will be found earlier in the sequence since they
> lie further out than the Ys and so will be counted in first
> by the above method).
> 
> No doubt, just as there are distribution-free extensions of
> procedures like Mann-Whitney to several samples ("nonparametric
> ANOVA"), so such a procedure could be applied to test equality
> of "dispersions" for several samples, and no doubt it has been
> done.
> 
> However, I've not made use of such a procedure myself, so I
> have to leave it to others to report details.
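
The ordering and the count described above can be sketched as follows.
This is only an illustration in Python with made-up data; the function
names are hypothetical, ties are assumed not to occur, and no p-value
is computed.

```python
def outside_in_labels(xs, ys):
    """Labels ('X' or 'Y') of the pooled sample, taken alternately from the
    smallest and largest remaining value: Z[1], Z[N], Z[2], Z[N-1], ..."""
    pooled = sorted([(v, "X") for v in xs] + [(v, "Y") for v in ys])
    labels = []
    lo, hi = 0, len(pooled) - 1
    take_low = True
    while lo <= hi:
        if take_low:
            labels.append(pooled[lo][1])
            lo += 1
        else:
            labels.append(pooled[hi][1])
            hi -= 1
        take_low = not take_low
    return labels

def pairs_x_before_y(labels):
    """Number of (X, Y) pairs in which the X comes earlier in the sequence."""
    count = 0
    xs_seen = 0
    for lab in labels:
        if lab == "X":
            xs_seen += 1
        else:
            count += xs_seen  # every X seen so far precedes this Y
    return count

xs = [-9.0, -4.0, 5.0, 10.0]  # widely spread (made-up numbers)
ys = [-1.0, 0.5, 1.0, 2.0]    # tightly clustered (made-up numbers)
labels = outside_in_labels(xs, ys)
print("".join(labels))           # prints XXXXYYYY
print(pairs_x_before_y(labels))  # prints 16
```

With these numbers every X is counted in before every Y, so the sum
reaches its maximum of m * n = 16, which is what one would expect when
X is much more dispersed than Y.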

-- 
   O__  ---- Peter Dalgaard             Blegdamsvej 3  
  c/ /'_ --- Dept. of Biostatistics     2200 Cph. N   
 (*) \(*) -- University of Copenhagen   Denmark      Ph: (+45) 35327918
~~~~~~~~~~ - (p.dalgaard at biostat.ku.dk)             FAX: (+45) 35327907



