[R] evaluation of discriminant functions+multivariate homosce dasticity

Dave Andrae daandrae at yahoo.com
Wed Jan 21 16:58:02 CET 2004


I seem to remember, from a course in which I used SPSS for LDA, that
Box's M is an ultra-sensitive test as well and that in almost all
practical applications it's not useful, so Prof. Ripley's comments
apply to that test, too.


HTH,

Dave

--- "Liaw, Andy" <andy_liaw at merck.com> wrote:
> While I don't know anything about Box's M test, I googled around and
> found a
> Matlab M-file that computes it.  Below is my straight-forward
> translation of
> the code, without knowing Matlab or the formula (and done in a few
> minutes).
> I hope this demonstrates one of Prof. Ripley's point: If you really
> want to
> shoot yourself in the foot, you can probably program R to do that for
> you.
> 
> [BTW: I left the original comments largely intact.  The output I get
> from R
> for the example is the same as what is shown in the comments.]
> 
> [BTW #2: I'd imagine the original Matlab code probably could be
> improved a
> bit...]
> 
> Andy
> 
>
=======================================================================
> BoxMTest <- function(X, cl, alpha=0.05) {
>   ## Multivariate Statistical Testing for the Homogeneity of
> Covariance
>   ## Matrices by the Box's M. 
>   ##
>   ## Syntax: function [MBox] = BoxMTest(X,alpha) 
>   ##      
>   ## Inputs:
>   ##     X - data matrix (Size of matrix must be n-by-(1+p);
> sample=column
> 1,
>   ##         variables=column 2:p). 
>   ##     alpha - significance level (default = 0.05). 
>   ## Output:
>   ##     MBox - the Box's M statistic.
>   ##     Chi-sqr. or F - the approximation statistic test.
>   ##     df's - degrees' of freedom of the approximation statistic
> test.
>   ##     P - observed significance level.
>   ##
>   ## If the groups sample-size is at least 20 (sufficiently large),
>   ## Box's M test takes a Chi-square approximation; otherwise it
> takes
>   ## an F approximation.
>   ##
>   ## Example: For a two groups (g = 2) with three independent
> variables
>   ##    (p = 3), we are interested in testing the homogeneity of
> covariances
>   ##    matrices with a significance level = 0.05. The two groups
> have the
>   ##    same sample-size n1 = n2 = 5.
>   ##                                       Group
>   ##                      ---------------------------------------    
>  
>   ##                            1                        2
>   ##                      ---------------------------------------
>   ##                       x1   x2   x3             x1   x2   x3
>   ##                      ---------------------------------------
>   ##                       23   45   15             277  230   63
>   ##                       40   85   18             153   80   29
>   ##                      215  307   60             306  440  105
>   ##                      110  110   50             252  350  175
>   ##                       65  105   24             143  205   42
>   ##                      ---------------------------------------
>   ##
>   ##             Total data matrix must be:
>   ##          X=[1 23 45 15;1 40 85 18;1 215 307 60;1 110 110 50;1 65
> 105
> 24;
>   ##          2 277 230 63;2 153 80 29;2 306 440 105;2 252 350 175;2
> 143 205
> 42];
>   ##
>   ##             Calling on Matlab the function: 
>   ##                MBoxtest(X,0.05)
>   ##
>   ##             Answer is:
>   ##
>   ##  ------------------------------------------------------------
>   ##       MBox         F           df1          df2          P
>   ##  ------------------------------------------------------------
>   ##     27.1622     2.6293          6           463       0.0162
>   ##  ------------------------------------------------------------
>   ##  Covariance matrices are significantly different.
>   ##
>   
>   ##  Created by A. Trujillo-Ortiz and R. Hernandez-Walls
>   ##             Facultad de Ciencias Marinas
>   ##             Universidad Autonoma de Baja California
>   ##             Apdo. Postal 453
>   ##             Ensenada, Baja California
>   ##             Mexico.
>   ##             atrujo at uabc.mx
>   ##  And the special collaboration of the post-graduate students of
> the
> 2002:2
>   ##  Multivariate Statistics Course: Karel Castro-Morales,
>   ##  Alejandro Espinoza-Tenorio, Andrea Guia-Ramirez, Raquel
> Muniz-Salazar,
>   ##  Jose Luis Sanchez-Osorio and Roberto Carmona-Pina.
>   ##  November 2002.
>   ##
>   ##  To cite this file, this would be an appropriate format:
>   ##  Trujillo-Ortiz, A., R. Hernandez-Walls, K. Castro-Morales,
>   ##  A. Espinoza-Tenorio, A. Guia-Ramirez and R. Carmona-Pina.
> (2002).
>   ##  MBoxtest: Multivariate Statistical Testing for the Homogeneity
> of 
>   ##  Covariance Matrices by the Box's M. A MATLAB file. [WWW
> document]. 
>   ##  URL
>
http://www.mathworks.com/matlabcentral/fileexchange/loadFile.do?objectId=273
> 3&objectType=FILE
>   ##
>   ##  References:
>   ## 
>   ##  Stevens, J. (1992), Applied Multivariate Statistics for Social
> Sciences.
>   ##  2nd. ed., New-Jersey:Lawrance Erlbaum Associates Publishers.
> pp.
> 260-269.
>   
>   if (alpha <= 0 || alpha >= 1)
>     stop('significance level must be between 0 and 1')
> 
>   g = nlevels(cl)  ## Number of groups.
>   n = table(cl)    ## Vector of groups-size.
> 
>   N = nrow(X)
>   p = ncol(X)
> 
>   bandera = 2
>   if (any(n >= 20)) bandera = 1
> 
>   ## Partition of the group covariance matrices.
>   covList <- tapply(X, rep(cl, ncol(X)), function(x, nc)
> cov(matrix(x,
> nc=nc)),
>                     ncol(X))
> 
>   deno = sum(n) - g
>   suma = array(0, dim=dim(covList[[1]]))
> 
>   for (k in 1:g) 
>     suma = suma + (n[k] - 1) * covList[[k]]
> 
>   Sp = suma / deno  ## Pooled covariance matrix.
>   Falta=0
> 
>   for (k in 1:g) 
>     Falta = Falta + ((n[k] - 1) * log(det(covList[[k]])))
>   
>   MB = (sum(n) - g) * log(det(Sp)) - Falta  ## Box's M statistic.
>   suma1 = sum(1 / (n[1:g] - 1))
>   suma2 = sum(1 / ((n[1:g] - 1)^2))
>   C = (((2 * p^2) + (3 * p) - 1) / (6 * (p + 1) * (g - 1))) *
>     (suma1 - (1 / deno))  ## Computing of correction factor.
>   if (bandera == 1) {
>     X2 = MB * (1 - C)                ## Chi-square approximation.
>     v = as.integer((p * (p + 1) * (g - 1)) / 2)  ## Degrees of
> freedom.
>     ## Significance value associated to the observed Chi-square
> statistic.
>     P = pchisq(X2, v, lower=TRUE)   
> 
>     cat('------------------------------------------------\n');
>     cat('     MBox     Chi-sqr.         df          P\n')
>     cat('------------------------------------------------\n')
>     cat(sprintf("%10.4f%11.4f%12.i%13.4f\n", MB, X2, v, P))
>     cat('------------------------------------------------\n')
>     if (P >= alpha) {
>       cat('Covariance matrices are not significantly different.\n')
>     } else {
>       cat('Covariance matrices are significantly different.\n')
>     }
>     return(list(MBox=MB, ChiSq=X2, df=v, pValue=P))
>   } else {
>     ## To obtain the F approximation we first define Co, which
> combined to
>     ## the before C value are used to estimate the denominator
> degrees of
>     ## freedom (v2); resulting two possible cases. 
>     Co = (((p-1) * (p+2)) / (6 * (g-1))) * (suma2 - (1 / (deno^2)))
>     if (Co - (C^2) >= 0) {
>       v1 = as.integer((p * (p + 1) * (g - 1)) / 2)  ## Numerator DF.
>       v21 = as.integer(trunc((v1 + 2) / (Co - (C^2))))  ##
> Denominator DF.
>       F1 = MB * ((1 - C - (v1 / v21)) / v1) ## F approximation.
>       ##  Significance value associated to the observed F statistic.
>       P1 = pf(F1, v1, v21, lower=FALSE) 
> 
>  
>
cat('\n------------------------------------------------------------\n')
>       cat('     MBox         F           df1          df2         
> P\n')
>      
> cat('------------------------------------------------------------\n')
>       cat(sprintf("%10.4f%11.4f%11.i%14.i%13.4f\n", MB, F1, v1, v21,
> P1))
> 
=== message truncated ===




More information about the R-help mailing list