[R] Help with Mahalanobis

Gabor Grothendieck ggrothendieck at gmail.com
Sun Jul 10 19:11:15 CEST 2005


I think you could simplify this by replacing everything after the
nObjects = nrow(mds) line with just these two statements.

  f <- function(a,b) mapply(function(a,b)
    (mds[a,] - mds[b,])%*%InvS%*%(mds[a,] - mds[b,]), a,b)

  D2 <- outer(seq(nObjects), seq(nObjects), f)

This also eliminates dependence on gtools and the complexity
of dealing with triangular matrices.

Regards.

Here it is in full:

D2Mah2 = function(y, x) {

  stopifnot(is.data.frame(y), !missing(x))
  stopifnot(dim(y)[1] != dim(x)[1])
  y    = as.matrix(y)
  x    = as.factor(x)
  man  = manova(y ~ x)
  E    = summary(man)$SS[2] #Matrix E
  S    = as.matrix(E$Residuals)/man$df.residual
  InvS = solve(S)
  mds  = matrix(unlist(by(y, x, mean)), byrow=T, ncol=ncol(y))

  library(gtools)
  nObjects = nrow(mds)

  ### changed part is next two statements
  f <- function(a,b) mapply(function(a,b)
    (mds[a,] - mds[b,])%*%InvS%*%(mds[a,] - mds[b,]), a,b)

  D2 <- outer(seq(nObjects), seq(nObjects), f)
}

#
# test
#
D2M2 = D2Mah2(iris[,1:4], iris[,5])
print(D2M2)




On 7/10/05, Jose Claudio Faria <joseclaudio.faria at terra.com.br> wrote:
> Well, as I did not get a satisfactory reply to the original question I tried to
> make a basic function that, I find, solve the question.
> 
> I think it is not the better function, but it is working.
> 
> So, perhaps it can be useful to other people.
> 
> #
> # Calculate the matrix of Mahalanobis Distances between groups
> # from data.frames
> #
> # by: José Cláudio Faria
> # date: 10/7/05 13:23:48
> #
> 
> D2Mah = function(y, x) {
> 
>   stopifnot(is.data.frame(y), !missing(x))
>   stopifnot(dim(y)[1] != dim(x)[1])
>   y    = as.matrix(y)
>   x    = as.factor(x)
>   man  = manova(y ~ x)
>   E    = summary(man)$SS[2] #Matrix E
>   S    = as.matrix(E$Residuals)/man$df.residual
>   InvS = solve(S)
>   mds  = matrix(unlist(by(y, x, mean)), byrow=T, ncol=ncol(y))
> 
>   colnames(mds) = names(y)
>   Objects       = levels(x)
>   rownames(mds) = Objects
> 
>   library(gtools)
>   nObjects = nrow(mds)
>   comb     = combinations(nObjects, 2)
> 
>   tmpD2 = numeric()
>   for (i in 1:dim(comb)[1]){
>     a = comb[i,1]
>     b = comb[i,2]
>     tmpD2[i] = (mds[a,] - mds[b,])%*%InvS%*%(mds[a,] - mds[b,])
>   }
> 
>   # Thanks Gabor for the below
>   tmpMah = matrix(0, nObjects, nObjects, dimnames=list(Objects, Objects))
>   tmpMah[lower.tri(tmpMah)] = tmpD2
>   D2 = tmpMah + t(tmpMah)
>   return(D2)
> }
> 
> #
> # To try
> #
> D2M = D2Mah(iris[,1:4], iris[,5])
> print(D2M)
> 
> Thanks all for the complementary aid (specially to Gabor).
> 
> Regards,
> --
> Jose Claudio Faria
> Brasil/Bahia/UESC/DCET
> Estatistica Experimental/Prof. Adjunto
> mails:
>  joseclaudio.faria at terra.com.br
>  jc_faria at uesc.br
>  jc_faria at uol.com.br
> tel: 73-3634.2779
> 
> ______________________________________________
> R-help at stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
>




More information about the R-help mailing list