[R] Mahalanobis Distance

David L Carlson dcarlson at tamu.edu
Tue Sep 27 23:09:16 CEST 2011


Since you are only looking at the distance between two points, they must
fall on a line, so no matter how many values you have for each point, their
dimension is still 1. Mahalanobis distance is a way of measuring distance in
multivariate space when the variables (columns) are correlated with one
another. In that situation, Euclidean distance (which assumes each dimension
is orthogonal to all the others) is inappropriate. With two points and one
dimension, all distance measures are effectively equivalent, since they can
be converted to one another by multiplying by an appropriate constant.
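
To illustrate the point about correlated variables, here is a minimal
sketch. The covariance matrix S and the two points are made up for
illustration and are not taken from the poster's data. Both points are the
same Euclidean distance from the centre, but one lies along the direction of
the correlation and the other across it, so their Mahalanobis distances
differ.

```r
# Assumed covariance matrix with correlation 0.9 (made up for illustration)
S <- matrix(c(1, 0.9, 0.9, 1), nrow = 2)

# Two hypothetical points (rows) at the same Euclidean distance from the origin
pts <- rbind(along = c(1, 1), across = c(1, -1))
centre <- c(0, 0)

# Euclidean distances from the centre: identical for both points
sqrt(rowSums(pts^2))

# mahalanobis() returns SQUARED distances, one per row of pts
d2 <- mahalanobis(pts, center = centre, cov = S)
sqrt(d2)

# With an identity covariance (uncorrelated, unit-variance variables),
# squared Mahalanobis distance reduces to squared Euclidean distance
d2_identity <- mahalanobis(pts, center = centre, cov = diag(2))
all.equal(unname(d2_identity), unname(rowSums(pts^2)))
```

The point lying across the correlation comes out much farther away in
Mahalanobis terms, even though Euclidean distance treats the two the same.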


-----Original Message-----
From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org] On
Behalf Of jorgeA
Sent: Tuesday, September 27, 2011 12:08 PM
To: r-help at r-project.org
Subject: Re: [R] Mahalanobis Distance

Hello David(s),

First of all, thank you for your help.

I was running some tests, and I would like to know whether I have correctly
understood your explanation. When I use rbind(), the variables are bound by
row, and when I use cbind(), they are bound by column.
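
The difference is easiest to see in the dimensions of the result. This is
only a sketch: the vectors a and b are hypothetical stand-ins for two
subseries of length 15, not the actual data.

```r
# Hypothetical subseries (stand-ins for the real data), each of length 15
a <- 1:15
b <- (1:15)^2

byRow <- rbind(a, b)  # each vector becomes one row:    2 rows, 15 columns
byCol <- cbind(a, b)  # each vector becomes one column: 15 rows, 2 columns

dim(byRow)
dim(byCol)
```

Functions that work row by row therefore see 2 "points" in byRow but 15
"points" in byCol.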

The dist() function, as the help says, "computes and returns the distance
matrix computed by using the specified distance measure to compute the
distances between the rows of a data matrix", so, in that case I use rbind()
(as the help example does).

The mahalanobis() function help says "returns the squared Mahalanobis
distance of all rows in x and the vector mu = center with respect to Sigma =
cov.", so, here again, the calculations are done by row. Using cbind() I get
one result for each row like this:

mahalanobis(testeCbind, center = colMeans(testeCbind), cov=var(testeCbind))

As a result I get 15 values (one per row).
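
A self-contained version of that call, for anyone who wants to reproduce the
behaviour. The data here are simulated and purely hypothetical; only the
name testeCbind and the form of the call come from the message above.

```r
set.seed(1)  # simulated, hypothetical data: two correlated subseries
a <- rnorm(15)
b <- a + rnorm(15)
testeCbind <- cbind(a, b)  # 15 rows (observations), 2 columns (variables)

# One SQUARED Mahalanobis distance per row, measured from the column means
d2 <- mahalanobis(testeCbind,
                  center = colMeans(testeCbind),
                  cov    = var(testeCbind))

length(d2)  # one value per row of testeCbind
sqrt(d2)    # take the square root to get the distances themselves
```

Note that each value is the distance of one row from the centre, not a
distance between the two subseries.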

With dist(), using the Euclidean method and rbind(), I get only one value
(because it is calculated by row).
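
For comparison, a sketch of the dist() case. The vectors and the name
testeRbind are hypothetical, chosen by analogy with testeCbind above.

```r
# Hypothetical subseries, each of length 15
a <- 1:15
b <- (1:15)^2
testeRbind <- rbind(a, b)  # 2 rows, so dist() sees exactly two points

d <- dist(testeRbind, method = "euclidean")
length(d)  # a single pairwise distance

# Same number computed by hand from the definition
as.numeric(d)
sqrt(sum((a - b)^2))
```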

Thinking about it that way, Mahalanobis distance is not so appropriate for my
kind of input data. Am I correct? Or is there a way to compute the
Mahalanobis distance over all points and get a single value as the measure
of how "distant" the variables (subseries) are?

Thank you all again.

Best regards,
Jorge Aikes Junior

--
View this message in context:
http://r.789695.n4.nabble.com/Mahalanobis-Distance-tp3844960p3848247.html
Sent from the R help mailing list archive at Nabble.com.



