[R] adding a method to the dist function

Giampiero Salvi giampi at speech.kth.se
Mon May 3 14:20:55 CEST 2004


On Mon, 3 May 2004, Prof Brian Ripley wrote:

> On Mon, 3 May 2004, Giampiero Salvi wrote:
>
> > On Mon, 3 May 2004, Prof Brian Ripley wrote:
> >
> > > dist() compares pairs of rows in the x matrix.  How can they have `means
> > > and covariances'? -- you have a sample of size one from each of two
> > > populations.
> > >
> > > It seems that (Gaussian) Bhattacharyya is more like mahalanobis().
> >
> > I had planned to use mean vectors and covariance matrices I computed
> > over N groups of data samples as input to dist, like this
> >
> > mu_1_1 mu_1_2 ... mu_1_M cov_1_1_1 cov_1_1_2 ... cov_1_M_M
> > mu_2_1 mu_2_2 ... mu_2_M cov_2_1_1 cov_2_1_2 ... cov_2_M_M
> > ...
> > mu_N_1 mu_N_2 ... mu_N_M cov_N_1_1 cov_N_1_2 ... cov_N_M_M
> >
> > where N is the number of groups and M the dimension.
> >
> > I agree that it would be better to use a new function (similar to
> > mahalanobis), as the function dist in all the other cases uses raw
> > data samples, and my interpretation of the input data might be
> > confusing. The reason why I thought of dist is that bhattacharyya is
> > a symmetric distance, and the result fits the dist class well.
> >
> > One way to solve this, if you agree, would be to write a new function
> > bhattacharyya() that returns a dist object.
>
> So you would be computing distances for groups of rows.  That needs a
> different interface from dist().

What I meant is that I compute the means and covariances before calling
dist(). I then compute the distance between every pair of rows through the
same interface as the current dist(). The point is that in my case each row
does not represent a data point, but a mean vector (and covariance matrix)
already estimated over a number of data points.
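
To make the proposal concrete, here is a rough sketch of what I have in mind.
It is only an illustration under assumptions: bhattacharyya() is a hypothetical
name (no such function exists in R as far as I know), the argument M and the
row layout (M means followed by the M*M covariance entries) are the ones
described above, and the formula is the usual Gaussian form of the
Bhattacharyya distance.

```r
# Sketch only: bhattacharyya() is a hypothetical function, not part of R.
# Each row of x holds M means followed by the M*M entries of a covariance
# matrix, as in the layout described in this thread.
bhattacharyya <- function(x, M) {
  x <- as.matrix(x)
  stopifnot(ncol(x) == M + M * M)
  N <- nrow(x)
  d <- matrix(0, N, N, dimnames = list(rownames(x), rownames(x)))
  # Log-determinant of a covariance matrix, as a plain number.
  logdet <- function(S) as.numeric(determinant(S, logarithm = TRUE)$modulus)
  for (i in seq_len(N - 1)) {
    for (j in (i + 1):N) {
      mu_i <- as.numeric(x[i, seq_len(M)])
      mu_j <- as.numeric(x[j, seq_len(M)])
      S_i <- matrix(x[i, -seq_len(M)], M, M, byrow = TRUE)
      S_j <- matrix(x[j, -seq_len(M)], M, M, byrow = TRUE)
      S <- (S_i + S_j) / 2
      # mahalanobis() returns the squared distance, which is what is needed.
      mean_term <- mahalanobis(mu_i, mu_j, S) / 8
      cov_term <- 0.5 * (logdet(S) - 0.5 * (logdet(S_i) + logdet(S_j)))
      d[i, j] <- d[j, i] <- mean_term + cov_term
    }
  }
  as.dist(d)  # symmetric, zero diagonal: fits the "dist" class
}

# Usage: one row of means and covariances per group of raw samples.
groups <- split(iris[, 1:4], iris$Species)
x <- t(sapply(groups, function(g) c(colMeans(g), cov(g))))
d <- bhattacharyya(x, M = 4)
```

Since the result is an ordinary dist object, it could be passed on to
functions such as hclust() in the usual way.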

One difference from mahalanobis() is that mahalanobis() computes the distance
between a data point (or a number of data points) and a distribution, while
bhattacharyya computes the distance between pairs of distributions. In this
respect it is closer to dist(), in the sense that the two objects involved in
the computation are of the same kind.
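
A small illustration of that difference, using only existing functions on
simulated data (the matrices a and b are made up for the example):

```r
set.seed(1)
a <- matrix(rnorm(200), ncol = 2)            # samples from one group
b <- matrix(rnorm(200, mean = 1), ncol = 2)  # samples from another group

# mahalanobis(): data points on one side, a distribution (center and cov)
# on the other. Returns one squared distance per row of b.
d2 <- mahalanobis(b, center = colMeans(a), cov = cov(a))

# dist(): both objects are of the same kind (here, two mean vectors),
# and the result is a symmetric pairwise "dist" object.
d <- dist(rbind(a = colMeans(a), b = colMeans(b)))
```

Here d2 is a vector with one entry per data point, while d holds a single
symmetric distance between the two groups.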

Giampiero

>
> --
> Brian D. Ripley,                  ripley at stats.ox.ac.uk
> Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
> University of Oxford,             Tel:  +44 1865 272861 (self)
> 1 South Parks Road,                     +44 1865 272866 (PA)
> Oxford OX1 3TG, UK                Fax:  +44 1865 272595
>
>



