[R] daisy() for gower distance calculation

Gavin Simpson gavin.simpson at ucl.ac.uk
Mon Nov 20 17:38:13 CET 2006


On Mon, 2006-11-20 at 09:35 -0400, Tyler Smith wrote:
> On Mon, Nov 20, 2006 at 10:16:07AM +0100, Martin Maechler wrote:
> > 
> > - daisy() is in Recommended package cluster which is part of every
> > 	R installation, so why not try it first?
> 
> This has been suggested to me before, and I really should investigate
> this more fully. My reluctance is due entirely to social
> factors. Plant taxonomists in general (at least the ones on my
> committee!) are familiar with the Gower coefficient, and it is well
> described in Legendre and Legendre 1998, a reference that is also well
> known to my peers - especially those of us in Montreal. Using daisy
> would mean finding and working through one more reference to justify
> another departure from standard (if dated) practice.

Hmmm, I don't get that. As far as I can see, Gower's distance as
described by Legendre & Legendre (L&L) is exactly what is implemented in
daisy(), including the handling of different types of variables.
daisy() goes further than L&L in supporting different ratio-type
variables.

If, within R, daisy() is the closest implementation of Gower's (1971)
similarity (expressed as a distance), why shouldn't you use it,
regardless of whether anyone you know or work with has read Kaufman &
Rousseeuw? To the best of my knowledge, daisy() conforms more closely to
the L&L description than any other publicly available R code, including
the one in vegan (which I suspect was never intended to be used in the
mixed-variable situation).

L&L's description also includes Kronecker deltas, which are used to
weight the association towards one or more variables - this is not
implemented in daisy(). I have an R function for Gower's distance that
implements these weights; it is part of a package to be released to CRAN
in the very near future. This function was based entirely on the L&L
description.
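
To make the weighted form concrete, here is a minimal sketch of a
Gower-type dissimilarity between two records. It is written in Python
only so that it is self-contained; it is not daisy()'s code and not the
unreleased function mentioned above. The names gower_distance, kinds,
ranges and weights are hypothetical. It assumes the delta for a variable
is 0 when either value is missing and 1 otherwise, and that an optional
per-variable weight simply multiplies that delta.

```python
import math


def gower_distance(a, b, kinds, ranges, weights=None):
    """Gower-type dissimilarity (1 - similarity) between two records.

    a, b    : sequences of values; None marks a missing value
    kinds   : "num" for a quantitative variable, "cat" for a nominal one
    ranges  : range (max - min) of each variable over the whole data set;
              ignored for "cat" variables
    weights : optional non-negative weight per variable (assumed here to
              multiply the 0/1 missing-value delta)
    """
    if weights is None:
        weights = [1.0] * len(a)

    numerator = 0.0    # sum of weight * delta * partial similarity
    denominator = 0.0  # sum of weight * delta

    for x, y, kind, rng, w in zip(a, b, kinds, ranges, weights):
        # Delta is 0 when the comparison is impossible: skip the variable.
        if x is None or y is None:
            continue
        if kind == "num":
            # Range-scaled partial similarity; a zero range means the
            # variable is constant, so the two values agree.
            s = 1.0 if rng == 0 else 1.0 - abs(x - y) / rng
        else:
            # Simple matching for nominal variables.
            s = 1.0 if x == y else 0.0
        numerator += w * s
        denominator += w

    if denominator == 0.0:
        # No variable could be compared for this pair.
        return math.nan
    return 1.0 - numerator / denominator


# Two records with one quantitative, one nominal and one partly missing
# quantitative variable.
rec1 = (10.0, "red", None)
rec2 = (15.0, "blue", 3.0)
kinds = ("num", "cat", "num")
ranges = (20.0, None, 5.0)

d_equal = gower_distance(rec1, rec2, kinds, ranges)
d_weighted = gower_distance(rec1, rec2, kinds, ranges, weights=(2.0, 1.0, 1.0))
```

With equal weights the third variable drops out, so the similarity is
(0.75 + 0) / 2 and the distance is 0.625. Doubling the weight on the
first variable gives (1.5 + 0) / 3, i.e. a distance of 0.5.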

> 
> It would be quite useful to do a rewrite of Sneath and Sokal's
> Numerical Taxonomy (1973), substituting the 'R best-practice' approach
> for their suggested methods. Does such a thing exist already? Maybe
> that could be a postdoc project!

Having a copy of S&S on my desk, I checked: the description on pages
135-6 of the 1973 edition is the same as that in L&L and the one
implemented in daisy().

Cheers

G

> 
> Cheers,
> 
> Tyler

-- 
%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%
 Gavin Simpson                 [t] +44 (0)20 7679 0522
 ECRC & ENSIS, UCL Geography,  [f] +44 (0)20 7679 0565
 Pearson Building,             [e] gavin.simpsonATNOSPAMucl.ac.uk
 Gower Street, London          [w] http://www.ucl.ac.uk/~ucfagls/
 UK. WC1E 6BT.                 [w] http://www.freshwaters.org.uk
%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%


