[R] gdist and gower distance

Jari Oksanen jarioksa at sun3.oulu.fi
Tue Nov 9 14:33:33 CET 2004


On Tue, 2004-11-09 at 12:59, Alessio Boattini wrote:
> Dear All,
>  
> I would like to ask for clarification on the Gower distance matrix calculated by the function gdist in the library mvpart.
> Here is a dummy example:
>  
> > library(mvpart)
> Loading required package: survival 
> Loading required package: splines 
>  mvpart package loaded: extends rpart to include
>  multivariate and distance-based partitioning
> > x=matrix(1:6, byrow=T, ncol=2)
> > x
>      [,1] [,2]
> [1,]    1    2
> [2,]    3    4
> [3,]    5    6
> > gdist(x, method="euclid")
>          1        2
> 2 2.828427         
> 3 5.656854 2.828427
>  
> ##########################
> Doing the calculations by hand according to the formula in the gdist help page, I get the same results. The formula given is:
>  'euclidean'   d[jk] = sqrt(sum (x[ij]-x[ik])^2)
> #################################
> 
> > sqrt(8)
> [1] 2.828427
> > gdist(x, method="gower")
>           1         2
> 2 0.7071068          
> 3 1.4142136 0.7071068
>  
> #######################################
> Doing the calculations by hand according to the formula in the gdist help page, I cannot reproduce these results. The formula given is:
> 'gower'       d[jk] = sum (abs(x[ij]-x[ik])/(max(i)-min(i))
> ##########################################
>  
> Could anybody please shed some light?
>  

There seems to be a bug in the documentation: the function uses a
different calculation than the help page specifies. Look at the 'gdist'
code. To make things easier: in the function body, Gower is method 6 and
Euclidean distance is method 2.
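
A rough check in base R, using only the numbers printed in the question,
points the same way. This is a sketch and not a reading of the 'gdist'
source: it assumes the help-page formula means a sum over columns of
absolute differences divided by the column range, and the names rng, xs,
d_doc and d_obs are only illustrative. On the dummy data, that sum gives
1, 2, 1, whereas Euclidean distance on the range-scaled columns gives the
values that gdist(x, method="gower") printed.

```r
# Same dummy data as in the question
x <- matrix(1:6, byrow = TRUE, ncol = 2)

# Range (max - min) of each column; both are 4 here
rng <- apply(x, 2, max) - apply(x, 2, min)

# Divide every column by its range
xs <- sweep(x, 2, rng, "/")

# Formula as read from the help page (assumption):
# sum over columns of range-scaled absolute differences
d_doc <- dist(xs, method = "manhattan")

# Euclidean distance on the range-scaled columns
d_obs <- dist(xs, method = "euclidean")

print(d_doc)  # 1, 2, 1: does not match the gdist output
print(d_obs)  # 0.7071068, 1.4142136, 0.7071068: matches the gdist output
```

So on this example the output behaves like a Euclidean distance computed
after range standardization, not like the documented sum. Whether that
holds in general can only be settled from the function code.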

Gower's original paper is available through http://www.jstor.org/
(Biometrics Vol. 27, No. 4, p. 857-871; 1971).

cheers, jari oksanen
-- 
Jari Oksanen <jarioksa at sun3.oulu.fi>



