[R] Very Slow Gower Similarity Function

Tyler Smith tyler.smith at mail.mcgill.ca
Mon Apr 18 18:10:34 CEST 2005


Hello,

I am a relatively new user of R. I have written a basic function to calculate
the Gower similarity coefficient. I was motivated to do so partly as an exercise
in learning R, and partly because the existing option (vegdist in the vegan
package) does not accept missing values.
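For reference, the similarity that sGow computes for a pair of objects i and j
can be written as below. Here x_ik is the value of descriptor k for object i,
R_k is the range of descriptor k (MRANGE in the code), and K_ij is the set of
descriptors observed in both objects (descvect). This is Gower's coefficient for
quantitative descriptors, with a missing value handled by dropping that
descriptor from the average for that pair only.

```latex
S_{ij} = \frac{1}{|K_{ij}|} \sum_{k \in K_{ij}}
         \left( 1 - \frac{|x_{ik} - x_{jk}|}{R_k} \right),
\qquad
R_k = \max_m x_{mk} - \min_m x_{mk},
\qquad
K_{ij} = \{\, k : x_{ik} \text{ and } x_{jk} \text{ both observed} \,\}
```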

I think I have succeeded: my function gives me the correct values. However, now
that I'm starting to use it with real data, I realise it's very slow. It takes
more than 45 minutes on my Windows 98 machine (R 2.0.1 Patched (2005-03-29))
to process a 185 x 32 matrix containing ca. 100 missing values. If anyone can
suggest ways to speed up my function, I would appreciate it. I suspect the pair
of nested for loops is the problem, but I couldn't figure out how to get rid of them.
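As an illustration of how a pairwise quantity can be computed without the i/j
loops, here is a small sketch (the function name sharedCounts is hypothetical,
not part of any package). It counts, for every pair of objects at once, the
descriptors observed in both, which is the number the loop computes as descnum,
using a single matrix product of the 0/1 "observed" indicator matrix with its
transpose.

```r
## Hypothetical helper: number of descriptors observed in BOTH objects,
## for all pairs of rows at once (the loop's descnum, without the loop).
sharedCounts <- function(mat) {
  observed <- !is.na(as.matrix(mat))  # logical n x p: TRUE where a value is present
  P <- observed * 1                   # the same indicator matrix as 0/1 numbers
  P %*% t(P)                          # [i, j] = descriptors present in both i and j
}

m <- matrix(c(1, 2, NA, 4,
              10, NA, 30, 40), ncol = 2)
sharedCounts(m)
```

The same indicator-matrix idea can, in principle, be carried over to the sums of
partial similarities, which is what removes the loops from the full calculation.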

The function is:

### Gower Similarity Matrix ###

sGow <- function(mat) {

  OBJ <- nrow(mat)       # number of objects (rows)
  MATDESC <- ncol(mat)   # number of descriptors (columns)
  # range of each descriptor, ignoring missing values
  MRANGE <- apply(mat, 2, max, na.rm = TRUE) - apply(mat, 2, min, na.rm = TRUE)
  DESCRIPT <- 1:MATDESC  # descriptor index vector
  smat <- matrix(1, nrow = OBJ, ncol = OBJ)  # similarity matrix, initialised to 1

  for (i in 1:OBJ) {
    for (j in i:OBJ) {

      ## index vector of descriptors that are non-NA in both object i and object j
      descvect <- intersect(setdiff(DESCRIPT, DESCRIPT[is.na(mat[i, DESCRIPT])]),
                            setdiff(DESCRIPT, DESCRIPT[is.na(mat[j, DESCRIPT])]))

      descnum <- length(descvect)  # number of valid descriptors for the i~j comparison

      ## partial similarity per valid descriptor: 1 - |x_ik - x_jk| / range_k
      partialsim <- 1 - abs(mat[i, descvect] - mat[j, descvect]) / MRANGE[descvect]

      ## mean partial similarity, stored in both triangles of the matrix
      smat[i, j] <- smat[j, i] <- sum(partialsim) / descnum
    }
  }
  smat
}
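A minimal sketch of one way to remove the pair loop, offered as an untested
alternative rather than a definitive replacement: loop over the 32 descriptors
instead of the roughly 17,000 object pairs, and use outer() to build all pairwise
partial similarities for one descriptor at once. The name sGowVec is
hypothetical. It assumes the same conventions as sGow (ranges computed with
na.rm = TRUE, a descriptor skipped for a pair when either value is missing), so
it should give NaN in the same places: where two objects share no observed
descriptor, or where a descriptor has zero range.

```r
## Hypothetical vectorised variant of sGow: the loop runs over descriptors,
## and each pass handles all object pairs at once with outer().
sGowVec <- function(mat) {
  mat <- as.matrix(mat)
  n <- nrow(mat)
  num <- matrix(0, n, n)  # running sum of partial similarities per pair
  den <- matrix(0, n, n)  # running count of descriptors observed in both objects
  for (k in seq_len(ncol(mat))) {
    x <- mat[, k]
    rng <- max(x, na.rm = TRUE) - min(x, na.rm = TRUE)  # descriptor range
    ok <- !is.na(x)
    # n x n partial similarities for descriptor k (NA where either value is missing)
    ps <- 1 - abs(outer(x, x, "-")) / rng
    valid <- outer(ok, ok, "&")  # TRUE where both objects have descriptor k
    num[valid] <- num[valid] + ps[valid]
    den <- den + valid           # logical is added as 0/1
  }
  num / den                      # mean partial similarity (NaN if den == 0)
}

m <- matrix(c(1, 2, NA, 4,
              10, NA, 30, 40), ncol = 2)
sGowVec(m)
```

Each pass works on whole n x n matrices, so the interpreted-loop overhead is paid
once per descriptor rather than once per pair; the cost is a few n x n matrices
in memory, which is small for n = 185.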

Thank you for your time,

Tyler

-- 
Tyler Smith

PhD Candidate
Plant Science Department
McGill University

tyler.smith at mail.mcgill.ca
