[R] vectorization instead of using loop

Richard.Cotton at hsl.gov.uk
Thu Oct 9 18:11:06 CEST 2008


> I sent this question 2 days ago and got a response from Sarah. Thanks for
> that. But unfortunately, it did not really solve our problem. The main issue
> is that we want to use our own (manipulated) covariance matrix in the
> calculation of the Mahalanobis distance. Does anyone know how to vectorize
> the code below instead of using a loop (which slows it down)?
> I'd really appreciate any help on this, thank you all in advance!
> Cheers,
> Frank
> 
> This is what I posted 2 days ago:
> We have a data frame x with n people as rows and k variables as columns.
> Now, for each person (i.e., each row) we want to calculate a distance
> between him/her and EACH other person in x. In other words, we want to
> create an n x n matrix with distances (with zeros in the diagonal).
> However, we do not want to calculate Euclidean distances. We want to
> calculate Mahalanobis distances, which take into account the covariance
> among variables.
> Below is the piece of code we wrote ("covmat" in the function below is the
> variance-covariance matrix among variables in Data that has to be fed into
> the mahalanobis function we are using).
>  mahadist = function(x, covmat) {
>  dismat = matrix(0,ncol=nrow(x),nrow=nrow(x))
>  for (i in 1:nrow(x)) {
>        dismat[i,] = mahalanobis(as.matrix(x), as.matrix(x[i,]), covmat)^.5
>  }
>  return(dismat)
> }
> 
> This piece of code works, but it is very slow. We were wondering if it's at
> all possible to somehow vectorize this function. Any help would be greatly
> appreciated.

You can save a substantial amount of time by calling as.matrix once, before
the loop, rather than on every iteration, e.g.

x <- data.frame(runif(1000), runif(1000), runif(1000))
covmat <- cov(x)

mahadist = function(x, covmat) #yours
{
   dismat = matrix(0,ncol=nrow(x),nrow=nrow(x))
   for (i in 1:nrow(x)) 
   {
         dismat[i,] = mahalanobis(as.matrix(x), as.matrix(x[i,]), covmat)^.5
   }
   return(dismat)
}

mahadist2 <- function(x, covmat) #my modification
{
   n <- nrow(x)
   dismat <- matrix(0,ncol=n,nrow=n)
   matx <- as.matrix(x)
   for (i in 1:n) 
   {
      dismat[i,] <- mahalanobis(matx, matx[i,], covmat)^.5
   }
   dismat
}
system.time(mahadist(x, covmat))
#   user  system elapsed 
#   2.82    0.06    2.95 
system.time(mahadist2(x, covmat))
#   user  system elapsed 
#   1.39    0.04    1.45
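
If you want to drop the loop altogether, one possible approach is to transform
the data so that ordinary Euclidean distances on the transformed rows equal
Mahalanobis distances on the original rows, and then let dist() do the
pairwise work. The following is only a sketch and is untimed. It assumes
covmat is symmetric and positive definite (chol() stops with an error
otherwise, which may matter for a manipulated covariance matrix), and the name
mahadist3 is made up for this illustration.

```r
mahadist3 <- function(x, covmat) # hypothetical loop-free variant
{
   matx <- as.matrix(x)
   # Cholesky factor: upper triangular R with t(R) %*% R equal to covmat.
   # Assumption: covmat is symmetric positive definite.
   R <- chol(covmat)
   # Whiten the rows: z = matx %*% R^-1, so that for any two rows
   # sum((z[i,] - z[j,])^2) == t(d) %*% solve(covmat) %*% d, d = matx[i,] - matx[j,]
   z <- matx %*% solve(R)
   # Euclidean distances between whitened rows, expanded to a full n x n matrix
   as.matrix(dist(z))
}
```

The result carries dimnames from dist(), so something like
all.equal(unname(mahadist3(x, covmat)), mahadist2(x, covmat)) is a reasonable
way to check it against the loop version on your own data.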

Regards,
Richie.

Mathematical Sciences Unit
HSL

