[R] optimize the filling of a diagonal matrix (two for loops)

Thomas Mailund mailund at birc.au.dk
Thu Aug 18 18:50:07 CEST 2016


 

The nested for-loops could easily be moved to Rcpp, which should speed them up. Using apply functions instead of for-loops will not make it faster; they still have to do the same looping, calling the distance function once per pair of rows.
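
A minimal sketch of that point. It uses a hypothetical toy data set shaped like the one in the question, and a hypothetical `row_dist()` (the number of positions at which two rows differ) standing in for `Func.dist()`. The `vapply()` version produces exactly the same matrix as the nested loops, and it still calls the distance function once per (i, j) pair from R. The apply family removes the loop syntax, not the per-pair work.

```r
# Hypothetical toy distance standing in for Func.dist()/stringdist():
# the number of positions at which two rows differ.
row_dist <- function(a, b) sum(a != b)

set.seed(1)
# Hypothetical toy data with the same shape as the question's dataS.
toy_data <- data.frame(V1 = sample(1:4, 30, replace = TRUE),
                       V2 = sample(1:4, 30, replace = TRUE),
                       V3 = sample(1:4, 30, replace = TRUE),
                       V4 = sample(1:4, 30, replace = TRUE))
m <- as.matrix(toy_data)
n <- nrow(m)

# Version 1: explicit nested loops over the upper triangle (j > i).
loop_version <- matrix(NA_real_, n, n)
for (i in seq_len(n - 1)) {
  for (j in (i + 1):n) {
    loop_version[i, j] <- row_dist(m[i, ], m[j, ])
  }
}

# Version 2: vapply() over the same (i, j) pairs. The loop is hidden,
# but row_dist() is still called once per pair, from R.
pairs <- which(upper.tri(diag(n)), arr.ind = TRUE)
apply_version <- matrix(NA_real_, n, n)
apply_version[pairs] <- vapply(seq_len(nrow(pairs)), function(k) {
  row_dist(m[pairs[k, 1], ], m[pairs[k, 2], ])
}, numeric(1))

identical(loop_version, apply_version)  # TRUE
```

Both versions make n(n - 1)/2 calls to `row_dist()`, so any timing difference between them comes from loop overhead, not from less work.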

In my test, replacing the loops with `outer` gives roughly the same speed for the two versions, although the `outer` solution iterates over the entire matrix rather than just the upper triangle.

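library(stringdist) # I don't have the TSMining package installed, so I tested with stringdist instead
# Both functions below use the global dataS from the original post.

# Nested loops: only the upper triangle (j > i) is filled; the rest stays NA.
for_loop_test <- function() {
  matrixPrepared <- matrix(NA, nrow = nrow(dataS), ncol = nrow(dataS))
  for (i in 1:(nrow(dataS)-1)){
    for (j in (1+i):nrow(dataS)){
      # Collapse each row into one string, e.g. "1324", and compare the two.
      matrixPrepared[i, j] <- stringdist(paste0(as.character(dataS[i,]), collapse=""),
                                         paste0(as.character(dataS[j,]), collapse=""))
    }
  }
  matrixPrepared
}

# outer(): get_dist() is evaluated for every (i, j) pair of the full matrix,
# but only computes a distance when i > j. Transposing moves those values
# above the diagonal, matching the layout of for_loop_test().
apply_test <- function() {
  get_dist <- function(i, j) {
    if (i <= j) NA
    else stringdist(paste0(as.character(dataS[i,]), collapse=""),
                    paste0(as.character(dataS[j,]), collapse=""))
  }
  get_dist <- Vectorize(get_dist)
  t(outer(1:nrow(dataS), 1:nrow(dataS), get_dist))
}

library(microbenchmark)
# Element-wise comparison that treats NA vs NA as a match. It needs the
# vectorised & and |: the scalar && and || only look at the first element
# (and give an error on longer vectors in current versions of R).
equivalent <- function(x, y) (is.na(x) & is.na(y)) | (!is.na(x) & !is.na(y) & x == y)
check <- function(values) all(equivalent(values[[1]], values[[2]]))
microbenchmark(for_loop_test(), apply_test(), check = check, times = 5)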
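
A real speed-up comes from doing the pairwise work in compiled code, which is the idea behind the Rcpp suggestion. A minimal sketch that avoids writing any C++, assuming plain Levenshtein distance is acceptable: base R's `utils::adist()` computes the distances between all pairs of a character vector in a single call. Note that this swaps in a different metric: `stringdist()` defaults to optimal string alignment, which also counts a transposition of adjacent characters as one edit, and TSMining's `Func.dist()` is a different distance altogether. The data are a hypothetical toy set shaped like the question's `dataS`, and each row's string is built once instead of twice per pair.

```r
set.seed(1)
# Hypothetical toy data with the same shape as the question's dataS.
toy_data <- data.frame(V1 = sample(1:4, 30, replace = TRUE),
                       V2 = sample(1:4, 30, replace = TRUE),
                       V3 = sample(1:4, 30, replace = TRUE),
                       V4 = sample(1:4, 30, replace = TRUE))

# Build each row's string once, e.g. columns 1, 3, 2, 4 -> "1324".
row_strings <- do.call(paste0, toy_data)

# All pairwise Levenshtein distances in one call, computed in compiled code.
d <- utils::adist(row_strings)
dimnames(d) <- NULL

# Keep the question's layout: distances above the diagonal, NA elsewhere.
d[lower.tri(d, diag = TRUE)] <- NA

# Cross-check against the nested-loop formulation on the same strings.
n <- length(row_strings)
ref <- matrix(NA_real_, n, n)
for (i in seq_len(n - 1)) {
  for (j in (i + 1):n) {
    ref[i, j] <- utils::adist(row_strings[i], row_strings[j])[1, 1]
  }
}
all.equal(d, ref)  # TRUE
```

The stringdist package provides `stringdistmatrix()` for the same all-pairs job with its own choice of metrics.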

Cheers
	Thomas


On 18 August 2016 at 17:41:01, AURORA GONZALEZ VIDAL (aurora.gonzalez2 at um.es) wrote:

> Hello
>  
> I have two for loops that I am trying to optimize. I looked into
> vectorization and into using some functions of the apply family, but I
> really cannot do it. I am writing my code with a small data set. At this
> size there is no problem, but sometimes I will have hundreds of rows, so it
> is really important to optimize the code. Any suggestions would be very
> welcome.
>  
> library("TSMining")
> dataS = data.frame(V1 = sample(c(1,2,3,4), 30, replace = TRUE),
>                    V2 = sample(c(1,2,3,4), 30, replace = TRUE),
>                    V3 = sample(c(1,2,3,4), 30, replace = TRUE),
>                    V4 = sample(c(1,2,3,4), 30, replace = TRUE))
> saxM = Func.matrix(5)
> colnames(saxM) = 1:5
> rownames(saxM) = 1:5
> matrixPrepared = matrix(NA, nrow = nrow(dataS), ncol = nrow(dataS))
>  
> for (i in 1:(nrow(dataS)-1)) {
>   for (j in (1+i):nrow(dataS)) {
>     matrixPrepared[i, j] = Func.dist(as.character(dataS[i,]),
>                                      as.character(dataS[j,]), saxM, n = 60)
>   }
> }
> matrixPrepared
>  
> Thank you!
>  
>  
> ------
> Aurora González Vidal
> Phd student in Data Analytics for Energy Efficiency
>  
> Faculty of Computer Sciences
> University of Murcia
>  
> @. aurora.gonzalez2 at um.es
> T. 868 88 7866
> www.um.es/ae
>  
>  
> ______________________________________________
> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

