[R] fast way to compare two matrices of combinations

Mark W Kimpel mwkimpel at gmail.com
Thu Mar 13 17:23:57 CET 2008


I have a list (length 750), each element containing a vector of unique 
strings (unique gene ids), with length up to ~40 (median 15). I want to 
compile a matrix of all possible triplets, together with the number of 
list elements in which each triplet occurs. Using combn and a lot of 
looping, I am accomplishing this, but it is VERY slow.

I've tried to figure out a way to vectorize this, using "match" and 
"%in%", but can't get my mind around it.

Below is my code. sig.tf.pairs is the list. Suggestions?

Mark


############################################################
M <- 3 # 3 for triplets, etc.
##########################################################
# count all triplets
all.triplets <- NULL
all.count.vec <- NULL
for (i in 1:length(sig.tf.pairs)){
   if (length(sig.tf.pairs[[i]]) >= M){
     # sorting the ids first puts every triplet column in a canonical order
     triplets <- combn(sort(sig.tf.pairs[[i]]), M, simplify = TRUE)
     count.vec <- rep(1, ncol(triplets))
     if (is.null(all.count.vec)){
       all.count.vec <- count.vec
       all.triplets <- triplets
     } else {
       redundant.vec <- NULL
       for (k in 1:ncol(all.triplets)){
         for (m in 1:ncol(triplets)){
           if (length(intersect(triplets[,m], all.triplets[,k])) == M){
             all.count.vec[k] <- all.count.vec[k] + 1
             redundant.vec <- c(redundant.vec, m)
           }
         }
       }
       if(!is.null(redundant.vec)){
         # drop = FALSE keeps a matrix even if only one column is left
         triplets <- triplets[, -redundant.vec, drop = FALSE]
         count.vec <- count.vec[-redundant.vec]
       }
       all.triplets <- cbind(all.triplets, triplets)
       all.count.vec <- c(all.count.vec, count.vec)
     }
   }
}
###################################
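
One direction that might avoid the nested loops, offered only as a sketch and not timed on the real data: collapse each sorted triplet into a single string key, so that identical triplets from different list elements become identical strings, and let table() do the counting. The toy sig.tf.pairs below is made up for illustration, and the "|" separator is an assumption (it must not occur inside any gene id).

```r
# Hypothetical toy input standing in for sig.tf.pairs (not the real data).
sig.tf.pairs <- list(
  c("g1", "g2", "g3", "g4"),
  c("g3", "g2", "g1"),
  c("g5", "g6"),             # shorter than M, so it is skipped
  c("g2", "g3", "g4", "g7")
)
M <- 3 # 3 for triplets, etc.

# Keep only the elements long enough to contain at least one M-tuple.
long.enough <- sig.tf.pairs[lengths(sig.tf.pairs) >= M]

# For each element, sort the ids so every combination comes out in a
# canonical order, then collapse each combination into one string key.
# Assumption: "|" never appears inside a gene id.
keys <- unlist(lapply(long.enough, function(ids) {
  combn(sort(ids), M, FUN = paste, collapse = "|")
}))

# Ids are unique within an element, so each key appears at most once per
# element; table() therefore counts the list elements containing each triplet.
triplet.counts <- table(keys)

# Recover the layout of the loop version: one column per unique triplet,
# with the counts in a parallel vector.
all.triplets <- do.call(cbind,
                        strsplit(names(triplet.counts), "|", fixed = TRUE))
all.count.vec <- as.vector(triplet.counts)
```

With string keys in hand, two sets of triplets can also be compared directly with match() or %in% on the keys instead of column-by-column intersect() calls.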

-- 

Mark W. Kimpel MD  ** Neuroinformatics ** Dept. of Psychiatry
Indiana University School of Medicine

15032 Hunter Court, Westfield, IN  46074

(317) 490-5129 Work, & Mobile & VoiceMail
(317) 204-4202 Home (no voice mail please)

mwkimpel<at>gmail<dot>com


