[R] Fast lookup in ragged array

Peter McMahan peter.mcmahan at gmail.com
Fri Mar 16 16:42:06 CET 2007


Hello,

I'm running an algorithm for graph structural cohesion that requires  
a depth-first search of subgraphs of a rather large network. The  
algorithm will necessarily be redundant in the subgraphs it recurses  
to, so to speed up the process I implemented a check at each subgraph  
to see if it's been searched already.

This algorithm is very slow, and takes days to complete on a graph  
with about 200 nodes. It seems that a significant portion of the  
computation time is spent looking up the current subgraph in the list  
of searched subgraphs to see if it is redundant, and I'm wondering if  
there's a faster way to do this.

Currently, the list of searched subgraphs is a list  
(`theseSearchedBranches`) of unnamed numeric vectors. I'm checking  
against the list using the following command:

     if(list(v) %in% theseSearchedBranches){
         cat("    Branch already searched: passing.\n\n")
         return(NULL)
     }

v is a sorted numeric, with length anywhere between 3 and 200.  
Because theseSearchedBranches gets quite long as the algorithm  
progresses, the %in% command is taking maybe one or two seconds per  
lookup.

So to reiterate, I have a list() that looks something like this:

[[1]]
[1] 0 5 6 11 12 13 14 16 19 21 23 24 26 30 31 39 41
[18]56 72 75 76 85 95 105 110 134 158 159 165 186

[[2]]
[1] 0 5 6 11 12 13 14 16 19 21 23 24 26 30 31 39 41
[18]56 72 75 76 85 95 105 110 134 147 159 165 186

[[3]]
[1] 0 5 6 11 12 13 14 16 19 21 23 24 26 30 31 39 41
[18]50 56 72 75 85 95 105 110 134 147 158 159 165 186

...
and so on for tens of thousands of entries, and I am trying to find  
some sort of fast equivalent for %in% to search it. I'm also not  
adding the vectors to the list in any particular order, as I don't  
think %in% would know how to take advantage of that anyway.

Is there a data structure other than list() that I can use that would  
be faster? Would it be better to just create a hashed env and add  
empty variables named "0.5.6.11.12..."?
I know there are fast lookup algorithms out there that could take  
advantage of the fact that the items being searched are indiviually  
ordered numeric vectors, but I can't find any info about R  
implementations on the mailing lists or help. Is there any data type  
that implements a b-tree type of lookup scheme?

Any help would be greatly appreciated.

Thanks,
Peter



More information about the R-help mailing list