[R] Fast lookup in ragged array

Seth Falcon sfalcon at fhcrc.org
Sat Mar 17 16:24:26 CET 2007


Peter McMahan <peter.mcmahan at gmail.com> writes:

> That's a good point. 

What's a good point?  [this is why top-posting isn't so helpful].

> What's the overhead on digests like that? 

Depends on the digest algorithm, the implementation, etc.  To some
extent, you can just try it and see.  Or you can compute the digest of
an average-sized subgraph node label list in a loop and estimate the
overhead that way.
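
Something along these lines (a rough, untested sketch using the
digest package; the label vector and the loop count are made up):

  library(digest)

  ## stand-in for one subgraph's node label list
  labels <- paste("node", 1:50, sep = "")
  ## canonical key: sort before hashing so label order doesn't matter
  key <- paste(sort(labels), collapse = "|")

  ## time, say, 10000 digest calls to estimate per-lookup overhead
  system.time(for (i in 1:10000) digest(key, algo = "md5"))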

> Also, does that open up the possibility, exceedingly small though it
> may be, of misidentifying a branch as already searched and missing a
> qualifying subgraph?

Yes, and the size of "exceedingly small" depends on the digest.  I
don't think it is worth worrying about.

>>> Also, is it better to over-estimate or under-estimate the
>>> size parameter?

I perhaps should have stressed that over-estimating is better.  The
way hashed environments work is that a vector is initialized to the
requested size and collisions are resolved by chaining.  To reduce
collisions and get a more efficient hash table, you want more slots
in the vector than items, since the hash function is rarely perfect
for your data.
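
For example (a minimal sketch; the expected key count and the factor
of two are just guesses you would tune for your data):

  n_expected <- 100000
  ht <- new.env(hash = TRUE, size = as.integer(2 * n_expected))

  ## store and look up a key as you would during the search
  assign("some-digest-key", TRUE, envir = ht)
  exists("some-digest-key", envir = ht, inherits = FALSE)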

+ seth

-- 
Seth Falcon | Computational Biology | Fred Hutchinson Cancer Research Center
http://bioconductor.org
