[R] Dynamic Dictionary Data Type?

Prof Brian Ripley ripley at stats.ox.ac.uk
Thu Jun 2 20:40:35 CEST 2005


On Thu, 2 Jun 2005, Duncan Murdoch wrote:

> Gabor Grothendieck wrote:
>> On 6/2/05, Prof Brian Ripley <ripley at stats.ox.ac.uk> wrote:
>> 
>>> On Thu, 2 Jun 2005, hadley wickham wrote:
>>> 
>>> 
>>>>> An environment is a hash table, and copying occurs only on modification,
>>>>> when any language would have to copy in this context.
>>> 
>>> Caveat: the default is new.env(hash=FALSE), so an environment is a hash
>>> table in one sense but not necessarily so in the strict sense.
>> 
>> 
>> Can you expand on this?  When would one use hash = TRUE vs.
>> the default?
>
> It's an optimization question.  hash = TRUE uses more memory and takes longer 
> to set up, but will give faster searches in big environments. The current 
> default assumes that environments are small so the effort of building the 
> hash table is not worthwhile, and it does linear searches.

It's not really size: building small hash tables is quick.  The issue is 
more to do with whether there are many lookups done compared to entries.
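
One way to see the lookup side of this is to fill an environment once and then time many lookups against it. This is only a sketch: the helper time_lookups and its argument names are made up for illustration, and the timings will depend on the machine and the R version.

```r
# Hypothetical helper: create n_entries keys in an environment,
# then time n_lookups calls to get() on randomly chosen keys.
time_lookups <- function(hash, n_entries, n_lookups) {
    keys <- paste("k", seq_len(n_entries), sep = "")
    env <- new.env(hash = hash)
    for (k in keys) assign(k, TRUE, envir = env)
    probe <- sample(keys, n_lookups, replace = TRUE)
    # inherits = FALSE keeps the search inside env itself
    system.time(
        for (k in probe) get(k, envir = env, inherits = FALSE)
    )[["elapsed"]]
}
```

Comparing, say, time_lookups(FALSE, 1e4, 1e5) with time_lookups(TRUE, 1e4, 1e5) gives a feel for how the ratio of lookups to entries matters.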

We met the same issues for a named vector a while back.  The relevant NEWS 
item was

     o  Indexing a vector by a character vector was slow if both the
        vector and index were long (say 10,000).  Now hashing is used
        and the time should be linear in the longer of the lengths
        (but more memory is used).
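
For concreteness, the kind of indexing that item describes looks roughly like this (a sketch; the names x and idx and the size are my own choices, not taken from the NEWS file):

```r
# A long named vector indexed by a long character vector.
n <- 1e4
x <- seq_len(n)
names(x) <- paste("key", seq_len(n), sep = "")
idx <- sample(names(x))   # every name once, in random order
res <- x[idx]             # elements are matched by name
```

Each element of res carries the name it was looked up by, so names(res) is the same as idx.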


> I suspect that we might have the default wrong (or perhaps should make the 
> transition from linear to hash search automatically at a certain threshold 
> size), but I haven't done the testing necessary to show this.

Here's an example

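## 1e5 keys drawn from only 100 distinct values ("0" to "99")
tr <- as.character(trunc(runif(1e5, max=100)))
system.time({
   env <- new.env(hash=FALSE)   # rerun with hash=TRUE to compare
   for(i in 1:1e5) assign(tr[i], i, envir=env)
}, gcFirst=TRUE)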

which takes about 5% less time with hashing.  Now change to max=1e4 
(10,000 distinct keys rather than 100): the hashed version takes about 
50% longer, the unhashed one about 120 times longer.
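
To run both settings side by side, the example can be wrapped in a small function. This is a sketch only: time_fill and key_range are names invented here, and the timings you get will vary with the machine and the R version, so do not expect to reproduce the figures above exactly.

```r
# Hypothetical helper: time n assignments into an environment, with
# keys drawn from about key_range distinct values.
time_fill <- function(hash, key_range, n = 1e5) {
    tr <- as.character(trunc(runif(n, max = key_range)))
    system.time({
        env <- new.env(hash = hash)
        for (i in seq_len(n)) assign(tr[i], i, envir = env)
    }, gcFirst = TRUE)[["elapsed"]]
}

set.seed(1)
for (key_range in c(100, 1e4)) {
    cat("key_range =", key_range,
        " unhashed:", time_fill(FALSE, key_range),
        " hashed:", time_fill(TRUE, key_range), "\n")
}
```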

-- 
Brian D. Ripley,                  ripley at stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595



