[Rd] Associative array?

Simon Urbanek simon.urbanek at r-project.org
Mon Mar 15 21:11:07 CET 2010


On Mar 15, 2010, at 15:25 , Roger Peng wrote:

> If I recall correctly, I thought indexing a vector/list with a
> character vector uses hashing if the vector is over a certain length
> (I can't remember the cutoff). Otherwise, it's a linear operation.

What I was talking about was lookup of one element in a named generic  
vector of length n so that is linear (~O(n)), because we were  
discussing l[[i]].

But, yes, you're right that if you use a vector for indexing x[y] (a  
slightly different topic but applicable to the data table example)  
then match() is used (=hashing) if either the subscript or names are  
long enough (subscript.c at 493 has the exact formula).


> On Thu, Mar 11, 2010 at 8:09 PM, Ben <misc7 at emerose.org> wrote:
>>> lists are generic vectors with names so lookup is O(n). Environments
>>> in R are true hash tables for that purpose:
>> Ahh, thanks for the information!  A function I wrote before indexing
>> on a data frame was slower than I expected, and now I know why.
>>> I don't quite understand - characters are (after raw vectors) the
>>> most expressive data type, so I'm not quite sure why that would be a
>>> limitation .. You can cast anything (but raw vector with nulls) into
>>> to a character.
>> It's no big problem, it's just that if the solution is to convert to
>> character type, then there are some implementation details to worry
>> about.  For instance, I assume that as.character(x) is a reversible
>> 1-1 mapping if x is an integer (and not NA or NULL, etc).  But
>> apparently that isn't exactly true for floats, and it would get more
>> complicated for other data types.  So that's why I said it would not
>> be elegant, but that is a very subjective statement.
>> On a deeper level, it seems counterintuitive to me that indexing in R
>> is O(n).  Futhermore, associative arrays are a fundamental data type,
>> so I think it's weird that I can read the R tutorial, the R language
>> definition, and even the manual page for new.env() and still not have
>> enough information to build a decent one.  So IMHO things would be
>> better if R had a built-in easy-to-use general purpose associative
>> array.
>>> I don't see a problem thus I'm not surprised it didn't come up
>>> ;). But maybe I'm just missing your point ...
>> Nope, this has come up before---I think R and I are just on different
>> wavelengths.  Various things that I think are a problem with R are
>> apparently not, and it's fine the way it is.
>> Anyway, sorry for getting off topic ;-) You posted everything I  
>> need to know and I really appreciate your help.
>> --
>> Ben Escoto
>> ______________________________________________
>> R-devel at r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-devel
> -- 
> Roger D. Peng  |  http://www.biostat.jhsph.edu/~rpeng/

More information about the R-devel mailing list