[Rd] O(?) access time for rownames/colnames/named dimensions

mfrumin michael at frumin.net
Fri Aug 3 13:06:14 CEST 2007


Dear princely R developers to whom I owe so much of my daily productivity
level,

I am wondering how indexing by row and column names is implemented.  That
is, when I have a matrix with named rows (or columns) and I then index into
that matrix using the string names (rather than integers), how does R scan
through the list of names in order to figure out which row/column to use? 
My current suspicion is that it's a linear search, but I've been wrong many
times before.
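
To make the question concrete, here is a rough sketch of how one might
compare the two kinds of indexing empirically. The matrix size, the names,
and the repetition count are made up for illustration, and the timings
will vary by machine and R version, so this is only a way to observe the
cost, not a statement about how R implements the lookup.

```r
# Hypothetical sizes and names, chosen only for illustration.
n <- 10000L
m <- matrix(0, nrow = n, ncol = 2L,
            dimnames = list(paste0("row", seq_len(n)), c("a", "b")))

key <- "row9999"
idx <- match(key, rownames(m))   # resolve the name to an integer once

reps <- 2000L
# Index by character names on every iteration.
by_name  <- system.time(for (i in seq_len(reps)) m[key, "a"])
# Index by the pre-resolved integer positions.
by_index <- system.time(for (i in seq_len(reps)) m[idx, 1L])

print(rbind(by_name, by_index))
```

If the name-based timing grows roughly in proportion to the number of
rows while the integer-based timing stays flat, that would be consistent
with a linear scan of the names.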

If it is linear, may I ask if the possibility of implementing the
name-to-integer-index lookup in a more efficient way has ever been discussed? 
For example, why not hash the string indexes to quickly map them to integer
indexes?

Of course adding this sort of functionality would make it somewhat more
expensive to add rows/columns anywhere but the end of the matrix, but in
many cases I suspect it would be worth it.  Or perhaps there would be a way
to enable this selectively on each named dimension?

In my use case, I am building an array with named dimensions once, and then
doing lots of reads and updates on that array without changing its shape. 
This is proving to be rather costly at the moment and I've taken to using a
hashed environment to do lookups where it really counts, but this is rather
inconvenient.
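
For reference, the workaround looks roughly like the sketch below. The
row names, the matrix, and the variable row_index are hypothetical
stand-ins rather than my actual code. The idea is to build the
name-to-integer map once in a hashed environment and then index the
matrix by integer only.

```r
# Hypothetical names and data, for illustration only.
row_names <- c("alpha", "beta", "gamma")
m <- matrix(0, nrow = 3L, ncol = 2L,
            dimnames = list(row_names, c("x", "y")))

# Build the name -> integer index map once, in a hashed environment.
row_index <- new.env(hash = TRUE, size = length(row_names))
for (i in seq_along(row_names)) {
  assign(row_names[i], i, envir = row_index)
}

# Later reads and updates resolve the name through the environment
# and then index the matrix by integer position.
i <- get("beta", envir = row_index)
m[i, 2L] <- 42
value <- m[i, 2L]
```

This works as long as the shape of the matrix does not change, since the
map has to be rebuilt whenever rows are inserted or reordered, and it
needs one such environment per named dimension.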

I'm sure this has been thought about before, but I didn't see anything in
this mailing list.

Thanks,
Michael
-- 
View this message in context: http://www.nabble.com/O%28-%29-access-time-for-rownames-colnames-named-dimensions-tf4211918.html#a11981293
Sent from the R devel mailing list archive at Nabble.com.


