[R] Lookups in R

Michael Frumin michael at frumin.net
Wed Jul 4 23:40:19 CEST 2007


i wish it were that simple.  unfortunately the logic i have to do on 
each transaction is substantially more complicated, and involves 
referencing the existing values of the user table through a number of 
conditions.

any other thoughts on how to get better-than-linear lookup time?  
is there a recommended binary search or sorted-index (e.g. B-tree) module 
that I could use to maintain my own index?

thanks,
mike
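
One possible way to get close to constant-time lookups in base R, sketched under assumptions rather than offered as a tested recipe: resolve every user id to a row number once with match() (which base R implements with hashing), or keep an id-to-row map in an environment created with new.env(hash = TRUE). The toy `users` and `transactions` data frames are made up for illustration; the column names follow the loop in the quoted question.

```r
# Hypothetical toy data; column names follow the loop in the quoted question.
users <- data.frame(id = c(101, 205, 309), amt = c(0, 10, 0))
transactions <- data.frame(userid = c(205, 101, 205, 309),
                           amounts = c(5, 2, 1, 7))

# Option 1: resolve every user id to a row number once, up front,
# instead of scanning users$id on every iteration.
row_of <- match(transactions$userid, users$id)

# Work on a plain vector inside the loop: updating a data frame cell by
# cell is slow, while assigning into a vector element is cheap.
amt <- users$amt
for (i in seq_len(nrow(transactions))) {
  r <- row_of[i]
  # arbitrary per-transaction logic can read amt[r] here
  amt[r] <- amt[r] + transactions$amounts[i]
}
users$amt <- amt

# Option 2: an environment used as a hash map, keyed by the id as a string.
idx <- new.env(hash = TRUE)
for (k in seq_len(nrow(users))) {
  assign(as.character(users$id[k]), k, envir = idx)
}
row_205 <- get("205", envir = idx)  # row number of user 205
```

Option 1 is usually enough when all ids are known before the loop starts; the environment is more useful when ids have to be added while processing.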

Peter Dalgaard wrote:
> mfrumin wrote:
>> Hey all; I'm a beginner++ user of R, trying to use it to do some
>> processing of data sets of over 1M rows, and running into a snafu.
>> Imagine that my input is a huge table of transactions, each linked to
>> a specific user id.  As I run through the transactions, I need to
>> update a separate table for the users, but I am finding that the
>> traditional ways of doing a table lookup are way too slow to support
>> this kind of operation.
>>
>> i.e:
>>
>> for(i in 1:1000000) {
>>    userid <- transactions$userid[i]
>>    amt <- transactions$amounts[i]
>>    # R has no += operator, so the update is spelled out
>>    row <- users$id == userid
>>    users[row, 'amt'] <- users[row, 'amt'] + amt
>> }
>>
>> I assume this is a linear lookup through the users table (in which
>> there are tens of thousands of rows), when really what I need is
>> constant time, O(1), or at worst O(log(# users)).
>>
>> is there any way to manage a list of IDs (be they numeric, string,
>> etc.) and have them efficiently mapped to some other table index?
>>
>> I see the CRAN package for SQLite hashes, but that seems to be going
>> a bit too far.
>>   
> Sometimes you need a bit of lateral thinking. I suspect that you could 
> do it like this:
>
> tbl <- with(transactions, tapply(amounts, userid, sum))
> # index by name, not by position, in case the ids are numeric
> users$amt <- users$amt + tbl[as.character(users$id)]
>
> one catch is that there could be users with no transactions, in which 
> case you may need to replace userid by factor(userid, 
> levels=users$id). None of this is tested, of course.
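
A minimal sketch of the vectorized approach suggested above, including the factor() variant for users with no transactions. The toy data frames are hypothetical, and the column name `amounts` is taken from the original question. Note that tapply() leaves NA in the cell of any user without transactions, so the sketch assumes those should count as 0.

```r
# Hypothetical toy data; user 309 has no transactions.
users <- data.frame(id = c(101, 205, 309), amt = c(1, 10, 3))
transactions <- data.frame(userid = c(205, 101, 205),
                           amounts = c(5, 2, 1))

# One cell per row of users, in the same order as users$id.
f <- factor(transactions$userid, levels = users$id)
tbl <- tapply(transactions$amounts, f, sum)

# Cells for users with no transactions are NA; treat them as 0.
tbl[is.na(tbl)] <- 0

# as.vector() drops the names and the 1-d array structure before adding.
users$amt <- users$amt + as.vector(tbl)
```

Because the factor levels are taken from users$id, the totals line up with the rows of `users` and no per-row lookup is needed.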


