[R] How to speed up list access in R?

Thu Oct 30 18:02:43 CET 2014

Thanks to everyone for the help! For the moment I'll stick with 
Bill's solution, but I'll check out the other recommendations as well.
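
For reference, here is a minimal sketch of the split() approach from Bill's reply quoted below, run on a small made-up data set. The toy `numbers` and `values` vectors are assumptions for illustration only and are not the data from the original post.

```r
# Hypothetical toy data standing in for the (number, value) pairs.
numbers <- c(3, 1, 3, 2, 1)
values  <- c(10, 20, 30, 40, 50)

# Groups keep the order in which each number first appears.
d2 <- split(values, factor(numbers, levels = unique(numbers)))

# Groups are labelled in sorted numeric order instead.
d3 <- split(values, numbers)

print(names(d2))   # group labels in order of first appearance
print(names(d3))   # group labels in sorted order
print(d2[["3"]])   # all values paired with the number 3
```

Both calls group the values in a single vectorized step, so no vector is extended inside an R-level loop.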

Regarding the issue of slow lookups for lists, are there any hash map 
implementations in R that are faster? I like using fairly simple logic 
and data structures when prototyping and only optimize code when 
and where it's necessary, which is why I'm curious about these basic objects.
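
One candidate worth trying is a base R environment, which can be backed by a hash table and is modified in place. The following is only a sketch on made-up toy data (the `numbers` and `values` vectors are assumptions), not a benchmarked replacement for the list version.

```r
# Hypothetical toy data standing in for the (number, value) pairs.
numbers <- c(3, 1, 3, 2, 1)
values  <- c(10, 20, 30, 40, 50)

# An environment created with hash = TRUE uses a hash table for its
# names and has reference semantics, so assigning into it does not
# copy the whole container.
d <- new.env(hash = TRUE)

for (i in seq_along(numbers)) {
    key <- as.character(numbers[i])
    # d[[key]] is NULL for a missing key, and c(NULL, x) is just x.
    d[[key]] <- c(d[[key]], values[i])
}

print(sort(ls(d)))   # the keys stored in the environment
print(d[["3"]])      # all values paired with the number 3
```

Each c() call still copies the small per-key vector, so this only addresses the cost of looking up and updating the container itself.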

On another note, is there a vector-style implementation that changes the 
vectors in place? If I'm not mistaken, the append operation creates and 
returns a new vector each time, which is in line with the functional nature 
of R. If there were some way to make it mutable, it could be much 
faster. This is fairly standard in many languages: behind the scenes, 
memory is allocated at, say, 2 times the current size so that you only 
need about log(n) reallocations when building up a vector like this. Are there 
any such equivalents in R? I presume that lists are mutable (am I 
wrong?), but they seem to have the lookup slowdown problem.
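
The doubling strategy described above can be sketched in plain R with a closure that owns a preallocated buffer. The name `make_growable` and its interface are hypothetical, made up for this illustration. Whether R avoids a copy on each element assignment depends on the interpreter's internals, so this shows the bookkeeping rather than a guaranteed speedup.

```r
# Hypothetical helper: a growable numeric vector with capacity doubling.
# State lives in the closure and is updated with <<-, so callers never
# have to reassign the result of push().
make_growable <- function(initial_capacity = 4L) {
    buf <- numeric(initial_capacity)  # preallocated storage
    n <- 0L                           # number of slots in use
    reallocs <- 0L                    # how many times the buffer grew

    push <- function(x) {
        if (n == length(buf)) {
            # Buffer is full: double the capacity.
            buf <<- c(buf, numeric(length(buf)))
            reallocs <<- reallocs + 1L
        }
        n <<- n + 1L
        buf[n] <<- x
        invisible(NULL)
    }

    list(
        push = push,
        get = function() buf[seq_len(n)],   # only the used part
        reallocs = function() reallocs
    )
}

v <- make_growable()
for (x in 1:1000) v$push(x)
print(length(v$get()))   # number of stored elements
print(v$reallocs())      # number of buffer extensions
```

The number of buffer extensions grows with the logarithm of the final length rather than with the length itself, which is the point of the doubling scheme.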

Again thanks a lot!

Cheers,
Thomas

On 10/30/2014 12:05 PM, William Dunlap wrote:
> Repeatedly extending vectors takes a lot of time.  You can do what you want with
>    d2 <- split(values, factor(numbers, levels=unique(numbers)))
> If you would like the labels on d2 to be in numeric order then you can
> simplify that to
>    d3 <- split(values, numbers)
>
> Bill Dunlap
> TIBCO Software
> wdunlap tibco.com
>
>
> On Thu, Oct 30, 2014 at 8:17 AM, Thomas Nyberg <tomnyberg at gmail.com> wrote:
>> Hello,
>>
>> I want to do the following: Given a set of (number, value) pairs, I want to
>> create a list l so that l[[toString(number)]] returns the vector of values
>> associated with that number. My R code is hundreds of times slower than the
>> equivalent that I would write in Python. I'm pretty new to R so I bet I'm
>> using its data structures inefficiently, but I've tried more or less
>> everything I can think of and can't really speed it up. I have done some
>> profiling which helped me find problem areas, but I couldn't speed things up
>> even with that information. I'm thinking I'm just fundamentally using R in a
>> silly way.
>>
>> I've included code for the different versions. I wrote the python code in a
>> style to make it as clear to R programmers as possible. Thanks a lot! Any
>> help would be greatly appreciated!
>>
>> Cheers,
>> Thomas
>>
>>
>> R code (with two versions depending on commenting):
>>
>> -----
>>
>> numbers <- numeric(0)
>> for (i in 1:5) {
>>      numbers <- c(numbers, sample(1:30000, 10000))
>> }
>>
>> values <- numeric(0)
>> for (i in 1:length(numbers)) {
>>      values <- append(values, sample(1:10, 1))
>> }
>>
>> starttime <- Sys.time()
>>
>> d = list()
>> for (i in 1:length(numbers)) {
>>      number = toString(numbers[i])
>>      value = values[i]
>>      if (is.null(d[[number]])) {
>>      #if (!(number %in% names(d))) {
>>          d[[number]] <- c(value)
>>      } else {
>>          d[[number]] <- append(d[[number]], value)
>>      }
>> }
>>
>> endtime <- Sys.time()
>>
>> print(format(endtime - starttime))
>>
>> -----
>>
>> uncommented version: "45.64791 secs"
>> commented version: "1.423056 mins"
>>
>>
>>
>> Another version of R code:
>>
>> -----
>>
>> numbers <- numeric(0)
>> for (i in 1:5) {
>>      numbers <- c(numbers, sample(1:30000, 10000))
>> }
>>
>> values <- numeric(0)
>> for (i in 1:length(numbers)) {
>>      values <- append(values, sample(1:10, 1))
>> }
>>
>> starttime <- Sys.time()
>>
>> d = list()
>> for (number in unique(numbers)) {
>>      d[[toString(number)]] <- numeric(0)
>> }
>> for (i in 1:length(numbers)) {
>>      number = toString(numbers[i])
>>      value = values[i]
>>      d[[number]] <- append(d[[number]], value)
>> }
>>
>> endtime <- Sys.time()
>>
>> print(format(endtime - starttime))
>>
>> -----
>>
>> "47.15579 secs"
>>
>>
>>
>> The python code:
>>
>> -----
>>
>> import random
>> import time
>>
>> numbers = []
>> for i in range(5):
>>      numbers += random.sample(range(30000), 10000)
>>
>> values = []
>> for i in range(len(numbers)):
>>      values.append(random.randint(1, 10))
>>
>> starttime = time.time()
>>
>> d = {}
>> for i in range(len(numbers)):
>>      number = numbers[i]
>>      value = values[i]
>>      if number in d:
>>          d[number].append(value)
>>      else:
>>          d[number] = [value]
>>
>> endtime = time.time()
>>
>> print(endtime - starttime, "seconds")
>>
>> -----
>>
>> 0.123021125793 seconds