[R] How to speed up list access in R?

Thomas Nyberg tomnyberg at gmail.com
Thu Oct 30 16:17:59 CET 2014


Hello,

I want to do the following: given a set of (number, value) pairs, I want
to create a list l so that l[[toString(number)]] returns the vector of
values associated with that number. My R implementation is hundreds of
times slower than the equivalent code I would write in Python. I'm
pretty new to R, so I bet I'm using its data structures inefficiently,
but I've tried more or less everything I can think of and can't really
speed it up. Profiling helped me find the problem areas, but even with
that information I couldn't make things faster. I suspect I'm just
fundamentally using R in a silly way.
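For reference, here is a minimal sketch of the target structure built without an explicit loop, assuming a vectorized approach is acceptable. Base R's split() groups one vector by the distinct entries of another and returns a named list whose names are those entries converted to character, so l[[toString(number)]] works directly. The toy numbers and values are made-up illustration data, not the benchmark inputs.

```r
# Toy (number, value) pairs; illustrative data only.
numbers <- c(7, 3, 7, 12, 3, 7)
values  <- c(1, 2, 3, 4, 5, 6)

# split() groups 'values' by the distinct entries of 'numbers'.
# The result is a named list; names are the numbers as character strings,
# and values keep their original order within each group.
l <- split(values, numbers)

print(l[["7"]])            # [1] 1 3 6
print(l[[toString(12)]])   # [1] 4
```

split() does all of the grouping in a single call instead of performing one named list lookup per element.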

I've included code for the different versions. I wrote the Python code
in a style intended to be as clear as possible to R programmers. Thanks
a lot! Any help would be greatly appreciated!

Cheers,
Thomas


R code (two versions, depending on which of the two if lines is commented out):

-----

numbers <- numeric(0)
for (i in 1:5) {
    numbers <- c(numbers, sample(1:30000, 10000))
}

values <- numeric(0)
for (i in seq_along(numbers)) {
    values <- append(values, sample(1:10, 1))
}

starttime <- Sys.time()

d <- list()
for (i in seq_along(numbers)) {
    number <- toString(numbers[i])
    value <- values[i]
    # Either test detects a number not yet in d; comment out one of them.
    if (is.null(d[[number]])) {
    #if (!(number %in% names(d))) {
        d[[number]] <- c(value)
    } else {
        d[[number]] <- append(d[[number]], value)
    }
}

endtime <- Sys.time()

print(format(endtime - starttime))

-----

Version using the is.null() test (as shown): "45.64791 secs"
Version using the %in% names(d) test: "1.423056 mins"
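A likely explanation, offered as a hedged suggestion rather than a profiled result: R looks up list elements by name with a linear scan over the names, so with tens of thousands of distinct keys every d[[number]] access is expensive, whereas a Python dict is a hash table. The closest base-R analogue is an environment created with new.env(hash = TRUE). The sketch below keeps the same loop but stores the vectors in an environment; the small synthetic inputs and the set.seed() call are assumptions for illustration, not the original benchmark data.

```r
set.seed(1)  # reproducible illustration data
numbers <- sample(1:300, 1000, replace = TRUE)
values  <- sample(1:10, length(numbers), replace = TRUE)

# A hashed environment behaves like a dictionary keyed by strings.
d <- new.env(hash = TRUE)
for (i in seq_along(numbers)) {
    number <- toString(numbers[i])
    # d[[key]] returns NULL for a missing key, so c(NULL, v) starts a new vector.
    d[[number]] <- c(d[[number]], values[i])
}

# Convert to an ordinary named list if a list is needed afterwards.
l <- as.list(d)
```

Because an environment is modified in place, assigning into it never copies the container itself; only the short per-key vectors are rebuilt by c(). The order of names in as.list(d) is not guaranteed, which does not matter for lookups by name.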



Another version of the R code, which first creates an empty entry for every distinct number:

-----

numbers <- numeric(0)
for (i in 1:5) {
    numbers <- c(numbers, sample(1:30000, 10000))
}

values <- numeric(0)
for (i in seq_along(numbers)) {
    values <- append(values, sample(1:10, 1))
}

starttime <- Sys.time()

# Pre-create an empty vector for every distinct number.
d <- list()
for (number in unique(numbers)) {
    d[[toString(number)]] <- numeric(0)
}
# Append each value to the vector for its number.
for (i in seq_along(numbers)) {
    number <- toString(numbers[i])
    value <- values[i]
    d[[number]] <- append(d[[number]], value)
}

endtime <- Sys.time()

print(format(endtime - starttime))

-----

"47.15579 secs"

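Another option, assuming the numbers are always positive integers of modest size as in this benchmark: index a preallocated list by position instead of by name. Positional [[ ]] access does not search the names, and the string names needed for l[[toString(number)]] are attached once at the end. A minimal sketch with small synthetic inputs (set.seed() and the sizes are illustrative assumptions):

```r
set.seed(2)  # reproducible illustration data
numbers <- sample(1:300, 1000, replace = TRUE)
values  <- sample(1:10, length(numbers), replace = TRUE)

# Assumes every number is a positive integer no larger than max(numbers).
d <- vector("list", max(numbers))
for (i in seq_along(numbers)) {
    n <- numbers[i]
    # Positional access; c(NULL, v) starts a new vector for an empty slot.
    d[[n]] <- c(d[[n]], values[i])
}

# Name each slot by its number, then drop numbers that never occurred.
names(d) <- seq_along(d)
d <- d[!vapply(d, is.null, logical(1))]
```

After this, d[[toString(number)]] returns the values for that number, just like the name-based versions.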


The Python code:

-----

import random
import time

numbers = []
for i in range(5):
    numbers += random.sample(range(30000), 10000)

values = []
for i in range(len(numbers)):
    values.append(random.randint(1, 10))

starttime = time.time()

d = {}
for i in range(len(numbers)):
    number = numbers[i]
    value = values[i]
    # "in" replaces dict.has_key(), which no longer exists in Python 3.
    if number in d:
        d[number].append(value)
    else:
        d[number] = [value]

endtime = time.time()

print(endtime - starttime, "seconds")

-----

0.123021125793 seconds
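Outside the timed section, both R scripts also build numbers and values by growing vectors with c() and append() inside loops, which allocates a new vector on every iteration. A vectorized sketch of the same setup, with the same sizes and ranges, might look like this:

```r
# Five independent draws of 10000 distinct numbers from 1:30000,
# concatenated into one vector of length 50000 (same as the c() loop).
numbers <- unlist(lapply(1:5, function(i) sample(1:30000, 10000)))

# One value from 1:10 per number, drawn with replacement in a single call
# instead of one append() per element.
values <- sample(1:10, length(numbers), replace = TRUE)
```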



More information about the R-help mailing list