[R] fast subsetting of lists in lists

Alexander Senger senger at physik.hu-berlin.de
Tue Dec 7 18:47:26 CET 2010


I tried to hide the gory details, as the structure of my datasets is
rather complicated. Basically it's a long list of lists, which in turn
contain character vectors, dates, numerics and data frames, all named.
While the hierarchy is fixed, neither the number of elements nor their
ordering is. But if I try to access a certain element, then I know it is
there and contains sensible data.
For a typical day of measurements the whole package weighs around 1
GiB. How often and what I need to extract varies, as the analysis is
rather dynamic.
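
For illustration, a minimal sketch of by-name extraction from a structure
like the one described above. The list `recs` is a hypothetical toy
stand-in for the real data: every record is assumed to hold a numeric
scalar `a`, but the element order differs between records. No speed
claim is implied; timings on the real data would have to be measured,
for instance with system.time().

```r
# Hypothetical toy stand-in for the real data: named records whose
# element order varies from record to record.
recs <- list(
  list(a = 1, b = "x", c = as.Date("2010-12-07")),
  list(c = as.Date("2010-12-08"), a = 4, b = "y"),
  list(b = "z", a = 7, c = as.Date("2010-12-09"))
)

# Extract element "a" from every record by name, so the varying order
# does not matter. vapply() also checks that each result is exactly one
# numeric value and stops with an error otherwise.
val <- vapply(recs, function(r) r[["a"]], numeric(1))
print(val)
```

Selecting by name rather than by position is what keeps this working
when the ordering inside each record is not fixed.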

As far as I can see, a thorough refactoring of the datasets so that
everything is contained in one large data frame might be a solution. But
I wouldn't be too unhappy if I could avoid this rather tedious work.
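
As a rough sketch of what such a refactoring could look like for the
scalar fields only: the names `recs`, `fields`, `rows` and `flat` are
hypothetical, and nested data frames or vectors of varying length in
the real data would need separate handling.

```r
# Hypothetical toy records; only scalar fields, in varying order.
recs <- list(
  list(a = 1, b = "x"),
  list(b = "y", a = 4),
  list(a = 7, b = "z")
)

# Fields to keep, in the column order wanted for the result.
fields <- c("a", "b")

# Turn each record into a one-row data frame with a fixed column order,
# then stack the rows into a single data frame.
rows <- lapply(recs, function(r) {
  as.data.frame(r[fields], stringsAsFactors = FALSE)
})
flat <- do.call(rbind, rows)

# After the one-time conversion, a column is a plain vector.
print(flat$a)
```

The conversion cost is paid once; afterwards each extraction is an
ordinary column lookup.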

Alex


On 07.12. 18:26, William Dunlap wrote:
> To find the fastest method you need to tell more
> about the constraints on your problem.
>    Do you always have a list of lists of scalars
>       or are the lists buried at various depths
>       or do the numeric vectors at the leaves have
>       various lengths?
>    If you always have a list of lists of scalars,
>       do the names always come in the same order?
>       (It may be faster to select by numeric position
>       than by name).
>    Do all the lists of numeric vectors contain an
>       element by the given name?
>    What is a typical size for the problem?  How
>       many times do you typically need to repeat
>       the solution?
> 
> Bill Dunlap
> Spotfire, TIBCO Software
> wdunlap tibco.com  
> 
>> -----Original Message-----
>> From: r-help-bounces at r-project.org 
>> [mailto:r-help-bounces at r-project.org] On Behalf Of Alexander Senger
>> Sent: Tuesday, December 07, 2010 9:12 AM
>> To: r-help at r-project.org
>> Subject: Re: [R] fast subsetting of lists in lists
>>
>> Hello Gerrit, Gabor,
>>
>>
>> thank you for your suggestion.
>>
>> Unfortunately unlist seems to be rather expensive. A short test with one
>> of my datasets gives 0.01s for an extraction based on my approach and
>> 5.6s for unlist alone. The reason seems to be that unlist relies on
>> lapply internally and does so recursively?
>>
>> Maybe there is still another way to go?
>>
>> Alex
>>
>> On 07.12.2010 15:59, Gerrit Eichner wrote:
>>> Hello, Alexander,
>>>
>>> does
>>>
>>> utest <- unlist(test)
>>> utest[ names( utest) == "a"]
>>>
>>> come close to what you need?
>>>
>>> Hth,
>>>
>>> Gerrit
>>>
>>>
>>> On Tue, 7 Dec 2010, Alexander Senger wrote:
>>>
>>>> Hello,
>>>>
>>>>
>>>> my data is contained in nested lists (which seems not necessarily to be
>>>> the best approach). What I need is a fast way to get subsets from the
>>>> data.
>>>>
>>>> An example:
>>>>
>>>> test <- list(list(a = 1, b = 2, c = 3), list(a = 4, b = 5, c = 6),
>>>> list(a = 7, b = 8, c = 9))
>>>>
>>>> Now I would like to have all values in the named variables "a", that is
>>>> the vector c(1, 4, 7). The best I could come up with is:
>>>>
>>>> val <- sapply(1:3, function (i) {test[[i]]$a})
>>>>
>>>> which is unfortunately not very fast. According to R-inferno this is due
>>>> to the fact that apply and its derivatives do looping in R rather than
>>>> relying on C subroutines as the common [-operator does.
>>>>
>>>> Does someone know a trick to do the same as above with the faster
>>>> built-in subsetting? Something like:
>>>>
>>>> test[<somesubsettingmagic>]
>>>>
>>>>
>>>> Thank you for your advice
>>>>
>>>>
>>>> Alex
>>>>
>>>> ______________________________________________
>>>> R-help at r-project.org mailing list
>>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>>> PLEASE do read the posting guide
>>>> http://www.R-project.org/posting-guide.html
>>>> and provide commented, minimal, self-contained, reproducible code.
>>>>
>>


