[R] Extract values from multiple lists
SH
emptican at gmail.com
Wed Dec 17 22:41:06 CET 2014
Dear Dennis, David, Jeff, and Denes,
Thanks for your help and comments. The simple approach seems good enough
for my work.
Best,
Steve
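
A minimal sketch of the simple approach, for reference. The scen*
objects here are hypothetical stand-ins (the real ones come from the
simulation function), and the sketch assumes the extracted component can
be flattened to a vector of the same length in every scenario:

```r
# Hypothetical stand-ins for the scenario results.
set.seed(1)
scen1 <- list(aql = 0.01, pop.inf.r = runif(5))
scen2 <- list(aql = 0.02, pop.inf.r = runif(5))
scen3 <- list(aql = 0.03, pop.inf.r = runif(5))

# Collect the scenarios in one named list rather than relying on ls()/get().
scens <- list(scen1 = scen1, scen2 = scen2, scen3 = scen3)

# Pull the same component out of every scenario.
# unlist() returns atomic vectors unchanged and flattens a list of scalars.
pop_inf_r <- lapply(scens, function(s) unlist(s[["pop.inf.r"]]))

# Combine into a data frame, one column per scenario.
new.df <- as.data.frame(pop_inf_r)
str(new.df)
```

Keeping the scenarios in a named list means the column names follow from
the list names, and adding scenN only requires one more list element.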
On Wed, Dec 17, 2014 at 5:46 AM, Dénes Tóth <toth.denes at ttk.mta.hu> wrote:
>
> Dear Jeff,
>
> On 12/17/2014 01:46 AM, Jeff Newmiller wrote:
>
>> You are chasing ghosts of performance past, Denes.
>>
>
> In terms of memory efficiency, yes. In terms of CPU time, there can be
> a significant difference, see below.
>
>
>> The data.frame function causes no problems, and if it is used then the
>> OP would not need to presume they know the internal structure of the
>> data frame. See below. (I am using R 3.1.2.)
>>
>> a1 <- list(x = rnorm(1e6), y = rnorm(1e6))
>> a2 <- list(x = rnorm(1e6), y = rnorm(1e6))
>> a3 <- list(x = rnorm(1e6), y = rnorm(1e6))
>>
>> # get names of the objects
>> out_names <- ls(pattern="a[[:digit:]]$")
>>
>> # amount of memory allocated
>> gc(reset=TRUE)
>>
>> # Explicitly call data frame
>> out2 <- data.frame( a1=a1[["x"]], a2=a2[["x"]], a3=a3[["x"]] )
>>
>> # No copying.
>> gc()
>>
>> # Your suggested retrieval method
>> out3a <- lapply( lapply( out_names, get ), "[[", "x" )
>> names( out3a ) <- out_names
>> # The "obvious" way to finish the job works fine.
>> out3 <- do.call( data.frame, out3a )
>>
>
> BTW, the even more "obvious" as.data.frame() produces the same result
> with an even more intuitive interface.
>
> However, for lists with a larger number of elements the transformation to
> a data.frame can be pretty slow. In the toy example, we created only a
> three-element list. Let's increase it a little bit.
>
> ---
>
> # this is not even that large
> datlen <- 1e2
> listlen <- 1e5
>
> # create a toy list
> mylist <- matrix(seq_len(datlen * listlen),
>                  nrow = datlen, ncol = listlen)
> mylist <- lapply(1:ncol(mylist), function(i) mylist[, i])
> names(mylist) <- paste0("V", seq_len(listlen))
>
>
> # define the more efficient function ---
> # note that I put class(x) first so that setattr does not
> # modify the attributes of the original input (see ?setattr,
> # you have to be careful)
> setAttrib <- function(x) {
>   class(x) <- "data.frame"
>   data.table::setattr(x, "row.names", seq_along(x[[1]]))
>   x
> }
>
> # benchmarking
> # (we do not need microbenchmark here, the differences are
> # extremely large) - on my machine, 9.4 sec, 8.1 sec vs 0.15 sec
> gc(reset=TRUE)
> system.time(df1 <- do.call(data.frame, mylist))
> gc()
> system.time(df2 <- as.data.frame(mylist))
> gc()
> system.time(df3 <- setAttrib(mylist))
> gc()
>
> # check results
> identical(df1, df2)
> identical(df1, df3)
>
> ----
>
> Of course for small datasets, one should use the built-in and safe
> functions (either do.call or as.data.frame). BTW, for the original
> three-element list, these are even faster than the workaround.
>
> All the best,
> Denes
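
The workaround above depends on data.table::setattr. A base-R sketch of
the same idea (skip the checks that data.frame() performs and set the
attributes directly) might look like the following. list_to_df is a
hypothetical helper name, not something from the thread. Unlike setattr
it does not modify by reference, so R may make a shallow copy of the
list, though the column vectors themselves are not duplicated. It
assumes a named list of equal-length atomic vectors:

```r
# Hypothetical helper: turn a named list of equal-length vectors into a
# data.frame by setting attributes directly (base R only).
list_to_df <- function(x) {
  stopifnot(is.list(x), length(x) > 0L, !is.null(names(x)))
  n <- length(x[[1L]])
  # every column must have the same length, otherwise the result is invalid
  stopifnot(all(lengths(x) == n))
  # compact "automatic" row names, the form data.frame() itself uses
  attr(x, "row.names") <- .set_row_names(n)
  class(x) <- "data.frame"
  x
}

# small usage example
l <- list(a = 1:3, b = c(2.5, 3.5, 4.5))
d <- list_to_df(l)
str(d)
```

As with the original workaround, this skips name checking and type
conversion, so it is only appropriate when the input is already known to
be well formed.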
>
>> # No copying... well, you do end up with a new list in out3, but the
>> # data itself doesn't get copied.
>> gc()
>>
>>
>> On Tue, 16 Dec 2014, Dénes Tóth wrote:
>>
>>> On 12/16/2014 06:06 PM, SH wrote:
>>>
>>>> Dear List,
>>>>
>>>> I hope this posting is not redundant. I have several list outputs with
>>>> the same components. I ran a function under different scenarios
>>>> (e.g., scen1, scen2, scen3, ..., scenN). I would like to extract the
>>>> same component from each and group them as a data frame. For example,
>>>> pop.inf.r1 <- scen1[['pop.inf.r']]
>>>> pop.inf.r2 <- scen2[['pop.inf.r']]
>>>> pop.inf.r3 <- scen3[['pop.inf.r']]
>>>> ...
>>>> pop.inf.rN<-scenN[['pop.inf.r']]
>>>> new.df <- data.frame(pop.inf.r1, pop.inf.r2, pop.inf.r3,...,pop.inf.rN)
>>>>
>>>> My final output would be 'new.df'. Could you help me do that
>>>> efficiently?
>>>>
>>>
>>> If efficiency is of concern, do not use data.frame() but create a list
>>> and add the required attributes with data.table::setattr (the setattr
>>> function of the data.table package). (You can also consider creating a
>>> data.table instead of a data.frame.)
>>>
>>> # some largish lists
>>> a1 <- list(x = rnorm(1e6), y = rnorm(1e6))
>>> a2 <- list(x = rnorm(1e6), y = rnorm(1e6))
>>> a3 <- list(x = rnorm(1e6), y = rnorm(1e6))
>>>
>>> # amount of memory allocated
>>> gc(reset=TRUE)
>>>
>>> # get names of the objects
>>> out_names <- ls(pattern="a[[:digit:]]$")
>>>
>>> # create a list
>>> out <- lapply(lapply(out_names, get), "[[", "x")
>>>
>>> # note that no copying occurred
>>> gc()
>>>
>>> # decorate the list
>>> data.table::setattr(out, "names", out_names)
>>> data.table::setattr(out, "row.names", seq_along(out[[1]]))
>>> class(out) <- "data.frame"
>>>
>>> # still no copy
>>> gc()
>>>
>>> # output
>>> head(out)
>>>
>>>
>>> HTH,
>>> Denes
>>>
>>>
>>>
>>>> Thanks in advance,
>>>>
>>>> Steve
>>>>
>>>> P.S.: Below are some examples of summary outputs.
>>>>
>>>>
>>>> > summary(scen1)
>>>>                 Length Class  Mode
>>>> aql                  1 -none- numeric
>>>> rql                  1 -none- numeric
>>>> alpha                1 -none- numeric
>>>> beta                 1 -none- numeric
>>>> n.sim                1 -none- numeric
>>>> N                    1 -none- numeric
>>>> n.sample             1 -none- numeric
>>>> n.acc                1 -none- numeric
>>>> lot.inf.r            1 -none- numeric
>>>> pop.inf.n         2000 -none- list
>>>> pop.inf.r         2000 -none- list
>>>> pop.decision.t1   2000 -none- list
>>>> pop.decision.t2   2000 -none- list
>>>> sp.inf.n          2000 -none- list
>>>> sp.inf.r          2000 -none- list
>>>> sp.decision       2000 -none- list
>>>>
>>>> > summary(scen2)
>>>>                 Length Class  Mode
>>>> aql                  1 -none- numeric
>>>> rql                  1 -none- numeric
>>>> alpha                1 -none- numeric
>>>> beta                 1 -none- numeric
>>>> n.sim                1 -none- numeric
>>>> N                    1 -none- numeric
>>>> n.sample             1 -none- numeric
>>>> n.acc                1 -none- numeric
>>>> lot.inf.r            1 -none- numeric
>>>> pop.inf.n         2000 -none- list
>>>> pop.inf.r         2000 -none- list
>>>> pop.decision.t1   2000 -none- list
>>>> pop.decision.t2   2000 -none- list
>>>> sp.inf.n          2000 -none- list
>>>> sp.inf.r          2000 -none- list
>>>> sp.decision       2000 -none- list
>>>>
>>>> > summary(scen3)
>>>>                 Length Class  Mode
>>>> aql                  1 -none- numeric
>>>> rql                  1 -none- numeric
>>>> alpha                1 -none- numeric
>>>> beta                 1 -none- numeric
>>>> n.sim                1 -none- numeric
>>>> N                    1 -none- numeric
>>>> n.sample             1 -none- numeric
>>>> n.acc                1 -none- numeric
>>>> lot.inf.r            1 -none- numeric
>>>> pop.inf.n         2000 -none- list
>>>> pop.inf.r         2000 -none- list
>>>> pop.decision.t1   2000 -none- list
>>>> pop.decision.t2   2000 -none- list
>>>> sp.inf.n          2000 -none- list
>>>> sp.inf.r          2000 -none- list
>>>> sp.decision       2000 -none- list
>>>>
>>>> ______________________________________________
>>>> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
>>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>>> PLEASE do read the posting guide
>>>> http://www.R-project.org/posting-guide.html
>>>> and provide commented, minimal, self-contained, reproducible code.
>>>>
>>>>
>> ---------------------------------------------------------------------------
>> Jeff Newmiller                        The     .....       .....  Go Live...
>> DCN:<jdnewmil at dcn.davis.ca.us>        Basics: ##.#.       ##.#.  Live Go...
>>                                       Live:   OO#.. Dead: OO#..  Playing
>> Research Engineer (Solar/Batteries            O.O#.       #.O#.  with
>> /Software/Embedded Controllers)               .OO#.       .OO#.  rocks...1k
>> ---------------------------------------------------------------------------
>>
>