[R] Dataframe from list of similar lists: not _a_ way, but _the best_ way

Brian Diggs diggsb at ohsu.edu
Tue Dec 7 20:18:25 CET 2010


On 12/7/2010 1:03 AM, Nick Sabbe wrote:
> Hi All.
>
> I often find myself in this situation:
>
> - Based on some vector (or list) of values, I need to calculate a few
>   new values for each of them; some of the new values are numbers, but
>   some are descriptive (i.e. character strings).
>
> - So I use, e.g., sapply, passing a custom function that returns a list
>   with all the calculated values.
>
> - The result is a list (the return value of sapply) of lists that all
>   have the same named values.
>
> A silly example:
>
> list.of.lists <- sapply(1:10, function(nr) {
>   list(org = nr, chr = as.character(nr))
> })

Actually, this is not a list of lists, but rather a list with dimensions:
sapply simplifies the ten two-element results into a 2 x 10 matrix whose
cells are list elements.  I didn't know such a thing existed, but
obviously it does.
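A quick way to see this is to inspect what sapply returns. The sketch below re-creates the example, checks its shape, and then (as one possible approach, not something proposed in the thread) turns each row of the list-matrix into a data frame column.

```r
# Re-create the example from the question
list.of.lists <- sapply(1:10, function(nr) list(org = nr, chr = as.character(nr)))

# sapply has simplified the ten 2-element lists into a 2 x 10 matrix of mode "list"
is.list(list.of.lists)    # TRUE
dim(list.of.lists)        # 2 10
rownames(list.of.lists)   # "org" "chr"

# Each row holds one field for all ten inputs; unlist() flattens it to an atomic vector
DF <- data.frame(
  org = unlist(list.of.lists["org", ]),
  chr = unlist(list.of.lists["chr", ]),
  stringsAsFactors = FALSE
)
```

This assumes every field is exactly one non-NULL value per input; otherwise unlist() would silently produce columns of the wrong length or misaligned rows.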

> It seems rather obvious that the result would be better structured as a
> dataframe.
>
> I know a few ways to do this (using do.call), but I fear most of them
> perform poorly: I suspect all the data is being copied repeatedly, which
> may be slow.
>
> So, my questions to the specialists:
>
> - Is the above way of working reasonable for this kind of problem, or
>   would you suggest something else?
>
> - What would be the best (as in: quickest) way of transforming this list
>   of lists into a data frame? Does the answer depend on knowledge of R's
>   inner workings, or on the specifics of my function (for nontrivial
>   functions and list sizes)?
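For comparison, here is a sketch of two base-R conversions for a genuine list of lists (built with lapply, which does not simplify the way sapply does). The do.call(rbind, ...) form is one common reading of the do.call approach mentioned in the question; the column-wise form is an alternative that extracts each field once instead of creating one small data frame per record. Neither is claimed to be the fastest in general.

```r
# lapply keeps the result as a plain list of 10 two-element lists
list.of.lists <- lapply(1:10, function(nr) list(org = nr, chr = as.character(nr)))

# Row-wise: convert every record to a one-row data frame, then rbind them in one call
df.rows <- do.call(rbind, lapply(list.of.lists, as.data.frame, stringsAsFactors = FALSE))

# Column-wise: for each field name, collect that field from all records and unlist once
fields <- names(list.of.lists[[1]])
df.cols <- as.data.frame(
  lapply(setNames(fields, fields), function(f) unlist(lapply(list.of.lists, `[[`, f))),
  stringsAsFactors = FALSE
)
```

The column-wise version avoids building and binding many intermediate data frames, which is where the row-wise version tends to spend its time as the number of records grows.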

I don't know that this is best (in terms of fastest and/or least memory
usage), but to me the following is "best" in that it hands off the
problem to a package designed to handle such problems, and so it
presumably does a better job than any one-off approach.

# plyr's ldply: apply the function to each element, then row-bind the results
library("plyr")

DF <- ldply(1:10, function(nr) {
  data.frame(org = nr, chr = as.character(nr))
})

Note that the inner function returns a data.frame rather than a list,
and the *dply functions automatically stitch the individual data.frames
together.  Check out the documentation for the plyr package.
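Since the answer may depend on the function and the input size, a practical check is to time the candidates on realistic data. The sketch below uses a hypothetical make.record() standing in for the real per-element function and compares a base-R column-wise build (using vapply, which also checks each field's type and length) with the ldply approach above; plyr is only used if it is installed, and n = 2000 is an arbitrary size.

```r
# Hypothetical per-element function standing in for the real computation
make.record <- function(nr) list(org = nr, chr = as.character(nr))

n <- 2000  # arbitrary; use something close to your real workload
ids <- seq_len(n)

# Base R, column-wise: compute all records, then build each column with vapply,
# which errors if a field is not exactly one value of the stated type
t.base <- system.time({
  recs <- lapply(ids, make.record)
  DF.base <- data.frame(
    org = vapply(recs, function(r) r$org, integer(1)),
    chr = vapply(recs, function(r) r$chr, character(1)),
    stringsAsFactors = FALSE
  )
})
print(t.base)

# plyr, row-wise: one single-row data frame per input, stitched together by ldply
if (requireNamespace("plyr", quietly = TRUE)) {
  t.plyr <- system.time(
    DF.plyr <- plyr::ldply(ids, function(nr) {
      data.frame(org = nr, chr = as.character(nr), stringsAsFactors = FALSE)
    })
  )
  print(t.plyr)
}
```

Timings vary between machines and R versions, so the comparison is only meaningful when run on your own function and data sizes.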

> Thanks!
>
> Nick Sabbe
>
> --
> ping: nick.sabbe at ugent.be
> link: http://biomath.ugent.be
> wink: A1.056, Coupure Links 653, 9000 Gent
> ring: 09/264.59.36
> -- Do Not Disapprove

-- 
Brian S. Diggs, PhD
Senior Research Associate, Department of Surgery
Oregon Health & Science University


