[R] "Best" way to merge 300+ .5MB dataframes?

Jeff Newmiller jdnewmil at dcn.davis.CA.us
Mon Aug 11 01:07:48 CEST 2014


Err... sorry... you have to use do.call with base rbind as David illustrates. I am spoiled by rbind.fill from the plyr package. rbind.fill accepts the list directly and also fills in any missing columns with NA, which avoids having to dig through all the files to find any oddballs.
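A minimal base-R sketch of both points, assuming the files have already been read into a list of data frames (the toy data below is made up). `do.call(rbind, dfs)` expands the list into a single `rbind(dfs[[1]], dfs[[2]], ...)` call. `rbind_fill_base` is a hypothetical helper, not plyr's code. It approximates what `plyr::rbind.fill` does by adding each missing column as NA before binding.

```r
# Toy stand-ins for data frames read from files (hypothetical data).
dfs <- list(
  data.frame(id = 1:2, x = c(0.5, 1.5)),
  data.frame(id = 3:4, x = c(2.5, 3.5))
)

# Base R: rbind does not take a list, so do.call spreads the list
# into one rbind(dfs[[1]], dfs[[2]], ...) call.
combined <- do.call(rbind, dfs)

# Hypothetical helper approximating plyr::rbind.fill in base R:
# give every piece the full set of columns (missing ones as NA),
# in the same order, then bind them all in a single call.
rbind_fill_base <- function(dfs) {
  all_cols <- unique(unlist(lapply(dfs, names)))
  padded <- lapply(dfs, function(d) {
    for (col in setdiff(all_cols, names(d))) d[[col]] <- NA
    d[all_cols]
  })
  do.call(rbind, padded)
}

# An "oddball" file: the second piece has y instead of x.
odd <- list(data.frame(id = 1, x = 0.5),
            data.frame(id = 2, y = 9))
filled <- rbind_fill_base(odd)
```

With plyr installed, `plyr::rbind.fill(odd)` is the packaged way to get the same NA-padded result straight from the list.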
---------------------------------------------------------------------------
Jeff Newmiller                        The     .....       .....  Go Live...
DCN:<jdnewmil at dcn.davis.ca.us>        Basics: ##.#.       ##.#.  Live Go...
                                      Live:   OO#.. Dead: OO#..  Playing
Research Engineer (Solar/Batteries            O.O#.       #.O#.  with
/Software/Embedded Controllers)               .OO#.       .OO#.  rocks...1k
--------------------------------------------------------------------------- 
Sent from my phone. Please excuse my brevity.

On August 10, 2014 2:22:06 PM PDT, Jeff Newmiller <jdnewmil at dcn.davis.ca.us> wrote:
>Just load the data frames into a list and give that list to rbind. It
>is way more efficient to be able to identify how big the final data
>frame is going to have to be at the beginning and preallocate the
>result memory than to incrementally allocate larger and larger data
>frames along the way using Reduce.
>
>On August 10, 2014 11:51:22 AM PDT, Grant Rettke <gcr at wisdomandwonder.com> wrote:
>>Good afternoon,
>>
>>Today I was working on a practice problem. It was simple, and perhaps
>>even realistic. It looked like this:
>>
>>• Get a list of all the data files in a directory
>>• Load each file into a dataframe
>>• Merge them into a single data frame
>>
>>Because all of the columns were the same, the simplest solution in my
>>mind was to `Reduce' the list of data frames with a call to `merge'.
>>That worked fine; I got the expected result, which is the key point.
>>It is literally a one-liner, and there will never be index or scoping
>>errors with it.
>>
>>Now with that in mind, what is the idiomatic way? Do people usually do
>>something else because it is /faster/ (by some definition)?
>>
>>Kind regards,
>>
>>Grant Rettke | ACM, ASA, FSF, IEEE, SIAM
>>gcr at wisdomandwonder.com | http://www.wisdomandwonder.com/
>>“Wisdom begins in wonder.” --Socrates
>>((λ (x) (x x)) (λ (x) (x x)))
>>“Life has become immeasurably better since I have been forced to stop
>>taking it seriously.” --Thompson
>>
>>______________________________________________
>>R-help at r-project.org mailing list
>>https://stat.ethz.ch/mailman/listinfo/r-help
>>PLEASE do read the posting guide
>>http://www.R-project.org/posting-guide.html
>>and provide commented, minimal, self-contained, reproducible code.
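The three steps in the original question, and the combining strategies discussed in the thread, can be sketched in base R as below. This assumes CSV files that all share the same columns; the temporary directory and file names are made up for the demonstration. One caveat worth knowing: `merge` by default joins on *all* shared columns with `all = FALSE`, i.e. an inner join, so reducing with plain `merge` keeps only rows that appear in every file. `all = TRUE` keeps every row, but rows that occur identically in more than one file are then matched rather than stacked, and the result is sorted by the key columns.

```r
# Step 0 (demo only): write a few small CSV files with identical columns
# into a temporary directory; in practice these files already exist.
dir <- file.path(tempdir(), "merge_demo")
dir.create(dir, showWarnings = FALSE)
for (i in 1:3) {
  part <- data.frame(id = (3 * i - 2):(3 * i), value = i * 10 + 1:3)
  write.csv(part, file.path(dir, sprintf("part%02d.csv", i)), row.names = FALSE)
}

# Step 1: list the data files in the directory.
files <- list.files(dir, pattern = "\\.csv$", full.names = TRUE)

# Step 2: load each file into a data frame, collecting them in a list.
dfs <- lapply(files, read.csv)

# Step 3a: stack them with one rbind call, so rbind sees every piece
# at once instead of growing an intermediate result.
stacked <- do.call(rbind, dfs)

# Step 3b: the Reduce() alternatives. Reduce(rbind, dfs) yields the same
# rows but builds a new, larger data frame at every step.
stacked_reduce <- Reduce(rbind, dfs)

# merge() joins on all shared columns; with the default all = FALSE this
# is an inner join, so files with disjoint rows give zero rows.
merged_inner <- Reduce(merge, dfs)
merged_outer <- Reduce(function(a, b) merge(a, b, all = TRUE), dfs)
```

For a real directory, drop the Step 0 block and point `dir` at the data. The `do.call(rbind, ...)` form is the one that matches the "identify the final size up front" argument in the thread, since every piece is handed to a single call.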
