[R] "Best" way to merge 300+ .5MB dataframes?

John McKown john.archie.mckown at gmail.com
Mon Aug 11 01:50:14 CEST 2014


On Sun, Aug 10, 2014 at 1:51 PM, Grant Rettke <gcr at wisdomandwonder.com> wrote:
>
> Good afternoon,
>
> Today I was working on a practice problem. It was simple, and perhaps
> even realistic. It looked like this:
>
> • Get a list of all the data files in a directory


OK, I assume this results in a vector of file names in a variable,
like you'd get from list.files().

>
> • Load each file into a dataframe


Why? Do you need them in separate data frames?

>
> • Merge them into a single data frame

The meat of the question. If you don't need the files in separate data
frames, and the files do _NOT_ have headers, then I would just load
them all into a single frame. I use Linux, so my solution may not
work on Windows. Something like:

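list_of_files <- list.files(pattern=".*data$") # vector of data file names
#
# Build ONE shell command that lists the contents of all files to stdout.
# paste() needs collapse=, otherwise you get one "cat" command per file
# and pipe() only accepts a single command string.
command <- paste("cat", paste(shQuote(list_of_files), collapse=" "))
all_data <- read.table(pipe(command), header=FALSE)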

I would guess that Windows has something equivalent to cat. Is it
"type"? I have a vague memory of that.

The above will also run with header=TRUE, but the headers in the second
and subsequent files are then taken as data. And if you have row.names in
the data, as write.csv() writes by default, then this is really not for
you. Well, at least it would not be as simple. There are ways around it
using a more intelligent "copy" program than "cat", such as AWK. If
you need an AWK example, I can fake one up. It would strip the headers
from the 2nd and subsequent files and remove the first column of
"row.names" values. Not really all that difficult, but "fiddly".

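For what it is worth, the header and row.names problem can also be
sidestepped by staying inside R instead of using cat or AWK. This is a
different technique from the one above, and only a sketch under
assumptions: the files are CSV files written by write.csv() with its
default row names, read_one is a made-up helper name, and the demo files
are created in a temporary directory purely so the example runs on its own.

```r
# Demo data: three small files written the way write.csv() does by default,
# i.e. with a header line and a leading column of row names.
demo_dir <- file.path(tempdir(), "merge_demo")
dir.create(demo_dir, showWarnings = FALSE)
for (i in 1:3) {
  write.csv(data.frame(id = (2 * i - 1):(2 * i), value = c(i, i + 0.5)),
            file.path(demo_dir, sprintf("part%d.data", i)))
}

list_of_files <- list.files(demo_dir, pattern = "data$", full.names = TRUE)

# read_one() is a hypothetical helper. read.csv() reads each file's header
# by default, and row.names = 1 absorbs the first column as row names
# instead of keeping it as data.
read_one <- function(f) read.csv(f, row.names = 1)

# Stack all files into one data frame and drop the per-file row names.
combined <- do.call(rbind, lapply(list_of_files, read_one))
rownames(combined) <- NULL
```

Each file is parsed separately here, so this is probably slower than one
big read.table() over a pipe, but it needs no external program and should
behave the same on Windows.
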
>
> Because all of the columns were the same, the simplest solution in my
> mind was to `Reduce' the vector of dataframes with a call to
> `merge'. That worked fine, I got what was expected. That is key
> actually. It is literally a one-liner, and there will never be index
> or scoping errors with it.
>
> Now with that in mind, what is the idiomatic way? Do people usually do
> something else because it is /faster/ (by some definition)?
>
> Kind regards,
>
>

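On the Reduce/merge approach: a small sketch of how merge() and rbind()
differ when all the columns are the same. The data frames x and y are made
up for illustration. As far as I know, merge() without a "by" argument
joins on every column name the two frames share, so by default it keeps
only rows that occur in both, while rbind() simply stacks the rows.

```r
# Two made-up data frames with identical columns and one row in common.
x <- data.frame(id = 1:2, v = c("a", "b"))
y <- data.frame(id = 2:3, v = c("b", "c"))
frames <- list(x, y)

# merge() joins on every shared column, so the default keeps matching rows only.
inner <- Reduce(merge, frames)

# all = TRUE keeps unmatched rows too; a row present in both frames appears once.
outer <- Reduce(function(a, b) merge(a, b, all = TRUE), frames)

# rbind() stacks the rows as they are, repeats included.
stacked <- do.call(rbind, frames)

# Compare the row counts of the three results.
c(inner = nrow(inner), outer = nrow(outer), stacked = nrow(stacked))
```

The three row counts differ, so which one is "right" depends on whether
rows repeated across files should be kept. If they should, rbind() is the
more direct tool.
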

-- 
There is nothing more pleasant than traveling and meeting new people!
Genghis Khan

Maranatha! <><
John McKown


