[R] "Best" way to merge 300+ .5MB dataframes?

John McKown john.archie.mckown at gmail.com
Tue Aug 12 05:01:15 CEST 2014


On Mon, Aug 11, 2014 at 9:43 PM, Thomas Adams <tea3rd at gmail.com> wrote:
> Grant,
>
> Assuming all your filenames are something like file1.txt,
> file2.txt, file3.txt, ... and using the Mac OS X Terminal app (after you cd to
> the directory where your files are located):
>
> This will strip off the 1st lines, that is, your header lines:
>
> for file in *.txt;do
> sed -i '' '1d' "${file}";
> done
>
> Then, do this:
>
> cat *.txt > newfilename.txt
>
> Doing both should only take a few seconds, depending on your file sizes.
>
> Cheers!
> Tom
>

Using sed hadn't occurred to me. I guess I'm just "awk-ward" <grin/>.
A slightly different way would be:

set -- *.txt    # expand the glob before newfilename.txt is created
for file in "$@";do
  sed '1d' "${file}"
done >newfilename.txt

that way the original files are not modified. But it strips out the
header of the 1st file as well. Not a big deal, but the read.table
call will then need header=FALSE (with names supplied via col.names).
Also, it creates an otherwise unnecessary intermediate file,
"newfilename.txt". To keep the 1st file's header, the script could be:

set -- *.txt    # expand the glob before newfilename.txt is created
head -n 1 "$1" >newfilename.txt
for file in "$@";do
   sed '1d' "${file}"
done >>newfilename.txt
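
If you want to rerun this safely, here is a rough sketch of the same
idea wrapped in a function. The name merge_txt and the temporary-file
naming are my own inventions, not anything standard. It writes to a
temporary file first and skips the output file if the glob happens to
match it, so a leftover newfilename.txt from an earlier run does not get
merged into itself.

```shell
# merge_txt OUT FILE...   (hypothetical helper; the name is an assumption)
# Writes the header of the first input once, then the data rows of every input.
merge_txt() {
    out=$1
    shift
    [ "$#" -gt 0 ] || { echo "merge_txt: no input files" >&2; return 1; }
    tmp="${out}.tmp.$$"          # temporary name; does not end in .txt
    : >"$tmp" || return 1
    first=1
    for file in "$@"; do
        # Skip the output file itself if the glob matched a stale copy.
        [ "$file" = "$out" ] && continue
        if [ "$first" -eq 1 ]; then
            head -n 1 "$file" >>"$tmp"   # header line, taken once
            first=0
        fi
        sed '1d' "$file" >>"$tmp"        # everything except the header
    done
    mv "$tmp" "$out"
}
```

Usage, from the directory holding the files, would be something like:
merge_txt newfilename.txt *.txt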

I really like having multiple answers to a given problem. Especially
since I have a poorly implemented version of "awk" on one of my
systems. It is the vendor's "awk" and conforms exactly to the POSIX
definition with no additions. So I don't have the FNR built-in
variable. Your implementation would work well on that system. Well, if
there were a version of R for it. It is a branded UNIX system which
was designed to be totally __and only__ POSIX compliant, with few
(maybe no) extensions at all. IOW, it stinks. No, it can't be
replaced. It is the z/OS system from IBM which is EBCDIC based and
runs on the "big iron" mainframe, system z.

-- 
There is nothing more pleasant than traveling and meeting new people!
Genghis Khan

Maranatha! <><
John McKown


