[R] "Best" way to merge 300+ .5MB dataframes?

David Winsemius dwinsemius at comcast.net
Tue Aug 12 08:07:13 CEST 2014


On Aug 11, 2014, at 8:01 PM, John McKown wrote:

> On Mon, Aug 11, 2014 at 9:43 PM, Thomas Adams <tea3rd at gmail.com> wrote:
>> Grant,
>> 
>> Assuming all your filenames are something like file1.txt,
>> file2.txt, file3.txt, ... and using the Mac OS X Terminal app (after you cd to
>> the directory where your files are located):
>> 
>> This will strip off the first line of each file, that is, your header lines:
>> 
>> for file in *.txt; do
>>   # BSD sed (Mac OS X) needs an explicit, here empty, backup suffix after -i
>>   sed -i '' '1d' "${file}"
>> done
>> 
>> Then, do this:
>> 
>> cat *.txt > newfilename.txt
>> 
>> Doing both should only take a few seconds, depending on your file sizes.
>> 
>> Cheers!
>> Tom
>> 
> 
> Using sed hadn't occurred to me. I guess I'm just "awk-ward" <grin/>.
> A slightly different way would be:
> 
> # file*.txt, not *.txt: the redirect creates newfilename.txt before the glob expands
> for file in file*.txt; do
>   sed '1d' "${file}"
> done >newfilename.txt
> 
> That way the original files are not modified. But it strips out the
> header of the first file as well. Not a big deal, but the read.table()
> call will need to be changed to accommodate that. Also, it creates an
> otherwise unnecessary intermediate file "newfilename.txt". To keep the
> first file's header, the script could:
> 
> head -n 1 file1.txt >newfilename.txt
> for file in file*.txt; do
>   sed '1d' "${file}"
> done >>newfilename.txt
> 
> I really like having multiple answers to a given problem. Especially
> since I have a poorly implemented version of "awk" on one of my
> systems. It is the vendor's "awk" and conforms exactly to the POSIX
> definition with no additions. So I don't have the FNR built-in
> variable. Your implementation would work well on that system. Well, if
> there were a version of R for it. It is a branded UNIX system which
> was designed to be totally __and only__ POSIX compliant, with few
> (maybe no) extensions at all. IOW, it stinks. No, it can't be
> replaced. It is the z/OS system from IBM, which is EBCDIC-based and
> runs on the "big iron" mainframe, System z.
> 
> -- 

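On the Mac the stock awk is the BSD "one true awk" rather than gawk (gawk can be installed separately), and it does have FNR. Within R you would use `system()`, possibly using `paste0()` to construct the command string to send.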
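For reference, the FNR-based awk approach mentioned in the thread might look like the sketch below. It assumes every file has exactly one header line and the same columns; the function names `merge_fnr` and `merge_no_fnr` are hypothetical wrappers, not anything from the original posts. The second function is a possible fallback for an awk lacking FNR, such as the one John describes, and detects the start of each file by watching `FILENAME` change.

```shell
#!/bin/sh
# Minimal sketch: concatenate delimited text files, keeping only the first
# file's header line. Assumes every file has exactly one header line.

# FNR is the record number within the current file; NR counts across all
# files. Skip line 1 of every file except the very first one.
merge_fnr() {
  awk 'FNR == 1 && NR != 1 { next } { print }' "$@"
}

# Hypothetical fallback for an awk that lacks FNR: notice when FILENAME
# changes, and drop that first line unless it is the first line overall.
merge_no_fnr() {
  awk 'FILENAME != prev { prev = FILENAME; if (NR > 1) next } { print }' "$@"
}
```

Usage, assuming the file1.txt, file2.txt, ... naming from the thread: `merge_fnr file*.txt > newfilename.txt`. Neither version modifies the input files, and the output name does not match the `file*.txt` pattern, so it is never read back in. From R, the bare awk command line (not the shell function) is what you would build with `paste0()` and hand to `system()`.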

-- 



David Winsemius
Alameda, CA, USA



More information about the R-help mailing list