[R] Using reduce to merge multiple files

Henrik Bengtsson hb at biostat.ucsf.edu
Fri Jun 13 02:13:29 CEST 2014


On Thu, Jun 12, 2014 at 10:16 AM, Kate Ignatius <kate.ignatius at gmail.com> wrote:
> I have a list of files that I have called like so:
>
> main_dir <- '/path/to/files/'
> directories <- list.files(main_dir, pattern = '[[:alnum:]]', full.names=T)
>
> filenames <- list.files(file.path(directories,"/tmpdir/"),  pattern =
> '[[:alnum:][:punct:]]_eat.txt+$', recursive = TRUE, full.names=T)
>
> This lists around 35 files.  Each has multiple columns but they all
> have three columns in common: Burger, Stall and Cost which I want to
> merge on using:
>
> m1 <- Reduce(function(a, b) { merge(a, b,
> by=c("Burger","Stall","Cost")) }, filenames)
>
> However, I get the error:
>
> Error in fix.by(by.x, x) : 'by' must specify uniquely valid columns
>
> Is there something that I have obviously overlooked here?

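You're forgetting to read the data: Reduce() is handed the file names,
so merge() is called on character strings rather than on data frames.
You need to call read.table() on each file before merging.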
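
A minimal base-R sketch of that, in case it helps.  The two temporary
files and their extra columns (Rating, Queue) are made up so the
example runs on its own, and header = TRUE with whitespace-separated
columns is an assumption about your files; adjust the read.table()
arguments to match the real ones.

```r
# Toy stand-ins for the real files (hypothetical data in a temp directory).
tmp <- tempfile("eat_")
dir.create(tmp)
key <- data.frame(Burger = c("beef", "veggie"),
                  Stall  = c("A", "B"),
                  Cost   = c(5.5, 4))
write.table(cbind(key, Rating = c(3, 5)), file.path(tmp, "x_eat.txt"),
            row.names = FALSE, quote = FALSE)
write.table(cbind(key, Queue = c(10, 2)), file.path(tmp, "y_eat.txt"),
            row.names = FALSE, quote = FALSE)
filenames <- list.files(tmp, pattern = "_eat\\.txt$", full.names = TRUE)

# Step 1: read every file into a data frame.
# Assumption: each file has a header row and whitespace-separated columns.
tables <- lapply(filenames, read.table, header = TRUE,
                 stringsAsFactors = FALSE)

# Step 2: merge the data frames (not the file names) pairwise.
m1 <- Reduce(function(a, b) merge(a, b, by = c("Burger", "Stall", "Cost")),
             tables)
```

With your own files, skip the toy setup and start from your existing
'filenames' vector at step 1.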

Here's an alternative (that does the same internally):

library("R.filesets")
m1 <- readDataFrame(filenames, colClasses=c("(Burger|Stall|Cost)"=NA))

If you know what data types the different columns hold, you can tell
R up front, which makes reading faster and more memory efficient, e.g.

m1 <- readDataFrame(filenames, colClasses=c("(Burger|Stall)"="factor",
"Cost"="double"))
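
The same idea can be sketched with plain read.table(), whose colClasses
argument also accepts a named vector (matched on exact column names,
not on regular expressions as above).  The temporary file and its
Rating column are made up for illustration only.

```r
# Hypothetical single file standing in for one element of 'filenames'.
f <- tempfile(fileext = ".txt")
writeLines(c("Burger Stall Cost Rating",
             "beef A 5.5 3",
             "veggie B 4 5"), f)

# Named colClasses: the listed columns get the given types; columns that
# are not listed (here 'Rating') are still auto-detected by read.table().
d <- read.table(f, header = TRUE,
                colClasses = c(Burger = "factor",
                               Stall  = "factor",
                               Cost   = "numeric"))
str(d)
```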

/Henrik


>
> Thanks in advance!
>