[R] combining data from multiple read.delim() invocations.

David L Carlson dcarlson at tamu.edu
Tue Jul 1 20:22:31 CEST 2014


I agree it is not necessarily faster, but the code is more compact since we don't have to initialize the variable or explicitly refer to the index. For big data it has the disadvantage of storing the data twice. 

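If speed matters, the following is faster and does not store the data twice, but it is system dependent. On Windows, copy with a wildcard source and a single destination file concatenates the matching files: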

shell("copy ?.tab Combined.tab")
df.all <- read.delim("Combined.tab", header=FALSE, col.names=c("lpar","started","ended"),
           as.is=TRUE, na.strings='\\N', colClasses=c("character","POSIXct","POSIXct"))
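
On Linux, which is what the original poster is using, the equivalent would be system("cat ?.tab > Combined.tab"). A system-independent variant of the same idea can be sketched in R itself with file.append(), which appends the files without parsing them. This is only a sketch: the temporary directory, the fixed timestamp, and the sample files are assumptions made so the example is self-contained, and it relies on every input file ending with a newline.

```r
# Demo files in a temporary directory (hypothetical data, fixed timestamp).
tmp <- tempfile("tabs")
dir.create(tmp)
t0 <- as.POSIXct("2014-07-01 11:25:05", tz = "UTC")
fls <- file.path(tmp, c("A.tab", "B.tab", "C.tab"))
for (i in seq_along(fls)) {
  write.table(data.frame(lpar = rep(LETTERS[i], 5), started = t0, ended = t0),
              fls[i], sep = "\t", row.names = FALSE, col.names = FALSE)
}

# Concatenate the files without parsing them.
combined <- file.path(tmp, "Combined.tab")
file.create(combined)                     # start from an empty file
for (f in fls) file.append(combined, f)   # append each input in turn

# Parse the combined file once.
df.all <- read.delim(combined, header = FALSE,
                     col.names = c("lpar", "started", "ended"),
                     as.is = TRUE, na.strings = "\\N",
                     colClasses = c("character", "POSIXct", "POSIXct"))
str(df.all)
```

As with the copy approach, the data are parsed only once, so only one data frame is held in memory.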

David C

-----Original Message-----
From: Bert Gunter [mailto:gunter.berton at gene.com] 
Sent: Tuesday, July 1, 2014 12:33 PM
To: David L Carlson
Cc: John McKown; r-help at r-project.org
Subject: Re: [R] combining data from multiple read.delim() invocations.

Maybe, David, but this isn't really it.

Your code basically reproduces the explicit for() loop with lapply().
There may be some advantage in rbinding the whole list at once over
incrementally adding rows to the data frame, but I would be surprised
if it made much of a difference either way.  Of course, someone with
actual data might prove me wrong...
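
One way to check this on a given machine is to time both approaches with system.time(). The sketch below uses synthetic data frames in place of files; the number of pieces (200) and rows per piece (50) are arbitrary assumptions, and the result will depend on the data and the R version.

```r
# Synthetic stand-ins for the data frames read from many files.
set.seed(1)
base <- as.POSIXct("2014-07-01", tz = "UTC")
pieces <- lapply(1:200, function(i)
  data.frame(lpar    = rep(sprintf("L%03d", i), 50),
             started = base + runif(50, 0, 3600),
             ended   = base + runif(50, 3600, 7200),
             stringsAsFactors = FALSE))

# Approach 1: grow the data frame with one rbind() per piece (the for() loop).
t.loop <- system.time({
  grown <- pieces[[1]][0, ]               # empty frame with the right columns
  for (p in pieces) grown <- rbind(grown, p)
})

# Approach 2: a single rbind() over the whole list.
t.once <- system.time(at.once <- do.call(rbind, pieces))

# Elapsed seconds for each approach.
print(c(loop = t.loop[["elapsed"]], once = t.once[["elapsed"]]))
```

Both approaches produce the same rows in the same order, so any difference is in time and memory only.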

Cheers,
Bert

Bert Gunter
Genentech Nonclinical Biostatistics
(650) 467-7374

"Data is not information. Information is not knowledge. And knowledge
is certainly not wisdom."
Clifford Stoll




On Tue, Jul 1, 2014 at 9:31 AM, David L Carlson <dcarlson at tamu.edu> wrote:
> There is a better way. First we need some data. This creates three files in your current working directory, each with five rows:
>
> write.table(data.frame(rep("A", 5), Sys.time(), Sys.time()),
>         "A.tab", sep="\t", row.names=FALSE, col.names=FALSE)
> write.table(data.frame(rep("B", 5), Sys.time(), Sys.time()),
>         "B.tab", sep="\t", row.names=FALSE, col.names=FALSE)
> write.table(data.frame(rep("C", 5), Sys.time(), Sys.time()),
>         "C.tab", sep="\t", row.names=FALSE, col.names=FALSE)
>
> Now to read and combine them into a single data.frame:
>
> fls <- c("A.tab", "B.tab", "C.tab")
> df.list <- lapply(fls, read.delim, header=FALSE, col.names=c("lpar","started","ended"),
>            as.is=TRUE, na.strings='\\N', colClasses=c("character","POSIXct","POSIXct"))
> df.all <- do.call(rbind, df.list)
> > str(df.all)
> 'data.frame':   15 obs. of  3 variables:
>  $ lpar   : chr  "A" "A" "A" "A" ...
>  $ started: POSIXct, format: "2014-07-01 11:25:05" "2014-07-01 11:25:05" ...
>  $ ended  : POSIXct, format: "2014-07-01 11:25:05" "2014-07-01 11:25:05" ...
>
> -------------------------------------
> David L Carlson
> Department of Anthropology
> Texas A&M University
> College Station, TX 77840-4352
>
> -----Original Message-----
> From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org] On Behalf Of John McKown
> Sent: Tuesday, July 1, 2014 7:07 AM
> To: r-help at r-project.org
> Subject: [R] combining data from multiple read.delim() invocations.
>
> Is there a better way to do the following? I have data in a number of tab
> delimited files. I am using read.delim() to read them, in a loop. I am
> invoking my code on Linux Fedora 20, from the BASH command line, using
> Rscript. The code I'm using looks like:
>
> arguments <- commandArgs(trailingOnly=TRUE);
> # initialize the capped_data data.frame
> capped_data <- data.frame(lpar="NULL",
>                        started=Sys.time(),
>                        ended=Sys.time(),
>                        stringsAsFactors=FALSE);
> # and empty it.
> capped_data <- capped_data[-1,];
> #
> # Read in the data from the files listed
> for (file in arguments) {
>     data <- read.delim(file,
>                     header=FALSE,
>                     col.names=c("lpar","started","ended"),
>                     as.is=TRUE,
>                     na.strings='\\N',
>                     colClasses=c("character","POSIXct","POSIXct"));
>     capped_data <- rbind(capped_data,data)
> }
> #
>
> I.e. is there an easier way than doing a read.delim/rbind in a loop?
>
>
> --
> There is nothing more pleasant than traveling and meeting new people!
> Genghis Khan
>
> Maranatha! <><
> John McKown
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

