[R] Fwd: combining data from multiple read.delim() invocations.

Tue Jul 1 22:30:52 CEST 2014

On Tue, Jul 1, 2014 at 12:03 PM, John McKown
<john.archie.mckown at gmail.com> wrote:
> On Tue, Jul 1, 2014 at 11:31 AM, David L Carlson <dcarlson at tamu.edu> wrote:
>
>> There is a better way. First we need some data. This creates three files
>> in your current working directory (see getwd()), each with five rows:
>>
>> write.table(data.frame(rep("A", 5), Sys.time(), Sys.time()),
>>         "A.tab", sep="\t", row.names=FALSE, col.names=FALSE)
>> write.table(data.frame(rep("B", 5), Sys.time(), Sys.time()),
>>         "B.tab", sep="\t", row.names=FALSE, col.names=FALSE)
>> write.table(data.frame(rep("C", 5), Sys.time(), Sys.time()),
>>         "C.tab", sep="\t", row.names=FALSE, col.names=FALSE)
>>
>> Now to read and combine them into a single data.frame:
>>
>> fls <- c("A.tab", "B.tab", "C.tab")
>> df.list <- lapply(fls, read.delim, header=FALSE,
>>                   col.names=c("lpar","started","ended"),
>>                   as.is=TRUE, na.strings='\\N',
>>                   colClasses=c("character","POSIXct","POSIXct"))
>> df.all <- do.call(rbind, df.list)
>> > str(df.all)
>> 'data.frame':   15 obs. of  3 variables:
>>  $ lpar   : chr  "A" "A" "A" "A" ...
>>  $ started: POSIXct, format: "2014-07-01 11:25:05" "2014-07-01 11:25:05" ...
>>  $ ended  : POSIXct, format: "2014-07-01 11:25:05" "2014-07-01 11:25:05" ...
>>
>> -------------------------------------
>> David L Carlson
>>
>
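For readers who, like John, have not met do.call() before: the short R sketch below uses three toy data frames (not the poster's files) to show that do.call(rbind, df.list) simply calls rbind() once, with every list element passed as a separate argument. The Reduce() line is only an alternative for comparison; it combines the pieces two at a time.

```r
# Three small data frames standing in for the files read by lapply()
df.list <- list(
  data.frame(lpar = "A", n = 1:2, stringsAsFactors = FALSE),
  data.frame(lpar = "B", n = 3:4, stringsAsFactors = FALSE),
  data.frame(lpar = "C", n = 5:6, stringsAsFactors = FALSE)
)

# do.call() builds and evaluates rbind(df.list[[1]], df.list[[2]],
# df.list[[3]]), passing each list element as its own argument
by.docall <- do.call(rbind, df.list)
by.hand   <- rbind(df.list[[1]], df.list[[2]], df.list[[3]])
identical(by.docall, by.hand)   # TRUE

# Reduce() reaches the same rows by calling rbind() pairwise,
# building an intermediate data frame at each step
by.reduce <- Reduce(rbind, df.list)
nrow(by.reduce)                 # 6
```

Because do.call() makes a single rbind() call, it avoids the repeated intermediate copies that the pairwise Reduce() version creates.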
> I do like that better than my version, mainly because it uses fewer
> statements. I'm rather new to R, and the *apply family of functions is
> "bleeding edge" for me. I haven't seen "do.call" before either. I'm
> still reading, but I learn best by trying projects as I am learning,
> so I get ahead of myself.

If you have not already done so, please read "An Introduction to R" or
an online tutorial of your choice before posting further. I do not
consider it proper to post queries concerning basics that you can
easily learn about yourself.  I DO consider it proper to post queries
about such topics if you have made the effort but are still confused.
That is what this list is for. You can decide -- and chastise me if
you like -- into which category you fit.

Cheers,
Bert

>
> According to the Linux "time" command, your method for a single input file,
> resulting in 144 output elements in the data.frame, took:
> real    0m0.525s
> user    0m0.441s
> sys     0m0.063s
>
> Mine:
> real    0m0.523s
> user    0m0.446s
> sys     0m0.060s
>
> Basically, a wash. As a stress test, I read in all 136 of my files in a
> single execution. The output was 22,823 elements in the data.frame.
> Yours:
> real    3m32.651s
> user    3m26.837s
> sys     0m2.292s
>
> Mine:
> real    3m24.603s
> user    3m20.225s
> sys     0m0.969s
>
> Still a wash. Of course, since I run this only once a week, on a Sunday,
> the time is not too important. I actually think that your solution is a bit
> more readable than mine, as long as I document what is going on.
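If you want timings for just the read-and-combine step, rather than for the whole script run under the shell's time command, base R's system.time() can wrap the expression. A minimal sketch follows; the three temporary files and their two-column layout are made up for illustration.

```r
# Write three small tab-separated files to a temporary directory;
# these are hypothetical stand-ins for the real weekly input files
tmp <- tempdir()
fls <- file.path(tmp, c("A.tab", "B.tab", "C.tab"))
for (i in seq_along(fls)) {
  write.table(data.frame(LETTERS[i], 1:5), fls[i], sep = "\t",
              row.names = FALSE, col.names = FALSE)
}

# system.time() times only the enclosed expression, so R's own
# start-up cost is not included in the figures
timing <- system.time({
  df.list <- lapply(fls, read.delim, header = FALSE,
                    col.names = c("lpar", "n"))
  df.all <- do.call(rbind, df.list)
})
print(timing)   # user, system and elapsed times in seconds
```

Wrapping each competing approach in system.time() on the same set of files gives a like-for-like comparison inside a single R session.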
>
> ===
>
> I had considered combining all the files together using R's pipe()
> function to run the UNIX "cat" command, something like:
>
> # arguments: a character vector of file names
> command <- paste("cat", paste(arguments, collapse=" "))
> read.delim(pipe(command), ...)
>
> but I was trying to be "pure R" since I am a Linux bigot surrounded by
> Windows weenies <grin/>.
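A runnable version of the pipe idea, under stated assumptions: a Unix-like system with cat on the PATH, and input files without header lines (the file names and columns below are hypothetical). The sketch also shows a "pure R" equivalent that uses readLines() in place of cat, which should work on Windows as well.

```r
# Hypothetical input files without header lines
tmp <- tempdir()
fls <- file.path(tmp, c("A.tab", "B.tab", "C.tab"))
for (i in seq_along(fls)) {
  write.table(data.frame(LETTERS[i], 1:5), fls[i], sep = "\t",
              row.names = FALSE, col.names = FALSE)
}

# Shell version: assumes a Unix-like system with "cat" on the PATH.
# The inner paste() collapses the file names into one string first,
# and shQuote() protects names containing spaces
command <- paste("cat", paste(shQuote(fls), collapse = " "))
via.pipe <- read.delim(pipe(command), header = FALSE,
                       col.names = c("lpar", "n"))

# "Pure R" version: readLines() plays the role of cat, and the
# text= argument of read.delim() parses the combined lines
all.lines <- unlist(lapply(fls, readLines))
via.text <- read.delim(text = all.lines, header = FALSE,
                       col.names = c("lpar", "n"))
```

Both versions rely on the files having no header line; with headers, every file after the first would contribute its header as a data row.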
>
> ===
>
> Hook'em horns!
>
> --
> There is nothing more pleasant than traveling and meeting new people!
> Genghis Khan
>
> Maranatha! <><
> John McKown
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.