[R] Writing a single output file

Sat Dec 25 21:16:55 CET 2010

There are many ways of doing this, and you have to think about the 
efficiency and logistics of the different approaches.

If the data is not large, you can read all n files into a list and then 
combine. If the data is very large, you may wish to read one file at a 
time, combining it and then deleting it before reading the next file. You 
can use cbind() to combine if all the Date columns are the same (and in 
the same order); otherwise merge() is useful.

The simple brute force approach would be:

  fns <- list.files(pattern="^output")
  # dates become the row names; every column is still called yield_rate
  out <- do.call( "cbind", lapply(fns, read.csv, row.names=1) )
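
Note that cbind() only lines the rows up by position, so it is worth 
checking that every file really has the same dates in the same order 
before trusting the result. Here is a minimal sketch of that check. The 
sample files, the temporary directory and the yield_rate1, yield_rate2, ... 
column names are assumptions based on the question below, not real data:

```r
# Create three small sample files in a temporary directory (illustrative data only)
tmpdir <- tempfile("yield_demo")
dir.create(tmpdir)
dates <- c("12/23/2010", "12/22/2010")
rates <- list(c(5.25, 5.19), c(4.16, 4.59), c(6.15, 6.41))
for (i in seq_along(rates)) {
  write.csv(data.frame(date = dates, yield_rate = rates[[i]]),
            file = file.path(tmpdir, paste("output", i, ".csv", sep = "")),
            row.names = FALSE)
}

# Read every file into a list
fns <- list.files(tmpdir, pattern = "^output[0-9]+\\.csv$", full.names = TRUE)
dat <- lapply(fns, read.csv, stringsAsFactors = FALSE)

# cbind() is only safe if all date columns are identical, in the same order
same.dates <- all(sapply(dat, function(d) identical(d[[1]], dat[[1]][[1]])))
stopifnot(same.dates)

# Keep the dates once, then bind the second column of each file next to them
yields <- do.call("cbind", lapply(dat, function(d) d[[2]]))
# Name each column after the number in its file name (output2.csv -> yield_rate2)
colnames(yields) <- sub("^output([0-9]+)\\.csv$", "yield_rate\\1", basename(fns))
combined <- data.frame(Date = dat[[1]][[1]], yields, stringsAsFactors = FALSE)
print(combined)
```

If the stopifnot() fails, the dates differ between files and merge() is 
the safer choice. Also keep in mind that list.files() sorts names as text, 
so output10.csv comes before output2.csv; taking the number from the file 
name, as above, keeps the column labels right even then.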

The slightly more optimized and flexible option, though slightly less 
elegant, could be something like this:

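  fns <- list.files(pattern="^output")
  out <- read.csv(fns[1], row.names=NULL)
  names(out)[2] <- fns[1]   # name the value column after its file

  for(fn in fns[-1]){
    tmp <- read.csv(fn, row.names=NULL)
    names(tmp)[2] <- fn     # unique names avoid clashing yield_rate.x/.y columns
    out <- merge(out, tmp, by=1, all=TRUE)   # all=TRUE keeps dates missing from some files
    rm(tmp); gc()
  }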
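
If the files do fit in memory, the same merge can also be written without 
an explicit loop by folding merge() over a list with Reduce(). This is 
only a sketch: the two sample files (with deliberately different dates), 
the helper read.one() and the yield_rate1, yield_rate2 column names are 
made up for illustration.

```r
# Two small sample files whose dates only partly overlap (illustrative data only)
tmpdir <- tempfile("merge_demo")
dir.create(tmpdir)
write.csv(data.frame(date = c("12/23/2010", "12/22/2010"), yield_rate = c(5.25, 5.19)),
          file = file.path(tmpdir, "output1.csv"), row.names = FALSE)
write.csv(data.frame(date = c("12/23/2010", "12/21/2010"), yield_rate = c(4.16, 4.72)),
          file = file.path(tmpdir, "output2.csv"), row.names = FALSE)

fns <- list.files(tmpdir, pattern = "^output[0-9]+\\.csv$", full.names = TRUE)

# Hypothetical helper: read one file and give its value column a unique name
# derived from the file name (output2.csv -> yield_rate2)
read.one <- function(fn) {
  d <- read.csv(fn, stringsAsFactors = FALSE)
  names(d) <- c("Date", sub("^output([0-9]+)\\.csv$", "yield_rate\\1", basename(fn)))
  d
}
dat <- lapply(fns, read.one)

# Fold merge() over the list; all = TRUE keeps dates that are missing from some files
merged <- Reduce(function(x, y) merge(x, y, by = "Date", all = TRUE), dat)
print(merged)
```

Dates that are missing from a file show up as NA in that file's column. 
The dates here are plain character strings, so merge() sorts them as text; 
converting them with as.Date(x, format = "%m/%d/%Y") before merging gives 
a chronological order instead.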

You have to see which option is best for your file sizes. Good luck.

Regards, Adai

On 23/12/2010 13:07, Amy Milano wrote:
> Dear R helpers!
>
> Let me first wish all of you "Merry Christmas and Very Happy New year 2011"
>
> "Christmas day is a day of Joy and Charity,
> May God make you rich in both" - Phillips Brooks
>
> ## ----------------------------------------------------------------------------------------------------------------------------
>
> I have a process which generates a number of outputs. The R code for the same is as given below.
>
> for(i in 1:n)
> {
> write.csv(output[i], file = paste("output", i, ".csv", sep = ""), row.names = FALSE)
> }
>
> Depending on value of 'n', I get different output files.
>
> Suppose n = 3, that means I am having three output csv files viz. 'output1.csv', 'output2.csv' and 'output3.csv'
>
> output1.csv
> date               yield_rate
> 12/23/2010         5.25
> 12/22/2010         5.19
> .................................
> .................................
>
> output2.csv
> date               yield_rate
> 12/23/2010         4.16
> 12/22/2010         4.59
> .................................
> .................................
>
> output3.csv
> date               yield_rate
> 12/23/2010         6.15
> 12/22/2010         6.41
> .................................
> .................................
>
> Thus all the output files have same column names viz. Date and yield_rate. Also, I do need these files individually too.
>
> My further requirement is to have a single dataframe as given below.
>
> Date             yield_rate1    yield_rate2    yield_rate3
> 12/23/2010       5.25           4.16           6.15
> 12/22/2010       5.19           4.59           6.41
> ..........................................................
> ..........................................................
>
> where yield_rate1 = output1$yield_rate and so on.
>
> One way is to simply create a dataframe as
>
> df = data.frame(Date = read.csv('output1.csv')$Date, yield_rate1 =  read.csv('output1.csv')$yield_rate,   yield_rate2 = read.csv('output2.csv')$yield_rate,
> yield_rate3 = read.csv('output3.csv')$yield_rate)
>
> However, the problem arises when I am not aware how many output files are there as n can be 5 or even 100.
>
> So is it possible to write some loop or function which will enable me to read the 'n' files individually and then, keeping "Date" common, pick up only the yield_rate data from each output file?
>
> Thanking in advance for any guidance.
>
> Regards
>
> Amy