[R] Writing a single output file

Thu Dec 30 14:07:17 CET 2010

It looks like you have csv files, so use read.csv instead of read.table.
Hadley
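
A minimal sketch of the difference, assuming comma-separated files like the ones quoted below. The temporary directory and the demo data are made up for illustration, so substitute your own folder. read.table() splits on whitespace by default, so each "date,yield_rate" line arrives as one column. read.csv() sets sep = "," and header = TRUE for you.

```r
# Hypothetical demo files in a temporary directory; the names and values
# mirror the ones quoted in this thread.
tmp <- tempfile("csvdemo")
dir.create(tmp)
rates <- list(c(5.25, 5.19), c(4.16, 4.59), c(6.15, 6.41))
for (i in seq_along(rates)) {
  write.csv(
    data.frame(date = c("12/23/2010", "12/22/2010"), yield_rate = rates[[i]]),
    file = file.path(tmp, paste("file", i, ".csv", sep = "")),
    row.names = FALSE, quote = FALSE
  )
}

fileNames <- list.files(tmp, pattern = "^file.*\\.csv$", full.names = TRUE)

# Whitespace-separated read: the whole line becomes a single column.
bad <- read.table(fileNames[1], header = TRUE, as.is = TRUE)

# Comma-separated read: date and yield_rate are separate columns.
good <- read.csv(fileNames[1], as.is = TRUE)

# Same loop as in the quoted code, with read.csv swapped in.
input <- do.call(rbind, lapply(fileNames, function(.name) {
  .data <- read.csv(.name, as.is = TRUE)
  .data$file <- basename(.name)  # record which file each row came from
  .data
}))
```

With input in this shape, the melt() and cast() calls from the earlier reply should find both yield_rate and date.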

On Thu, Dec 30, 2010 at 12:18 AM, Amy Milano <milano_amy at yahoo.com> wrote:
> Dear sir,
>
> At the outset, I sincerely apologize for replying a bit late, as I was out of the office. Thank you for the guidance you extended in response to my earlier mail regarding "Writing a single output file", where I was trying to read multiple output files and create a single output data.frame. However, I think things are not working, as I describe below.
>
>
> # Your code
>
> setwd('/temp')
> fileNames <- list.files(pattern = "file.*.csv")
>
> input <- do.call(rbind, lapply(fileNames, function(.name)
> {
> .data <- read.table(.name, header = TRUE, as.is = TRUE)
> .data$file <- .name
> .data
> }))
>
>
> # This produces the following output, which contains only two columns; moreover, date and yield_rate are clubbed together.
>
>
>
>  date.yield_rate      file
> 1   12/23/10,5.25 file1.csv
> 2   12/22/10,5.19 file1.csv
> 3   12/23/10,4.16 file2.csv
> 4   12/22/10,4.59 file2.csv
> 5   12/23/10,6.15 file3.csv
> 6   12/22/10,6.41 file3.csv
> 7   12/23/10,8.15 file4.csv
> 8   12/22/10,8.68 file4.csv
>
>
> # and NOT the kind of output given below, where date and yield_rate are separate columns.
>
>> input
>         date yield_rate      file
> 1 12/23/2010       5.25 file1.csv
> 2 12/22/2010       5.19 file1.csv
> 3 12/23/2010       5.25 file2.csv
> 4 12/22/2010       5.19 file2.csv
> 5 12/23/2010       5.25 file3.csv
> 6 12/22/2010       5.19 file3.csv
> 7 12/23/2010       5.25 file4.csv
> 8 12/22/2010       5.19 file4.csv
>
> So when I tried the following code to produce the required result, it threw an error.
>
> require(reshape)
>
> in.melt <- melt(input, measure = 'yield_rate')
>> in.melt <- melt(input, measure = 'yield_rate')
> Error: measure variables not found in data: yield_rate
>
> # So I tried
>
> in.melt <- melt(input, measure = 'date.yield_rate')
>
>
> cast(in.melt, date.yield_rate ~ file)
>
>> cast(in.melt, date ~ file)
> Error: Casting formula contains variables not found in molten data: date
>
> # If I try to change it as
>
> cast(in.melt, date.yield_rate ~ file)    # Gives following error.
> Error: Casting formula contains variables not found in molten data: date.yield_rate
>
> Sir, it would be a great help if you could guide me, and once again I sincerely apologize for replying so late.
>
> Regards
>
> Amy
>
>
> --- On Thu, 12/23/10, jim holtman <jholtman at gmail.com> wrote:
>
> From: jim holtman <jholtman at gmail.com>
> Subject: Re: [R] Writing a single output file
> To: "Amy Milano" <milano_amy at yahoo.com>
> Cc: r-help at r-project.org
> Date: Thursday, December 23, 2010, 1:39 PM
>
> This should get you close:
>
>> # get file names
>> setwd('/temp')
>> fileNames <- list.files(pattern = "file.*.csv")
>> fileNames
> [1] "file1.csv" "file2.csv" "file3.csv" "file4.csv"
>> input <- do.call(rbind, lapply(fileNames, function(.name){
> +     .data <- read.table(.name, header = TRUE, as.is = TRUE)
> +     # add file name to the data
> +     .data$file <- .name
> +     .data
> + }))
>> input
>         date yield_rate      file
> 1 12/23/2010       5.25 file1.csv
> 2 12/22/2010       5.19 file1.csv
> 3 12/23/2010       5.25 file2.csv
> 4 12/22/2010       5.19 file2.csv
> 5 12/23/2010       5.25 file3.csv
> 6 12/22/2010       5.19 file3.csv
> 7 12/23/2010       5.25 file4.csv
> 8 12/22/2010       5.19 file4.csv
>> require(reshape)
>> in.melt <- melt(input, measure = 'yield_rate')
>> cast(in.melt, date ~ file)
>         date file1.csv file2.csv file3.csv file4.csv
> 1 12/22/2010      5.19      5.19      5.19      5.19
> 2 12/23/2010      5.25      5.25      5.25      5.25
>>
>
>
> On Thu, Dec 23, 2010 at 8:07 AM, Amy Milano <milano_amy at yahoo.com> wrote:
>> Dear R helpers!
>>
>> Let me first wish all of you "Merry Christmas and Very Happy New year 2011"
>>
>> "Christmas day is a day of Joy and Charity,
>> May God make you rich in both" - Phillips Brooks
>>
>> ## ----------------------------------------------------------------------------------------------------------------------------
>>
>> I have a process which generates a number of outputs. The R code for the same is as given below.
>>
>> for(i in 1:n)
>> {
>> write.csv(output[i], file = paste("output", i, ".csv", sep = ""), row.names = FALSE)
>> }
>>
>> Depending on value of 'n', I get different output files.
>>
>> Suppose n = 3, that means I am having three output csv files viz. 'output1.csv', 'output2.csv' and 'output3.csv'
>>
>> output1.csv
>> date               yield_rate
>> 12/23/2010        5.25
>> 12/22/2010        5.19
>> .................................
>> .................................
>>
>>
>> output2.csv
>> date               yield_rate
>> 12/23/2010        4.16
>> 12/22/2010        4.59
>> .................................
>> .................................
>>
>>
>> output3.csv
>> date               yield_rate
>> 12/23/2010        6.15
>> 12/22/2010        6.41
>> .................................
>> .................................
>>
>> Thus all the output files have the same column names, viz. Date and yield_rate. Also, I do need these files individually too.
>>
>> My further requirement is to have a single dataframe as given below.
>>
>> Date             yield_rate1      yield_rate2      yield_rate3
>> 12/23/2010       5.25             4.16             6.15
>> 12/22/2010       5.19             4.59             6.41
>> ...............................................................................................
>> ...............................................................................................
>>
>> where yield_rate1 = output1$yield_rate and so on.
>>
>> One way is to simply create a dataframe as
>>
>> df = data.frame(Date = read.csv('output1.csv')$Date, yield_rate1 =  read.csv('output1.csv')$yield_rate,   yield_rate2 = read.csv('output2.csv')$yield_rate,
>> yield_rate3 = read.csv('output3.csv')$yield_rate)
>>
>> However, the problem arises when I do not know how many output files there are, as n can be 5 or even 100.
>>
>> So is it possible to write a loop or function that reads the 'n' files individually and then, keeping "Date" common, picks up only the yield_rate data from each output file?
>>
>> Thanking in advance for any guidance.
>>
>> Regards
>>
>> Amy
>>
>> ______________________________________________
>> R-help at r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>>
>>
>
>
>
> --
> Jim Holtman
> Data Munger Guru
>
> What is the problem that you are trying to solve?
>

-- 
Assistant Professor / Dobelman Family Junior Chair
Department of Statistics / Rice University
http://had.co.nz/