[R] data input strategy - lots of csv files

Gabor Grothendieck ggrothendieck at gmail.com
Thu May 11 13:25:18 CEST 2006


Assuming:

my.files <- c("file1.csv", "file2.csv", ..., "filen.csv")

use read.zoo in the zoo package to read each file and merge.zoo
(which can do a multiway merge) to combine them:

library(zoo)
do.call("merge", lapply(my.files, read.zoo, ...any.other.read.zoo.args...))
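As a rough, self-contained sketch of the above with the placeholder arguments filled in: the two demo files, their contents, and the date format "%d/%m/%y" are assumptions made purely for illustration (use "%m/%d/%y" if your dates are month-first). Naming the list elements gives the merged columns readable names.

```r
library(zoo)

# Hypothetical demo data: two small header-less files in a temporary directory.
tmp <- tempfile("csvdemo")
dir.create(tmp)
writeLines(c("01/06/05,23445", "02/06/05,23500"), file.path(tmp, "file1.csv"))
writeLines(c("02/06/05,17", "03/06/05,18"), file.path(tmp, "file2.csv"))

# Collect the file names instead of typing them out.
my.files <- list.files(tmp, pattern = "\\.csv$", full.names = TRUE)

# Read each file as a zoo series indexed by Date.
# ASSUMPTION: dates are day/month/two-digit-year.
z.list <- lapply(my.files, read.zoo, sep = ",", header = FALSE,
                 format = "%d/%m/%y")

# Name each series after its file so the merged columns are labelled.
names(z.list) <- sub("\\.csv$", "", basename(my.files))

# Multiway merge on the date index; dates missing from a file become NA.
z <- do.call("merge", z.list)
print(z)
```

The result is a single zoo object with one row per date that occurs in any file.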

After loading zoo see:
vignette("zoo")
?read.zoo
?merge.zoo
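
If a plain data.frame with the date in the first column is wanted at the end, one possible way (a sketch only; the series z1 and z2 and the column names file1 and file2 are made-up stand-ins for data read from the files) is to combine index() and coredata():

```r
library(zoo)

# Hypothetical stand-ins for two series read from the CSV files.
z1 <- zoo(c(23445, 23500), as.Date(c("2005-06-01", "2005-06-02")))
z2 <- zoo(c(17, 18), as.Date(c("2005-06-02", "2005-06-03")))

# Merge on the date index; dates present in only one series get NA.
z <- merge(file1 = z1, file2 = z2)

# First column holds the dates; the remaining columns hold the data.
atot <- data.frame(date = index(z), coredata(z))
print(atot)
```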

On 5/11/06, Sean O'Riordain <sean.oriordain at gmail.com> wrote:
> Good morning,
> I currently have 63 .csv files, most of which have lines which look like
>  01/06/05,23445
> Though some files have two numbers beside each date.  There are
> missing values, and currently the longest file has 318 rows.
>
> (merge() is losing the head and doing runaway memory allocation - but
> that's another question - I'm still trying to pin that issue down and
> make a small repeatable example)
>
> Currently I'm reading in these files with lines like
>  a1 <- read.csv("daft_file_name_1.csv",header=F)
>  ...
>  a63 <- read.csv("another_silly_filename_63.csv",header=F)
>
> and then I'm naming the columns in these like...
>  names(a1)[2] <- "silly column name"
>  ...
>  names(a63)[2] <- "daft column name"
>
> then trying to merge()...
>  atot <- merge(a1, a2, all=T)
> and then using language manipulation to loop
>  atot <- merge(atot, a3, all=T)
>  ...
>  atot <- merge(atot, a63, all=T)
> etc...
>
> followed by more language manipulation
> for() {
>  rm(a1)
> } etc...
>
> i.e.
> for (i in 2:63) {
>    atot <- merge(atot, eval(parse(text=paste("a", i, sep=""))), all=T)
>    #     eval(parse(text=paste("a",i,"[1] <- NULL",sep="")))
>
>    cat("i is ", i, gc(), "\n")
>
>    # now delete these 63 temporary objects...
>    # e.g. should look like rm(a33)
>    eval(parse(text=paste("rm(a",i,")", sep="")))
> }
>
> eventually getting a dataframe with the first column being the date,
> and the subsequent 63 columns being the data... with missing values
> coded as NA...
>
> so my question is... is there a better strategy for reading in lots of
> small files (only a few kbytes each) like that which are timeseries
> with missing data... which doesn't go through the above awkwardness
> (and language manipulation) but still ends up with a nice data.frame
> with NA values correctly coded etc.
>
> Many thanks,
> Sean O'Riordain



