[R] data input strategy - lots of csv files

Sean O'Riordain sean.oriordain at gmail.com
Thu May 11 16:11:11 CEST 2006


Thank you folks - most helpful as always!

Now I have a bit of studying to do :-) I've never really understood
before how to use lapply (or any other apply), so this gives me a real
problem of my own to work with!

Thanks again,
Sean


On 11/05/06, Gabor Grothendieck <ggrothendieck at gmail.com> wrote:
> Assuming:
>
> my.files <- c("file1.csv", "file2.csv", ..., "filen.csv")
>
> use read.zoo in the zoo package and merge.zoo (which
> can do a multiway merge):
>
> library(zoo)
> do.call("merge", lapply(my.files, read.zoo, ...any.other.read.zoo.args...))
>
> After loading zoo see:
> vignette("zoo")
> ?read.zoo
> ?merge.zoo
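>
> For concreteness, a minimal sketch of that call - the list.files()
> pattern and the format string are assumptions about your files
> (dates like 01/06/05, comma-separated, no header), not anything
> read.zoo can guess:
>
> library(zoo)
> my.files <- list.files(pattern = "\\.csv$")
> # read each file as a zoo series indexed by the date in column 1,
> # then do a multiway merge on the dates
> z <- do.call("merge", lapply(my.files, read.zoo, header = FALSE,
>                              sep = ",", format = "%d/%m/%y"))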
>
> On 5/11/06, Sean O'Riordain <sean.oriordain at gmail.com> wrote:
> > Good morning,
> > I currently have 63 .csv files, most of which have lines that look like
> >  01/06/05,23445
> > Though some files have two numbers beside each date.  There are
> > missing values, and currently the longest file has 318 rows.
> >
> > (merge() is losing the head and doing runaway memory allocation - but
> > that's another question - I'm still trying to pin that issue down and
> > make a small reproducible example.)
> >
> > Currently I'm reading in these files with lines like
> >  a1 <- read.csv("daft_file_name_1.csv",header=F)
> >  ...
> >  a63 <- read.csv("another_silly_filename_63.csv",header=F)
> >
> > and then I'm naming the columns in these like...
> >  names(a1)[2] <- "silly column name"
> >  ...
> >  names(a63)[2] <- "daft column name"
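> >
> > A minimal sketch of the same two steps without 63 separate
> > variables (the list.files() pattern is an assumption, and the
> > column labels here are just the file names):
> >
> > my.files <- list.files(pattern = "\\.csv$")
> > # read every file into one list instead of a1 ... a63
> > alist <- lapply(my.files, read.csv, header = FALSE)
> > # label the second (value) column of each data frame by its file
> > for (i in seq_along(alist)) names(alist[[i]])[2] <- my.files[i]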
> >
> > then trying to merge()...
> >  atot <- merge(a1, a2, all=T)
> > and then using language manipulation to loop
> >  atot <- merge(atot, a3, all=T)
> >  ...
> >  atot <- merge(atot, a63, all=T)
> > etc...
> >
> > followed by more language manipulation to rm() the 63 temporary
> > objects as the loop runs, i.e.
> > for (i in 2:63) {
> >    atot <- merge(atot, eval(parse(text=paste("a", i, sep=""))), all=T)
> >    #     eval(parse(text=paste("a",i,"[1] <- NULL",sep="")))
> >
> >    cat("i is ", i, gc(), "\n")
> >
> >    # now delete these 63 temporary objects...
> >    # e.g. should look like rm(a33)
> >    eval(parse(text=paste("rm(a",i,")", sep="")))
> > }
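> >
> > With the data frames held in a list - alist in the sketch above -
> > the eval(parse()) dance reduces to an ordinary loop over the list,
> > merging on the shared date column (V1) and keeping unmatched rows
> > as NA:
> >
> > atot <- alist[[1]]
> > for (i in 2:length(alist)) {
> >   atot <- merge(atot, alist[[i]], all = TRUE)
> > }
> > rm(alist)  # one rm() for everything, no name pasting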
> >
> > eventually getting a data frame whose first column is the date and
> > whose subsequent 63 columns are the data, with missing values
> > coded as NA...
> >
> > So my question is: is there a better strategy for reading in lots of
> > small files (only a few kB each) like these - time series with
> > missing data - that avoids the above awkwardness (and language
> > manipulation) but still ends up with a nice data.frame with NA
> > values correctly coded, etc.?
> >
> > Many thanks,
> > Sean O'Riordain
> >