[R] alternative to rbind within a loop

Denis Chabot chabot.denis at gmail.com
Thu Jul 23 23:43:48 CEST 2009


Hi Greg,

Thanks, very encouraging: with my example, this is about 10 times  
faster than my loop (elapsed time of about 20 s instead of 202 s):
        user      system     elapsed
      13.819       5.510      20.204
>>        user      system     elapsed
>>     156.206      44.859     202.150

In real life, I did some work on each file before doing rbind. I'll  
see if this work can be put in a custom-built function that would go  
into the lapply call you suggested.
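
A minimal sketch of what such a function could look like. The column  
names ("when", "junk") and the date format are assumptions made up for  
the illustration, not the layout of the real files, and read.one is a  
hypothetical name:

```r
# Hypothetical per-file reader: the column names and the date format
# are assumptions for illustration only.
read.one <- function(path) {
  d <- read.csv(path, stringsAsFactors = FALSE)
  # convert a character column to date-time (assumed column "when")
  d$when <- as.POSIXct(d$when, format = "%Y-%m-%d %H:%M:%S", tz = "UTC")
  # drop variables that are not needed (assumed column "junk")
  d[, setdiff(names(d), "junk"), drop = FALSE]
}

# Small self-contained demonstration with three temporary csv files
demo.dir <- file.path(tempdir(), "rbind_demo")
dir.create(demo.dir, showWarnings = FALSE)
for (i in 1:3) {
  demo <- data.frame(when = c("2009-07-23 13:54:00", "2009-07-23 17:27:00"),
                     value = c(i, i + 0.5),
                     junk = "x")
  write.csv(demo,
            file = file.path(demo.dir, paste("file_", i, ".csv", sep = "")),
            row.names = FALSE)
}

# Same pattern as suggested: read and clean each file, then bind once
all.files <- list.files(path = demo.dir, full.names = TRUE, pattern = "\\.csv$")
mylist <- lapply(all.files, read.one)
mydf <- do.call("rbind", mylist)
str(mydf)
```

All the per-file work stays inside read.one, so the lapply and do.call  
lines do not change when the cleaning steps do.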

Denis

On 09-07-23 at 17:27, Greg Snow wrote:

> Try something like (untested):
>
>> mylist <- lapply(all.files, function(i) read.csv(i) )
>> mydf <- do.call('rbind', mylist)
>
> If all the csv files are conformable, so that rbind works on them (if  
> the loop method works then that should be the case), then this will  
> read in each file, store the data frames as a list, then rbind them  
> all together.
>
> It seems that this should be faster than the loop, but testing will  
> be needed to be sure.
>
> Hope this helps,
>
> -- 
> Gregory (Greg) L. Snow Ph.D.
> Statistical Data Center
> Intermountain Healthcare
> greg.snow at imail.org
> 801.408.8111
>
>
>> -----Original Message-----
>> From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-
>> project.org] On Behalf Of Denis Chabot
>> Sent: Thursday, July 23, 2009 1:54 PM
>> To: list R
>> Subject: [R] alternative to rbind within a loop
>>
>> Hi,
>>
>> I often have to do this:
>>
>> select a folder (directory) containing a few hundred data files in
>> csv format (up to 1000 files, in fact)
>>
>> open each file, transform some character variables to date-time
>> format
>>
>> make into a dataframe (involves getting rid of a few variables I
>> don't need)
>>
>> concatenate to the master dataframe that will eventually contain the
>> data from all the files in the folder.
>>
>> I use a loop going from 1 to the number of files. I have added a
>> command to print an incrementing number to the R console each time
>> the loop completes one iteration, to judge the speed of the process.
>>
>> At the beginning, 3-4 files are processed each second. After a few
>> hundred iterations it slows down to about 1 file per second. Before I
>> reach the last file (898 in the case at hand), it has become much
>> slower, about 1 file every 2-3 seconds.
>>
>> This progressive slowing down suggests the problem is linked to the
>> size of the growing "master" dataframe that rbind combines with each
>> new file.
>>
>> In fact, the small script below confirms this as nothing at all
>> happens within the loop but rbind. You can cut the size of this
>> example so as not to waste too much of your time:
>>
>>
>> # create a dummy data.frame and copy it to a large number of csv files
>>
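>> test  <- file.path("test")
>> dir.create(test, showWarnings = FALSE)  # write.csv fails if the folder is missing
>>
>> a <- 1:350
>> b <- rnorm(350,100,10)
>> c <- runif(350, 0, 100)
>> # sample() covers all 12 months; indexing with runif(350,1,12) never gives December
>> d <- sample(month.name, 350, replace=TRUE)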
>>
>> the.data <- data.frame(a,b,c,d)
>>
>> for(i in 1:850){
>> 	write.csv(the.data, file=paste(test, "/file_", i, ".csv", sep=""))
>> }
>>
>> # now let's make a single dataframe from all these csv files
>>
>> # pattern is a regular expression, so escape the dot and anchor the end
>> all.files <- list.files(path=test,full.names=TRUE,pattern="\\.csv$")
>>
>> new.data <- NULL
>>
>> system.time({
>> 	for(i in all.files){
>> 		in.data <- read.csv(i)
>> 		if (is.null(new.data)) {
>> 			new.data <- in.data
>> 		} else {
>> 			new.data <- rbind(new.data, in.data)
>> 		}
>> 		cat(paste(i, ", ", sep=""))
>> 	} # end for
>> }) # end system.time
>>
>>        user      system     elapsed
>>     156.206      44.859     202.150
>> This is with
>>
>> sessionInfo()
>> R version 2.9.1 Patched (2009-07-16 r48939)
>> x86_64-apple-darwin9.7.0
>>
>> locale:
>> fr_CA.UTF-8/fr_CA.UTF-8/C/C/fr_CA.UTF-8/fr_CA.UTF-8
>>
>> attached base packages:
>> [1] stats     graphics  grDevices utils     datasets  methods   base
>>
>> other attached packages:
>> [1] doBy_3.7        chron_2.3-30    timeDate_290.84
>>
>> loaded via a namespace (and not attached):
>> [1] cluster_1.12.0  grid_2.9.1      Hmisc_3.5-2     lattice_0.17-25
>> tools_2.9.1
>>
>>
>> Would it be better to somehow save all 850 files in one dataframe
>> each, and then rbind them all in a single operation?
>>
>> Can I combine all my files without using a loop? I've never quite
>> mastered the "apply" family of functions and have not seen examples
>> that use it to read files.
>>
>> Thanks in advance,
>>
>> Denis Chabot
>>
>> ______________________________________________
>> R-help at r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>
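
On the question in the original message about keeping each file as its  
own data frame and binding them all in a single operation: the two  
approaches can be compared without touching the disk. This is only a  
sketch; it uses 50 in-memory data frames as stand-ins for the csv files  
(an assumption to keep it quick), and the timings will differ from  
machine to machine.

```r
# Build a list of small data frames in memory (stand-ins for the csv files)
set.seed(1)
n.files <- 50
pieces <- lapply(seq_len(n.files), function(i) {
  data.frame(a = 1:350,
             b = rnorm(350, 100, 10),
             c = runif(350, 0, 100),
             d = sample(month.name, 350, replace = TRUE),
             stringsAsFactors = FALSE)
})

# Approach 1: grow the master data frame inside the loop.
# Each rbind copies everything accumulated so far.
grow.time <- system.time({
  grown <- NULL
  for (p in pieces) {
    grown <- if (is.null(grown)) p else rbind(grown, p)
  }
})

# Approach 2: a single rbind over the whole list
once.time <- system.time({
  bound <- do.call("rbind", pieces)
})

# Both approaches give a data frame of the same size
stopifnot(identical(dim(grown), dim(bound)))
print(rbind(loop = grow.time, single = once.time)[, 1:3])
```

Raising n.files toward the 850 used in the original example should make  
the gap between the two timings more visible, since the loop copies the  
ever larger master data frame at every iteration.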
