[R] any way to make the code more efficient ?

Charles C. Berry cberry at tajo.ucsd.edu
Fri Dec 8 23:52:55 CET 2006



Save your intermediate results as a list of matrices.

Then rbind them all at once using do.call.

It looks like this will save 23 seconds (see below), if you are running on 
a PC like mine (AMD 2GHz, WinXP ).

But I wonder: if a mere 23 seconds is all you save, is this really worth 
worrying about?

Maybe you are losing time elsewhere.

If so, you need to profile this run and/or track memory usage.
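
As a rough illustration of what profiling might look like, here is a minimal sketch using base R's Rprof() and summaryRprof(). The workload is only the toy growing-rbind loop timed in this message, not your actual code, and the variable names (prof.file, prof) are made up for the example.

```r
# Toy workload: grow a matrix by repeated rbind, as in the timing comparison.
mat.1400.by.4 <- matrix(1:(1400 * 4), ncol = 4)

prof.file <- tempfile()            # file the profiler writes its samples to
Rprof(prof.file, interval = 0.01)  # start sampling the call stack
amat <- NULL
for (i in 1:500) amat <- rbind(amat, mat.1400.by.4)
Rprof(NULL)                        # stop profiling

# Time attributed to each function, sorted by self time.
prof <- summaryRprof(prof.file)
head(prof$by.self)

# Rough memory picture: size of the result and the garbage collector's table.
print(object.size(amat), units = "Mb")
gc()
```

In a real run you would wrap your own loop between the two Rprof() calls and look at which functions dominate by.self and by.total.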


> amat <- NULL
> mat.1400.by.4 <- matrix(1:(1400*4),nc=4)
> system.time(for (i in 1:500) amat <- rbind(amat, mat.1400.by.4 ))
[1] 20.05  1.53 23.24    NA    NA
> 
> list.of.matrices <- rep( list( mat.1400.by.4 ) , 500 )
> system.time( amat2 <- do.call(rbind, list.of.matrices ) )
[1] 0.08 0.00 0.08   NA   NA
> all.equal(amat,amat2)
[1] TRUE
>
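
Applied to a file-by-file loop, the same idea might be sketched as follows. This is only a sketch under assumptions: read.one.day() is a hypothetical stand-in for whatever builds each day's object, n.files is an assumed count, and plain matrices are used so the example runs without extra packages. zoo supplies its own rbind method, so the same do.call(rbind, ...) call should work on a list of zoo objects, though that is not timed here.

```r
# Hypothetical stand-in for the per-file step; it just returns a
# 1400 x 4 matrix filled with the "day" number i.
read.one.day <- function(i) {
  matrix(i, nrow = 1400, ncol = 4)
}

n.files <- 50   # assumed number of files, known before the loop starts

# Preallocate the list, fill one slot per file, and bind once at the end
# instead of growing the result inside the loop.
pieces <- vector("list", n.files)
for (i in seq_len(n.files)) {
  pieces[[i]] <- read.one.day(i)
}
full <- do.call(rbind, pieces)

dim(full)
```

Because the list is sized up front and each day's object is stored once, the expensive copy happens a single time in do.call() rather than on every iteration.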

On Fri, 8 Dec 2006, Leeds, Mark (IED) wrote:

> The code below works, which is why I didn't include the data to
> reproduce it. It loops about 500
> times, and each time a zoo object with 1400 rows and 4 columns gets
> created (the rows represent minutes, so each file is one day's
> worth of data). Inside the loop, I keep rbinding the newly created zoo
> object to the current zoo object so that it gets bigger and
> bigger over time.
>
> Eventually the new zoo object, fullaggfxdata, containing all the days
> of data is created.
>
> I was just wondering if there is a more efficient way of doing this. I
> do know at the beginning how many times the loop will run, so
> maybe creating a matrix or data frame at the beginning and putting
> the daily ones in something like that would
> make it faster. But the problem with this is that I eventually do need a
> zoo object. I ask this question because at around the 250
> mark of the loop, things start to slow down significantly, and I think I
> remember reading somewhere that doing an rbind of something to itself is
> not a good idea. Thanks.
>
> #=======================================================================
>
> start <- 1
>
> for (filecounter in 1:length(datafilenames)) {
>
>   print(paste("File Counter = ", filecounter))
>   datafile <- paste(datadir, "/", datafilenames[filecounter], sep="")
>   aggfxdata <- clnaggcompcurrencyfile(fxfile=datafile, aggminutes=aggminutes,
>                                       fillholes=1)
>   logbidask <- log(aggfxdata[,"bidask"])
>   aggfxdata <- cbind(aggfxdata, logbidask)
>
>   if (start == 1) {
>     fullaggfxdata <- aggfxdata
>     start <- 0
>   } else {
>     fullaggfxdata <- rbind(fullaggfxdata, aggfxdata)
>   }
>
> }
>
> #=======================================================================

Charles C. Berry                        (858) 534-2098
                                          Dept of Family/Preventive Medicine
E mailto:cberry at tajo.ucsd.edu	         UC San Diego
http://biostat.ucsd.edu/~cberry/         La Jolla, San Diego 92093-0717



