[R] Running cumulative sums in matrices

Bert Gunter gunter.berton at gene.com
Wed Apr 14 17:49:11 CEST 2010


Eleni et al.:

Perhaps it's worth noting that there is generally NO reason to prefer
apply-family code to explicit for-loops for execution speed. Apply-type
statements **are** essentially disguised loops: they evaluate the loop
body (the function you pass) repeatedly at the R interpreter level. They do
employ some efficiency tricks to run as fast as possible; but as posts in
this thread have already noted, whether they run faster or slower than
explicit loops is generally code- and problem-specific. Sometimes yes;
sometimes no; often about the same.

So, for example,

myfun <- function(x){...}   # any function of one element of z
z <- somelist               # any list
ans <- lapply(z, myfun)

## and 

ans <- vector("list", length(z))   # preallocate one slot per element of z
for(i in seq_along(z)) ans[[i]] <- myfun(z[[i]])

should take about the same time.
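To check this claim on a particular machine, one can time both versions with system.time(). This is a minimal sketch: myfun and z below are hypothetical stand-ins for the placeholders in the snippet above, and the timings depend on the machine and on the function, so only the equality of the two results is guaranteed.

```r
# Hypothetical stand-ins for the placeholders myfun and z (not from the post)
myfun <- function(x) sum(sqrt(abs(x)))
set.seed(1)
z <- replicate(1e4, rnorm(50), simplify = FALSE)   # a list of 10,000 vectors

# apply-family version
t_lapply <- system.time(ans1 <- lapply(z, myfun))

# explicit loop with a preallocated result list
t_loop <- system.time({
  ans2 <- vector("list", length(z))
  for (i in seq_along(z)) ans2[[i]] <- myfun(z[[i]])
})

identical(ans1, ans2)   # both approaches compute the same list
c(lapply = t_lapply[["elapsed"]], loop = t_loop[["elapsed"]])   # seconds
```

Running it a few times (or with a different myfun) gives a feel for how much the gap varies from problem to problem.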

The main reason to prefer the former to the latter is that the former
conforms to R's functional programming paradigm and tends to produce
cleaner, more debuggable, more maintainable code (I realize that this is a
subjective preference with which many may disagree).

When speedup is desired, the key is to move the loop from the interpreted to
the compiled code level via "vectorization", either by making use of R's
built-in compiled functions (like cumsum), which are generally .Internal or
.Primitive, or by writing and calling your own compiled code, e.g. via .Call.
This can often make things orders of magnitude faster.
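Applied to the matrix in the original question below, here is a minimal sketch of three ways to push the work into compiled code. The variable names (m2_loop, m2_apply, m2_cols, U, m2_mat) are illustrative choices, not from the thread. Option 1 still loops over rows at the interpreter level, as described above, so for a matrix with many rows and few columns options 2 or 3 would be expected to be faster; that is an expectation, not a measured result.

```r
# The example matrix from the original question
m1 <- cbind(1:5, 1:5, 1:5)

# Reference result: the question's for-loop, with the compiled rowSums()
# in place of apply(..., 1, sum)
m2_loop <- m1
for (i in 2:ncol(m1)) m2_loop[, i] <- rowSums(m1[, 1:i, drop = FALSE])

# 1. apply() over rows with the compiled cumsum(); apply() returns each
#    row's result as a column, so transpose back with t()
m2_apply <- t(apply(m1, 1, cumsum))

# 2. Loop over columns only: each step adds a whole column at once, so the
#    interpreted loop runs ncol(m1) - 1 times however many rows there are
m2_cols <- m1
for (i in seq_len(ncol(m1))[-1]) m2_cols[, i] <- m2_cols[, i - 1] + m1[, i]

# 3. A single matrix product: U has ones on and above the diagonal, so
#    column i of m1 %*% U is the sum of columns 1..i of m1
U <- 1 * upper.tri(diag(ncol(m1)), diag = TRUE)
m2_mat <- m1 %*% U

m2_mat   # row k is c(k, 2*k, 3*k)
```

The results differ only in storage mode (integer vs. double), so compare them with == or all.equal() rather than identical().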

I hope this provides some clarification about an issue that many seem
confused about. If anything I have said is misstated or requires further
clarification, I would appreciate corrections.


Bert Gunter
Genentech Nonclinical Statistics

-----Original Message-----
From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org] On
Behalf Of Eleni Rapsomaniki
Sent: Wednesday, April 14, 2010 5:18 AM
To: r-help at r-project.org
Subject: [R] Running cumulative sums in matrices


Dear R-helpers,

I have a huge data set, so I need to avoid for-loops as much as possible.
Can anyone suggest how I can compute the result in the following example
(which uses a for-loop) using some version of apply instead (or any other
similarly super-efficient function)? 

example:
#Suppose a matrix:
m1=cbind(1:5,1:5,1:5)

#The aim is to create a new matrix with every column containing the
#cumulative sum of that column and all previous columns.
m2=m1
for(i in 2:ncol(m1)){
    m2[,i]=apply(m1[,1:i],1,sum)
}
m2

Many thanks in advance

Eleni Rapsomaniki

Research Associate
Strangeways Research Laboratory
Department of Public Health and Primary Care
University of Cambridge
 

______________________________________________
R-help at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


