[R] fast rowCumsums wanted for calculating the cdf

Joshua Wiley jwiley.psych at gmail.com
Fri Oct 15 09:23:23 CEST 2010


Hi,

You might look at Reduce().  It seems faster.  In the session below I
converted the matrix to a list in an incredibly sloppy way (which you
should not emulate) because I could not think of a simpler one; a
cleaner round trip is sketched further down.

> probs <- t(matrix(rep(1:10000000), nrow=10)) # matrix with row-wise probabilities
> F <- matrix(0, nrow=nrow(probs), ncol=ncol(probs)); # pre-allocate for the loop version below
> F[,1] <- probs[,1,drop=TRUE];
> add <- function(x) {Reduce(`+`, x, accumulate = TRUE)} # running sums over a list of vectors
>
>
> system.time(F.slow <- t(apply(probs, 1, cumsum)))
   user  system elapsed
 36.758   0.416  42.464
>
> system.time(for (cc in 2:ncol(F)) {
+  F[,cc] <- F[,cc-1,drop=TRUE] + probs[,cc,drop=TRUE];
+ })
   user  system elapsed
  0.980   0.196   1.328
>
> system.time(add(list(probs[,1], probs[,2], probs[,3], probs[,4], probs[,5], probs[,6], probs[,7], probs[,8], probs[,9], probs[,10])))
   user  system elapsed
  0.420   0.072   0.539
>

0.539 seconds for 10 vectors of one million elements each does not seem
bad.  For 10 vectors of 30,000 elements each, it took 0.014 seconds on
my system, which for 200,000 iterations I estimated as:

> (200000*.014)/60/60
[1] 0.7777778

(and this is on a stone-age system with a single-core processor and
only 756 MB of RAM)

It should not be difficult to get the list back to a matrix.
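For instance, here is a minimal sketch of the whole round trip (the
helper name rowCumsumsViaReduce is mine, not a base R function):

## Split the matrix into a list of its columns, accumulate running
## sums with Reduce(), and cbind the pieces back into a matrix.
rowCumsumsViaReduce <- function(m) {
  cols <- lapply(seq_len(ncol(m)), function(j) m[, j, drop = TRUE])
  sums <- Reduce(`+`, cols, accumulate = TRUE)
  do.call(cbind, sums)
}

## Quick sanity check against the apply() version:
probs <- t(matrix(1:100, nrow = 10))
all.equal(rowCumsumsViaReduce(probs), t(apply(probs, 1, cumsum)))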

Cheers,

Josh



On Thu, Oct 14, 2010 at 11:51 PM, Gregor <mailinglist at gmx.at> wrote:
> Dear all,
>
> Maybe the "easiest" solution: is there anything that speaks against generalizing
> cumsum in base to cope with matrices (as is done in MATLAB)? E.g.:
>
> "cumsum(Matrix, 1)"
> equivalent to
> "apply(Matrix, 1, cumsum)"
>
> The main advantage would be optimized code when the matrix is extremely non-square
> (e.g. 100,000 x 10) but the summation runs over the short side (in this case 10).
> apply() effectively loops over 100,000 rows, whereas vectorizing w.r.t. the long
> side (a loop over only 10 columns) provides considerable efficiency gains.
>
> Many regards,
> Gregor
>
>
>
>
> On Tue, 12 Oct 2010 10:24:53 +0200
> Gregor <mailinglist at gmx.at> wrote:
>
>> Dear all,
>>
>> I am struggling with a (currently) cost-intensive problem: calculating the
>> (non-normalized) cumulative distribution function, given the (non-normalized)
>> probabilities.  Something like:
>>
>> probs <- t(matrix(rep(1:100), nrow=10)) # matrix with row-wise probabilities
>> F <- t(apply(probs, 1, cumsum)) # SLOOOW!
>>
>> One (already faster, but for sure not ideal) solution - thanks to Henrik Bengtsson:
>>
>> F <- matrix(0, nrow=nrow(probs), ncol=ncol(probs));
>> F[,1] <- probs[,1,drop=TRUE];
>> for (cc in 2:ncol(F)) {
>>   F[,cc] <- F[,cc-1,drop=TRUE] + probs[,cc,drop=TRUE];
>> }
>>
>> In my case, probs is a (30,000 x 10) matrix, and I need to iterate this step around
>> 200,000 times, so speed is crucial.  I can currently make sure to have no NAs, but
>> in order to extend matrixStats, this could be a nontrivial issue.
>>
>> Any ideas for speeding up this - probably routine - task?
>>
>> Thanks in advance,
>> Gregor
>>



-- 
Joshua Wiley
Ph.D. Student, Health Psychology
University of California, Los Angeles
http://www.joshuawiley.com/


