[R] Help with efficient double sum of max (X_i, Y_i) (X & Y vectors)

Ravi Varadhan rvaradhan at jhmi.edu
Fri Feb 2 17:06:17 CET 2007


Thomas, you are absolutely correct.  

Here are some results comparing Jeff's original implementation, my
suggestion using outer and pmax, and your clever trick using "fast.pmax".
Your fast.pmax function is more than 3 times faster than pmax.  Thanks for
this wonderful insight. 

Best,
Ravi.

> x <- rnorm(2000)
> y <- runif(500)
>   nx <- length(x)
>   ny <- length(y)
>   sum1 <- 0
>   sum3 <- 0
>     
> # Here is the straightforward way
> system.time(
+   for(i in 1:nx) {
+     sum1 <- sum1 + sum(ifelse(x[i]>x,x[i],x))
+     sum3 <- sum3 + sum(ifelse(x[i]>y,x[i],y))
+   }
+ )
[1] 3.83 0.00 3.83   NA   NA
> 
> # Here is a faster way using "outer" and "pmax"
> system.time({
+ sum11 <- sum(outer(x,x,FUN="pmax"))
+ sum33 <- sum(outer(x,y,FUN="pmax"))
+ })
[1] 2.55 0.48 3.04   NA   NA
> 
> # Here is an even faster method using Tom Lumley's suggestion:
> fast.pmax <- function(x,y) {i<- x<y; x[i]<-y[i]; x}
> 
> system.time({
+ sum111 <- sum(outer(x,x,FUN="fast.pmax"))
+ sum333 <- sum(outer(x,y,FUN="fast.pmax"))
+ })
[1] 0.78 0.08 0.86   NA   NA
> 
> 
> all.equal(sum1,sum11,sum111)
[1] TRUE
> all.equal(sum3,sum33,sum333)
[1] TRUE
> 
>

----------------------------------------------------------------------------
-------

Ravi Varadhan, Ph.D.

Assistant Professor, The Center on Aging and Health

Division of Geriatric Medicine and Gerontology 

Johns Hopkins University

Ph: (410) 502-2619

Fax: (410) 614-9625

Email: rvaradhan at jhmi.edu

Webpage:  http://www.jhsph.edu/agingandhealth/People/Faculty/Varadhan.html

 

----------------------------------------------------------------------------
--------

-----Original Message-----
From: Thomas Lumley [mailto:tlumley at u.washington.edu] 
Sent: Friday, February 02, 2007 9:06 AM
To: Ravi Varadhan
Cc: racinej at mcmaster.ca; r-help at stat.math.ethz.ch
Subject: Re: [R] Help with efficient double sum of max (X_i, Y_i) (X & Y
vectors)

On Thu, 1 Feb 2007, Ravi Varadhan wrote:

> Jeff,
>
> Here is something which is a little faster:
>
> sum1 <- sum(outer(x, x, FUN="pmax"))
> sum3 <- sum(outer(x, y, FUN="pmax"))

This is the sort of problem where profiling can be useful.  My experience 
with pmax() is that it is surprisingly slow, presumably because it handles 
recycling and NAs

In the example I profiled (an MCMC calculation) it was measurably faster 
to use
    function(x,y) {i<- x<y; x[i]<-y[i]; x}

 	-thomas

>
> Best,
> Ravi.
>
>
----------------------------------------------------------------------------
> -------
>
> Ravi Varadhan, Ph.D.
>
> Assistant Professor, The Center on Aging and Health
>
> Division of Geriatric Medicine and Gerontology
>
> Johns Hopkins University
>
> Ph: (410) 502-2619
>
> Fax: (410) 614-9625
>
> Email: rvaradhan at jhmi.edu
>
> Webpage:  http://www.jhsph.edu/agingandhealth/People/Faculty/Varadhan.html
>
>
>
>
----------------------------------------------------------------------------
> --------
>
> -----Original Message-----
> From: r-help-bounces at stat.math.ethz.ch
> [mailto:r-help-bounces at stat.math.ethz.ch] On Behalf Of Jeffrey Racine
> Sent: Thursday, February 01, 2007 1:18 PM
> To: r-help at stat.math.ethz.ch
> Subject: [R] Help with efficient double sum of max (X_i, Y_i) (X & Y
> vectors)
>
> Greetings.
>
> For R gurus this may be a no brainer, but I could not find pointers to
> efficient computation of this beast in past help files.
>
> Background - I wish to implement a Cramer-von Mises type test statistic
> which involves double sums of max(X_i,Y_j) where X and Y are vectors of
> differing length.
>
> I am currently using ifelse pointwise in a vector, but have a nagging
> suspicion that there is a more efficient way to do this. Basically, I
> require three sums:
>
> sum1: \sum_i\sum_j max(X_i,X_j)
> sum2: \sum_i\sum_j max(Y_i,Y_j)
> sum3: \sum_i\sum_j max(X_i,Y_j)
>
> Here is my current implementation - any pointers to more efficient
> computation greatly appreciated.
>
>  nx <- length(x)
>  ny <- length(y)
>
>  sum1 <- 0
>  sum3 <- 0
>
>  for(i in 1:nx) {
>    sum1 <- sum1 + sum(ifelse(x[i]>x,x[i],x))
>    sum3 <- sum3 + sum(ifelse(x[i]>y,x[i],y))
>  }
>
>  sum2 <- 0
>  sum4 <- sum3 # symmetric and identical
>
>  for(i in 1:ny) {
>    sum2 <- sum2 + sum(ifelse(y[i]>y,y[i],y))
>  }
>
> Thanks in advance for your help.
>
> -- Jeff
>
> -- 
> Professor J. S. Racine         Phone:  (905) 525 9140 x 23825
> Department of Economics        FAX:    (905) 521-8232
> McMaster University            e-mail: racinej at mcmaster.ca
> 1280 Main St. W.,Hamilton,     URL:
> http://www.economics.mcmaster.ca/racine/
> Ontario, Canada. L8S 4M4
>
> `The generation of random numbers is too important to be left to chance'
>
> ______________________________________________
> R-help at stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>
> ______________________________________________
> R-help at stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>

Thomas Lumley			Assoc. Professor, Biostatistics
tlumley at u.washington.edu	University of Washington, Seattle



More information about the R-help mailing list