[R] Find number of elements less than some number: Elegant/fastsolution needed

William Dunlap wdunlap at tibco.com
Thu Apr 14 23:45:30 CEST 2011


> -----Original Message-----
> From: r-help-bounces at r-project.org 
> [mailto:r-help-bounces at r-project.org] On Behalf Of Kevin Ummel
> Sent: Thursday, April 14, 2011 12:35 PM
> To: r-help at r-project.org
> Subject: [R] Find number of elements less than some number: 
> Elegant/fastsolution needed
> 
> Take vector x and a subset y:
> 
> x=1:10
> 
> y=c(4,5,7,9)
> 
> For each value in 'x', I want to know how many elements in 
> 'y' are less than 'x'.
> 
> An example would be:
> 
> sapply(x,FUN=function(i) {length(which(y<i))})
>  [1] 0 0 0 0 1 2 2 3 3 4
> 
> But this solution is far too slow when x and y have lengths 
> in the millions.
> 
> I'm certain an elegant (and computationally efficient) 
> solution exists, but I'm in the weeds at this point.

If x is sorted findInterval(x, y) would do it for <= but you
want strict <.  Try
  f2 <- function(x, y) length(y) - findInterval(-x, rev(-sort(y)))
where your version is
  f0 <- function(x, y) sapply(x,FUN=function(i) {length(which(y<i))})
E.g.,
  > x <- sort(sample(1e6, size=1e5))
  > y <- sample(1e6, size=1e4, replace=TRUE)
  > system.time(r0 <- f0(x,y))
     user  system elapsed
    7.900   0.310   8.211
  > system.time(r2 <- f2(x,y))
     user  system elapsed
    0.000   0.000   0.004
  > identical(r0, r2)
  [1] TRUE

Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com 

> 
> Any help is much appreciated.
> 
> Kevin
> 
> University of Manchester
> 
> 
> 
> 
> 
> 
> 
> 
> Take two vectors x and y, where y is a subset of x:
> 
> x=1:10
> 
> y=c(2,5,6,9)
> 
> If y is removed from x, the original x values now have a new 
> placement (index) in the resulting vector (new): 
> 
> new=x[-y]
> 
> index=1:length(new)
> 
> The challenge is: How can I *quickly* and *efficiently* 
> deduce the new 'index' value directly from the original 'x' 
> value -- using only 'y' as an input?
> 
> In practice, I have very large matrices containing the 'x' 
> values, and I need to convert them to the corresponding 
> 'index' if the 'y' values are removed.
> 
> 
> 
> 
> 	[[alternative HTML version deleted]]
> 
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide 
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
> 



More information about the R-help mailing list