[R] effective way to return only the first argument of "which()"

Milan Bouchet-Valat nalimilan at club.fr
Wed Sep 19 17:55:11 CEST 2012


On Wednesday 19 September 2012 at 15:23 +0000, William Dunlap wrote:
> The original method is faster than which.max for longish numeric vectors
> (in R-2.15.1), but you should check time and memory usage on your
> own machine:
> 
> > x <- runif(18e6)
> > system.time(for(i in 1:100)which(x>0.99)[1])
>    user  system elapsed 
>   11.64    1.05   12.70 
> > system.time(for(i in 1:100)which.max(x>0.99))
>    user  system elapsed 
>   16.38    2.94   19.35
If the probability that such an element appears near the beginning of
the vector is high, a custom hack might well be more efficient. The
problem with ">", which() and which.max() is that they go over all the
elements of the vector even when that is not needed. So you can start
with a small subset of the vector and increase its size in a few steps
until you find the value you're looking for.

Proof of concept (the values of n obviously need to be adapted):
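x <- runif(1e7)

# Note: this masks utils::find() for the rest of the session.
find <- function(x, lim) {
    len <- length(x)

    # Search prefixes of growing size: len/2^14, len/2^13, ..., len.
    for(n in 2^(14:0)) {
        val <- which(x[seq_len(len %/% n)] > lim)

        # Stop as soon as a prefix contains a match.
        if(length(val) > 0) return(val[1])
    }

    # No element exceeds lim.
    return(NULL)
}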

> system.time(for(i in 1:100)which(x>0.999)[1])
   user  system elapsed 
  9.740   5.795  15.890 
> system.time(for(i in 1:100)which.max(x>0.999))
   user  system elapsed 
 14.288   9.510  24.562 
> system.time(for(i in 1:100)find(x, .999))
   user  system elapsed 
  0.017   0.002   0.019 
> find(x, .999)
[1] 1376

(Looks almost like cheating... ;-)
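
A variant of the same idea that never rescans elements it has already
checked could look like the sketch below. The name first_above and the
default block size are hypothetical choices for illustration, not
anything from base R, and a good block size will depend on the data.
It returns NA when nothing matches, like which(x > lim)[1].

```r
# Hypothetical helper (not from the thread): scan x in fixed-size blocks
# and stop at the first block that contains a value above lim.
first_above <- function(x, lim, block = 65536L) {
    len <- length(x)
    start <- 1L

    while (start <= len) {
        end <- min(start + block - 1L, len)

        # Positions of matches within the current block only.
        hit <- which(x[start:end] > lim)

        # Convert the block-relative position to an index into x.
        if (length(hit) > 0L) return(start + hit[1L] - 1L)

        start <- end + 1L
    }

    # No element exceeds lim (also the result for an empty x).
    NA_integer_
}
```

With the vector from the original question,
first_above(c(1,2,3,4,5,6), 4) gives 5. Note that which.max(x > lim)
returns 1 when no element exceeds lim, so the NA return makes the
no-match case easier to detect.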





> Bill Dunlap
> Spotfire, TIBCO Software
> wdunlap tibco.com
> 
> 
> > -----Original Message-----
> > From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org] On Behalf
> > Of Jeff Newmiller
> > Sent: Wednesday, September 19, 2012 8:06 AM
> > To: Mike Spam; r-help at r-project.org
> > Subject: Re: [R] effective way to return only the first argument of "which()"
> > 
> > ?which.max
> > ---------------------------------------------------------------------------
> > Jeff Newmiller                        The     .....       .....  Go Live...
> > DCN:<jdnewmil at dcn.davis.ca.us>        Basics: ##.#.       ##.#.  Live Go...
> >                                       Live:   OO#.. Dead: OO#..  Playing
> > Research Engineer (Solar/Batteries            O.O#.       #.O#.  with
> > /Software/Embedded Controllers)               .OO#.       .OO#.  rocks...1k
> > ---------------------------------------------------------------------------
> > Sent from my phone. Please excuse my brevity.
> > 
> > Mike Spam <ichmagspam at googlemail.com> wrote:
> > 
> > >Hi,
> > >
> > >I was looking for a function like "which()" but only returns the first
> > >argument.
> > >Compare:
> > >
> > >x <- c(1,2,3,4,5,6)
> > >y <- 4
> > >which(x>y)
> > >
> > >returns:
> > >5,6
> > >
> > >which(x>y)[1]
> > >returns:
> > >5
> > >
> > >which(x>y)[1] is exactly what i need. I did use this but the dataset
> > >is too big (~18 mio. Points).
> > >That's why i need a more effective way to get the first element of a
> > >vector which is bigger/smaller than a specific number.
> > >
> > >I found "match()" but this function only works for equal numbers.
> > >
> > >
> > >
> > >Thanks,
> > >Nico
> > >
> > >______________________________________________
> > >R-help at r-project.org mailing list
> > >https://stat.ethz.ch/mailman/listinfo/r-help
> > >PLEASE do read the posting guide
> > >http://www.R-project.org/posting-guide.html
> > >and provide commented, minimal, self-contained, reproducible code.
> > 



