[R] speed of a vector operation question

Martin Morgan mtmorgan at fhcrc.org
Fri Apr 26 22:32:54 CEST 2013


A very similar question was asked on StackOverflow (by Mikhail? If so, I guess
the answers there were somehow not satisfactory...)

http://stackoverflow.com/questions/16213029/more-efficient-strategy-for-which-or-match

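where it turns out that a binary search (implemented in R) on the sorted vector
is much faster than sum(), etc. I guess this is because the search needs only
O(log N) comparisons and does not allocate the length-N logical vector that
v < c creates. The more complicated condition x > .3 & x < .5 could be handled
with two calls to the search, one per bound, taking the difference of the two
counts.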
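
A minimal sketch of that idea, not the code from the StackOverflow answers:
count_below() is a hypothetical helper name, and it assumes v is sorted in
increasing order, contains no NA values, and that c1 < c2 for the range count.

```r
# Count the elements of a sorted vector v lying below `bound` by binary
# search. strict = TRUE counts v < bound, strict = FALSE counts v <= bound.
# count_below() is a hypothetical helper, not part of base R.
count_below <- function(v, bound, strict = TRUE) {
    lo <- 0L           # every element at index <= lo is below the bound
    hi <- length(v)    # every element at index > hi is not below the bound
    while (lo < hi) {
        mid <- lo + (hi - lo + 1L) %/% 2L
        below <- if (strict) v[mid] < bound else v[mid] <= bound
        if (below) lo <- mid else hi <- mid - 1L
    }
    lo
}

set.seed(1)
v <- sort(rexp(1e5, 2))

# v < c: one search
n_less <- count_below(v, 0.3)

# v > c1 & v < c2: two searches, one per bound, then the difference
n_range <- count_below(v, 0.5) - count_below(v, 0.3, strict = FALSE)

# Check against the vectorised counts, and against findInterval(),
# which returns the number of elements of v that are <= the bound.
stopifnot(n_less == sum(v < 0.3),
          n_range == sum(v > 0.3 & v < 0.5),
          findInterval(0.3, v) == sum(v <= 0.3))
```

Wrapping either version in system.time(), as in the timings quoted below,
would show whether the search pays off for a given vector length.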

Martin

On 04/26/2013 01:20 PM, William Dunlap wrote:
>
>> I think the sum way is the best.
>
> On my Linux machine running R-3.0.0 the sum way is slightly faster:
>    > x <- rexp(1e6, 2)
>    > system.time(for(i in 1:100)sum(x>.3 & x<.5))
>       user  system elapsed
>      4.664   0.340   5.018
>    > system.time(for(i in 1:100)length(which(x>.3 & x<.5)))
>       user  system elapsed
>      5.017   0.160   5.186
>
> If you are doing many of these counts on the same dataset you
> can save time by using functions like cut(), table(), ecdf(), and
> findInterval().  E.g.,
>    > system.time(r1 <- vapply(seq(0,1,by=1/128)[-1], function(i)sum(x>(i-1/128) & x<=i), FUN.VALUE=0L))
>       user  system elapsed
>      5.332   0.568   5.909
>    > system.time(r2 <- table(cut(x, seq(0,1,by=1/128))))
>       user  system elapsed
>      0.500   0.008   0.511
>    > all.equal(as.vector(r1), as.vector(r2))
>    [1] TRUE
>
> You should do the timings yourself, as the relative speeds will depend
> on the version or dialect of the R interpreter and how it was compiled.
> E.g., with the current development version of 'TIBCO Enterprise Runtime for R' (aka 'TERR')
> on this same 8-core Linux box the sum way is considerably faster than
> the length(which) way:
>    > x <- rexp(1e6, 2)
>    > system.time(for(i in 1:100)sum(x>.3 & x<.5))
>       user  system elapsed
>       1.87    0.03    0.48
>    > system.time(for(i in 1:100)length(which(x>.3 & x<.5)))
>       user  system elapsed
>       3.21    0.04    0.83
>    > system.time(r1 <- vapply(seq(0,1,by=1/128)[-1], function(i)sum(x>(i-1/128) & x<=i), FUN.VALUE=0L))
>       user  system elapsed
>       2.19    0.04    0.56
>    > system.time(r2 <- table(cut(x, seq(0,1,by=1/128))))
>       user  system elapsed
>       0.27    0.01    0.13
>    > all.equal(as.vector(r1), as.vector(r2))
>    [1] TRUE
>
> Bill Dunlap
> Spotfire, TIBCO Software
> wdunlap tibco.com
>
>
>> -----Original Message-----
>> From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org] On Behalf Of lcn
>> Sent: Friday, April 26, 2013 12:09 PM
>> To: Mikhail Umorin
>> Cc: r-help at r-project.org
>> Subject: Re: [R] speed of a vector operation question
>>
>> I think the sum way is the best.
>>
>>
>> On Fri, Apr 26, 2013 at 9:12 AM, Mikhail Umorin <mikeumo at gmail.com> wrote:
>>
>>> Hello,
>>>
>>> I am dealing with numeric vectors 10^5 to 10^6 elements long. The values
>>> are sorted (with duplicates) in the vector (v). I am obtaining the length
>>> of vectors such as (v < c) or (v > c1 & v < c2), where c, c1, c2 are some
>>> scalar variables. What is the most efficient way to do this?
>>>
>>> I am using sum(v < c) since TRUE's are 1's and FALSE's are 0's. This seems
>>> to me more efficient than length(which(v < c)), but, please, correct me if
>>> I'm wrong. So, is there anything faster than what I already use?
>>>
>>> I'm running R 2.14.2 on Linux kernel 3.4.34.
>>>
>>> I appreciate your time,
>>>
>>> Mikhail
>>>          [[alternative HTML version deleted]]
>>>
>>> ______________________________________________
>>> R-help at r-project.org mailing list
>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>> PLEASE do read the posting guide
>>> http://www.R-project.org/posting-guide.html
>>> and provide commented, minimal, self-contained, reproducible code.
>>>


-- 
Computational Biology / Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N.
PO Box 19024 Seattle, WA 98109

Location: Arnold Building M1 B861
Phone: (206) 667-2793


