[Rd] Couldn't (and shouldn't) is.unsorted() be faster?

Prof Brian Ripley ripley at stats.ox.ac.uk
Thu Apr 17 22:02:14 CEST 2008


I wouldn't say 'easy', and so I think we need a business case for this 
change.  (One of the issues is that the internals are used elsewhere and 
optimized for inputs without NAs.  So we would need to write separate code 
if we pass NAs down to C level.  As I recall, is.unsorted was a cheap R 
interface to existing C code.)

What real-world problems are being affected by this, and what would be 
proportional speedup of the whole analysis be from this change?

On Thu, 17 Apr 2008, Bill Dunlap wrote:

> On Thu, 17 Apr 2008, Herve Pages wrote:
>
>> Couldn't is.unsorted() bail out immediately here (after comparing
>> the first 2 elements):
>>
>> > x <- 20000000:1
>> > system.time(is.unsorted(x), gcFirst=TRUE)
>>     user  system elapsed
>>    0.084   0.040   0.124
>>
>> > x <- 200000000:1
>> > system.time(is.unsorted(x), gcFirst=TRUE)
>>     user  system elapsed
>>    0.772   0.440   1.214
>
> The C code does bail out upon seeing the first out- of-order pair, but
> before calling the C code, the S code does any(is.na(x)), forcing a
> scan of the entire data.  If you remove the is.na calls from
> is.unsorted's S code you will see the timings improve in your example.
> (It looks easy to do the NA checks in the C code.)
>
>   is.unsorted.no.nacheck <- function (x, na.rm = FALSE) {
>       if (is.null(x))
>           return(FALSE)
>       if (!is.atomic(x))
>           return(NA)
>       .Internal(is.unsorted(x))
>   }
>   > x <- 20000000:1
>   > system.time(is.unsorted(x), gcFirst=TRUE)
>   user  system elapsed
>     0.356   0.157   0.514
>   > system.time(is.unsorted.no.nacheck(x), gcFirst=TRUE)
>   user  system elapsed
>      0       0       0
>   > revx <- rev(x)
>   > system.time(is.unsorted(revx), gcFirst=TRUE)
>      user  system elapsed
>     0.500   0.170   0.672
>   > system.time(is.unsorted.no.nacheck(revx),gcFirst=TRUE)
>      user  system elapsed
>     0.131   0.000   0.132
>
> ----------------------------------------------------------------------------
> Bill Dunlap
> Insightful Corporation
> bill at insightful dot com
> 360-428-8146
>
> "All statements in this message represent the opinions of the author and do
> not necessarily reflect Insightful Corporation policy or position."
>
> ______________________________________________
> R-devel at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel
>

-- 
Brian D. Ripley,                  ripley at stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595



More information about the R-devel mailing list