[Rd] Couldn't (and shouldn't) is.unsorted() be faster?

Herve Pages hpages at fhcrc.org
Fri Apr 18 01:35:17 CEST 2008


Hi,

Thanks for your answers!

No need to change anything. In my case, 'x' is guaranteed to be an integer
vector with no NAs so I can call .Internal(is.unsorted(x)) directly.

BTW, why not make is.unsorted() a little bit more prepared to silly user
input:

   > is.unsorted(c(2:5, NA), na.rm=NA)
   Error in if (!is.atomic(x) || (!na.rm && any(is.na(x)))) return(NA) :
     missing value where TRUE/FALSE needed

(or at least silently coerce those silly values to TRUE or FALSE like
max()/min() do, following some obscure logic though).

Also it's arguable that a length-1 vector cannot be considered sorted:
   > is.unsorted(NA)
   [1] NA

Cheers,
H.


Prof Brian Ripley wrote:
> I wouldn't say 'easy', and so I think we need a business case for this 
> change.  (One of the issues is that the internals are used elsewhere and 
> optimized for inputs without NAs.  So we would need to write separate 
> code if we pass NAs down to C level.  As I recall, is.unsorted was a 
> cheap R interface to existing C code.)
> 
> What real-world problems are being affected by this, and what would be 
> proportional speedup of the whole analysis be from this change?
> 
> On Thu, 17 Apr 2008, Bill Dunlap wrote:
> 
>> On Thu, 17 Apr 2008, Herve Pages wrote:
>>
>>> Couldn't is.unsorted() bail out immediately here (after comparing
>>> the first 2 elements):
>>>
>>> > x <- 20000000:1
>>> > system.time(is.unsorted(x), gcFirst=TRUE)
>>>     user  system elapsed
>>>    0.084   0.040   0.124
>>>
>>> > x <- 200000000:1
>>> > system.time(is.unsorted(x), gcFirst=TRUE)
>>>     user  system elapsed
>>>    0.772   0.440   1.214
>>
>> The C code does bail out upon seeing the first out- of-order pair, but
>> before calling the C code, the S code does any(is.na(x)), forcing a
>> scan of the entire data.  If you remove the is.na calls from
>> is.unsorted's S code you will see the timings improve in your example.
>> (It looks easy to do the NA checks in the C code.)
>>
>>   is.unsorted.no.nacheck <- function (x, na.rm = FALSE) {
>>       if (is.null(x))
>>           return(FALSE)
>>       if (!is.atomic(x))
>>           return(NA)
>>       .Internal(is.unsorted(x))
>>   }
>>   > x <- 20000000:1
>>   > system.time(is.unsorted(x), gcFirst=TRUE)
>>   user  system elapsed
>>     0.356   0.157   0.514
>>   > system.time(is.unsorted.no.nacheck(x), gcFirst=TRUE)
>>   user  system elapsed
>>      0       0       0
>>   > revx <- rev(x)
>>   > system.time(is.unsorted(revx), gcFirst=TRUE)
>>      user  system elapsed
>>     0.500   0.170   0.672
>>   > system.time(is.unsorted.no.nacheck(revx),gcFirst=TRUE)
>>      user  system elapsed
>>     0.131   0.000   0.132
>>
>> ---------------------------------------------------------------------------- 
>>
>> Bill Dunlap
>> Insightful Corporation
>> bill at insightful dot com
>> 360-428-8146
>>
>> "All statements in this message represent the opinions of the author 
>> and do
>> not necessarily reflect Insightful Corporation policy or position."
>>
>> ______________________________________________
>> R-devel at r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-devel
>>
>



More information about the R-devel mailing list