[Rd] [R] custom sort?

Duncan Murdoch murdoch at stats.uwo.ca
Fri May 29 15:28:37 CEST 2009


I've moved this to R-devel...

On 5/28/2009 8:17 PM, Stavros Macrakis wrote:
> I couldn't get your suggested method to work:
> 
>   `==.foo` <- function(a,b) unclass(a)==unclass(b)
>   `>.foo` <- function(a,b) unclass(a) < unclass(b)     # invert comparison
>   is.na.foo <- function(a)is.na(unclass(a))
> 
>   sort(structure(sample(5),class="foo"))  #-> 1:5  -- not reversed
> 
> What am I missing?

There are two problems.  First, I didn't mention that you need a method 
for indexing as well.  The code needs to evaluate things like x[i] > 
x[j], and by default x[i] will not be of class "foo", so the custom 
comparison methods won't be called.
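
To illustrate the first point, here is a minimal sketch of such an indexing method.  The class name "foo" is taken from the example above; the single-index signature and the two flag variables are my own choices for illustration, and this is not a complete `[` method (no matrix indexing, no drop argument).

```r
# A small vector of class "foo", as in the quoted example.
x <- structure(c(3L, 1L, 2L), class = "foo")

# With the default `[`, the class attribute is dropped on subsetting.
had_class_before <- inherits(x[1], "foo")

# Illustrative indexing method: subset the underlying data, then put
# the class back so that `==.foo` and `>.foo` can be dispatched on x[i].
`[.foo` <- function(x, i) structure(unclass(x)[i], class = class(x))

has_class_after <- inherits(x[1], "foo")
```

Once the method is defined, x[i] and x[j] both carry class "foo", so a comparison such as x[i] > x[j] can reach the custom methods.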

Second, I think there's a bug in the internal code, specifically in 
do_rank or orderVector1 in sort.c:  orderVector1 ignores the class of x, 
whereas do_rank does pay attention to it when breaking ties, so I think 
this is an oversight.

So I'd say two things should be done:

  1.  the bug should be fixed.  Even if this isn't the most obvious 
approach, it should work.

  2.  we should look for ways to make all of this simpler, e.g. allowing 
a comparison function to be used.

I'll take on 1, but not 2.  It's hard to work out the right place for 
the comparison function to appear, and it would require a lot of work to 
implement, because all of this stuff (sort, rank, order, xtfrm, 
sort.int, etc.) is closely interrelated:  some but not all of the 
functions are S3 generics, some are implemented internally, etc.  In the 
end, I'd guess the results won't be very satisfactory from a performance 
point of view:  all those calls out to R to do the comparisons are going 
to be really slow.
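
By way of contrast, the numeric-conversion route mentioned earlier in the thread (an xtfrm method) computes one key per element in a single vectorized call, so no comparisons are made in R code.  A rough sketch, assuming the class "foo" from the quoted example and that reversing the order is the goal; the negation is just one possible choice of key:

```r
# Map each element to a numeric sort key; negating reverses the order.
xtfrm.foo <- function(x) -unclass(x)

x <- structure(c(3, 1, 2), class = "foo")

# In current versions of R, sort() on a classed vector goes through
# order(), which calls xtfrm() to obtain the keys.
sorted <- sort(x)
```

This only helps when the desired ordering can be expressed as a numeric key, which is not true of every comparison function.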

I think your advice to use order() with multiple keys is likely to be 
much faster in most instances.  It's just a better approach in R.
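
For what it's worth, a small sketch of the order() approach with two keys.  The vectors name and score are made-up data, not anything from the original question:

```r
# Hypothetical data: sort by name ascending, then by score descending.
name  <- c("b", "a", "c", "a")
score <- c(2, 3, 1, 1)

# Each argument to order() is a key; later keys break ties in earlier
# ones.  Negating a numeric key reverses its direction.
idx <- order(name, -score)

sorted_name  <- name[idx]
sorted_score <- score[idx]
```

The permutation idx can then be used to index any other vector or the rows of a data frame of the same length.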

Duncan Murdoch

> 
>            -s
> 
> On Thu, May 28, 2009 at 5:48 PM, Duncan Murdoch <murdoch at stats.uwo.ca> wrote:
> 
>> On 28/05/2009 5:34 PM, Steve Jaffe wrote:
>>
>>> Sounds simple but haven't been able to find it in docs: is it possible to
>>> sort a vector using a user-defined comparison function? Seems it must be,
>>> but "sort" doesn't seem to provide that option, nor does "order" sfaics
>>>
>>
>> You put a class on the vector (e.g. using class(x) <- "myvector"), then
>> define a conversion to numeric (e.g. xtfrm.myvector) or actual comparison
>> methods (you'll need ==.myvector, >.myvector, and is.na.myvector).
>>
>> Duncan Murdoch
>>
>>
>> ______________________________________________
>> R-help at r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide
>> http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>>


