[Rd] [R] custom sort?

Duncan Murdoch murdoch at stats.uwo.ca
Thu Jun 4 22:58:23 CEST 2009


Stavros Macrakis wrote:
> Thanks for the quick fix!
>   

It was quick in R-devel, not so quick in R-patched.  I forgot to commit
the change to R-patched, but someone accidentally ported my NEWS item
about it over, so when I looked from another computer the next day, I
thought I really had done it.  Then I headed out on the road...

Will commit it to R-patched (from the Vancouver airport) in a few minutes.

Duncan Murdoch
>             -s
>
> On Fri, May 29, 2009 at 1:02 PM, Duncan Murdoch <murdoch at stats.uwo.ca> wrote:
>
>   
>> On 5/29/2009 9:28 AM, Duncan Murdoch wrote:
>>
>>     
>>> I've moved this to R-devel...
>>>
>>> On 5/28/2009 8:17 PM, Stavros Macrakis wrote:
>>>
>>>       
>>>> I couldn't get your suggested method to work:
>>>>
>>>>  `==.foo` <- function(a,b) unclass(a)==unclass(b)
>>>>  `>.foo` <- function(a,b) unclass(a) < unclass(b)     # invert comparison
>>>>  is.na.foo <- function(a)is.na(unclass(a))
>>>>
>>>>  sort(structure(sample(5),class="foo"))  #-> 1:5  -- not reversed
>>>>
>>>> What am I missing?
>>>>
>>>>         
>>> There are two problems.  First, I didn't mention that you need a method
>>> for indexing as well.  The code needs to evaluate things like x[i] > x[j],
>>> and by default x[i] will not be of class "foo", so the custom comparison
>>> methods won't be called.
>>>
>>> Second, I think there's a bug in the internal code, specifically in
>>> do_rank or orderVector1 in sort.c:  orderVector1 ignores the class of x.
>>>  do_rank pays attention when breaking ties, so I think this is an oversight.
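A quick way to see the indexing point is to check which class `x[i]` actually has. This sketch reuses the class name `foo` from the example above. The `[.foo` method is a hypothetical addition, not something from the thread: it subsets the underlying data and then restores the class, so that comparisons on `x[i]` can dispatch to `==.foo` and `>.foo`.

```r
x <- structure(c(3L, 1L, 2L), class = "foo")

# The default `[` drops the class attribute, so x[2] is a plain integer
# and comparisons on it never reach `==.foo` or `>.foo`.
before <- class(x[2])
print(before)

# Hypothetical `[` method: subset the unclassed data, then put the class back.
`[.foo` <- function(x, i) structure(unclass(x)[i], class = "foo")

after <- class(x[2])
print(after)
```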
>>>
>>> So I'd say two things should be done:
>>>
>>>  1.  The bug should be fixed.  Even if this isn't the most obvious
>>> approach, it should work.
>>>
>>>       
>> I've now fixed the bug, and clarified the documentation to say
>>
>>  The default method will make use of == and > methods
>>  for the class of x[i] (for integers i), and the
>>  is.na method for the class of x, but might be rather
>>  slow when doing so.
>>
>> You don't actually need a custom indexing method, you just need to be aware
>> that it's the class of x[i] that is important for comparisons.
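A minimal sketch of the comparison-method route, assuming a class-preserving `[` method so that `x[i]` keeps the class. The class name `revstr` is hypothetical: it compares strings in reverse alphabetical order, so an ordinary ascending `sort()` should come out reversed. A character vector is used deliberately: in recent versions of R, `xtfrm.default()` returns `unclass(x)` for numeric input, so a numeric vector carrying these methods would likely be ordered by its raw values without consulting `>`. The call chain in the comments describes the default methods in recent R and is an assumption about internals. Every comparison is an R-level function call, so expect this to be slow on long vectors.

```r
# Hypothetical class "revstr": strings that compare in reverse alphabetical order.
`[.revstr`   <- function(x, i) structure(unclass(x)[i], class = "revstr")  # keep the class on x[i]
`==.revstr`  <- function(e1, e2) unclass(e1) == unclass(e2)
`>.revstr`   <- function(e1, e2) unclass(e1) < unclass(e2)   # inverted comparison
is.na.revstr <- function(x) is.na(unclass(x))

x <- structure(c("b", "d", "a", "c"), class = "revstr")

# With the default methods, sort() -> order() -> xtfrm() -> rank(), and rank()
# compares x[i] with x[j] using the `==` and `>` methods defined above.
sorted <- sort(x)
print(unclass(sorted))
```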
>>
>> This will make it into R-patched and R-devel.
>>
>> Duncan Murdoch
>>
>>
>>
>>     
>>>  2.  We should look for ways to make all of this simpler, e.g. allowing a
>>> comparison function to be used.
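Base R's `sort()` and `order()` do not take a comparison function, but a user-level sort that does is short to write. The sketch below is a plain merge sort; `sort_by` and its `cmp` argument are hypothetical names, not part of base R. It is stable (ties keep their input order) and makes O(n log n) calls to `cmp`, each one an R function call, which is exactly the per-comparison cost that makes this approach slow on large inputs.

```r
# Hypothetical helper, not part of base R: stable merge sort driven by a
# user-supplied comparison cmp(a, b) that returns TRUE when a should come
# strictly before b.
sort_by <- function(x, cmp) {
  n <- length(x)
  if (n <= 1L) return(x)
  mid   <- n %/% 2L
  left  <- sort_by(x[seq_len(mid)], cmp)
  right <- sort_by(x[(mid + 1L):n], cmp)
  out <- vector(mode = typeof(x), length = n)
  i <- 1L
  j <- 1L
  for (k in seq_len(n)) {
    # Take from the right half only when its head strictly precedes the
    # left head; ties go to the left half, which keeps the sort stable.
    if (j > length(right) ||
        (i <= length(left) && !cmp(right[[j]], left[[i]]))) {
      out[[k]] <- left[[i]]
      i <- i + 1L
    } else {
      out[[k]] <- right[[j]]
      j <- j + 1L
    }
  }
  out
}

words  <- c("pear", "fig", "apple", "kiwi", "date")
by_len <- sort_by(words, function(a, b) nchar(a) < nchar(b))
print(by_len)
```

The four-letter words come out in their input order (`pear`, `kiwi`, `date`) because the merge is stable.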
>>>
>>> I'll take on 1, but not 2.  It's hard to work out the right place for the
>>> comparison function to appear, and it would require a lot of work to
>>> implement, because all of this stuff (sort, rank, order, xtfrm, sort.int,
>>> etc.) is closely interrelated: some but not all of the functions are S3
>>> generics, some are implemented internally, etc.  In the end, I'd guess the
>>> results won't be very satisfactory from a performance point of view:  all
>>> those calls out to R to do the comparisons are going to be really slow.
>>>
>>> I think your advice to use order() with multiple keys is likely to be much
>>> faster in most instances.  It's just a better approach in R.
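For comparison, a sketch of the multiple-key approach with `order()`: each key is an ordinary vector, so all comparisons stay in compiled code. The word list is made-up example data. Negating the numeric ranks from `xtfrm()` is one way to get a descending character key.

```r
words <- c("pear", "fig", "apple", "kiwi", "date")   # made-up example data

# Primary key: string length; secondary key: alphabetical order, used only
# to break ties in the primary key.
by_len_alpha <- words[order(nchar(words), words)]

# Descending secondary key: xtfrm() maps the strings to numeric ranks,
# which can simply be negated.
by_len_revalpha <- words[order(nchar(words), -xtfrm(words))]

print(by_len_alpha)
print(by_len_revalpha)
```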
>>>
>>> Duncan Murdoch
>>>
>>>
>>>       
>>>>           -s
>>>>
>>>> On Thu, May 28, 2009 at 5:48 PM, Duncan Murdoch <murdoch at stats.uwo.ca> wrote:
>>>>
>>>>> On 28/05/2009 5:34 PM, Steve Jaffe wrote:
>>>>>
>>>>>> Sounds simple, but I haven't been able to find it in the docs: is it
>>>>>> possible to sort a vector using a user-defined comparison function?
>>>>>> Seems it must be, but "sort" doesn't seem to provide that option, nor
>>>>>> does "order", so far as I can see.
>>>>>>
>>>>> You put a class on the vector (e.g. using class(x) <- "myvector"), then
>>>>> define a conversion to numeric (e.g. xtfrm.myvector) or actual comparison
>>>>> methods (you'll need ==.myvector, >.myvector, and is.na.myvector).
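A minimal sketch of the first route, a conversion to numeric via an `xtfrm` method, using the class name `myvector` from the message. The ordering rule (sort strings by length) is a hypothetical example. `order()`, and hence the default `sort()` method, calls `xtfrm()` on classed arguments; because the default `sort()` builds its result with `x[...]`, the result comes back without the class unless a `[` method is also defined.

```r
x <- structure(c("ccc", "a", "bb"), class = "myvector")

# xtfrm() turns x into numeric sort keys; here, hypothetically, string length.
xtfrm.myvector <- function(x) nchar(unclass(x))

sorted_up   <- sort(x)                     # ascending by length
sorted_down <- sort(x, decreasing = TRUE)  # descending by length
print(unclass(sorted_up))
print(unclass(sorted_down))
```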
>>>>>
>>>>> Duncan Murdoch
>>>>>
>>>>>
>>>>> ______________________________________________
>>>>> R-help at r-project.org mailing list
>>>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>>>> PLEASE do read the posting guide
>>>>> http://www.R-project.org/posting-guide.html
>>>>> and provide commented, minimal, self-contained, reproducible code.
>>>>>
>>>>>
>>>>>           
>>>>         
>>>
>>>       
>
>



More information about the R-devel mailing list