[Rd] Expected behaviour of is.unsorted?

Thu May 24 20:42:52 CEST 2012

On 24/05/2012 1:33 PM, Matthew Dowle wrote:
> >  On 24/05/2012 11:10 AM, Matthew Dowle wrote:
> >>  >   On 24/05/2012 9:15 AM, Matthew Dowle wrote:
> >>  >>   Duncan Murdoch<murdoch.duncan<at>    gmail.com>    writes:
> >>  >>   >
> >>  >>   >    On 12-05-24 7:39 AM, Matthew Dowle wrote:
> >>  >>   >    >    Duncan Murdoch<murdoch.duncan<at>     gmail.com>     writes:
> >>  >>   >    >>
> >>  >>   >    >>    On 12-05-23 4:37 AM, Matthew Dowle wrote:
> >>  >>   >    >    Since it seems to have a bug anyway (and if so, can't be
> >>  >>   >    >    correct in anyone's use of it), could either is.unsorted on a
> >>  >>   >    >    data.frame return the error that's in the C code already:
> >>  >>   >    >    "only atomic vectors can be tested to be sorted", for safety
> >>  >>   >    >    and to lessen confusion, or be changed to return the natural
> >>  >>   >    >    expectation proposed above? The easiest quick fix would be to
> >>  >>   >    >    negate the result of the .gtn call of course, but then you
> >>  >>   >    >    could never go back.
> >>  >>   >
> >>  >>   >    I don't follow the last sentence.  If the .gtn call needs to be
> >>  >>   >    negated, why would you want to go back?
> >>  >>
> >>  >>   Because then is.unsorted(DF) would work, but go by row, which you
> >>  >>   guessed above wasn't intended and isn't sensible. But once it worked
> >>  >>   in that way, users might start to depend on it; e.g., by writing
> >>  >>   is.unsorted(t(DF)). If I came along in future and suggested that was
> >>  >>   inefficient and wouldn't it be more natural and efficient if
> >>  >>   is.unsorted(DF) went by column, returning the same as
> >>  >>   with(DF,is.unsorted(order(a,b))) but implemented efficiently, you
> >>  >>   would fear that user code now depended on it going by row and say it
> >>  >>   was too late. I'd persist and highlight that it didn't seem in keeping
> >>  >>   with the spirit of is.unsorted()'s speed since it short circuits on
> >>  >>   the first unsorted item, which is why we love it. You'd reply that's
> >>  >>   not documented. Which it isn't. And that would be the end of that.
> >>  >
> >>  >   Okay, I'm going to fix the handling of .gtn results, and document the
> >>  >   unsuitability of this function for dataframes and arrays.
> >>
> >>  But that leaves the door open to confusion later, whilst closing the door
> >>  to a better solution: making is.unsorted() work by column for data.frame;
> >>  i.e., making is.unsorted _suitable_ for data.frame. If you just do the
> >>  quick fix for .gtn result you can never go back. If making is.unsorted(DF)
> >>  work by column is too hard for now, then leaving the door open would be
> >>  better by returning the error message already in the C code: "only atomic
> >>  vectors can be tested to be sorted". That would be a better quick fix
> >>  since it leaves options for the future.
> >
> >  I don't see why saying this function is unsuitable for dataframes
> >  implies that it will never be made suitable for dataframes.
>
> If user code or packages start to depend on is.unsorted(t(DF)) it would be
> harder to change, no?

I don't see why.  t(DF) is not a dataframe, so it will give surprising 
answers in a different way.  If people rely on using code in ways that 
are documented to give unexpected results, they deserve what they get.
>   Why provide something that is unsuitable and allow
> that possibility to happen? It's more user friendly to return "not
> implemented", "unsuitable", or the nicer message already in the C code,
> than leave the door open for confusion and errors. Or in other words, it's
> even more user friendly to return a warning or error to the user at the
> prompt, than the user friendliness of writing in the help file that it's
> unsuitable for data.frame.

I disagree.  I think it is most friendly to implement the function in 
the way it has been documented (even if it hasn't always been behaving 
as documented).

Duncan Murdoch
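
For readers following the thread, the column-wise behaviour proposed in the quoted text, with(DF,is.unsorted(order(a,b))), can be sketched in R roughly as below. This is only an illustration under stated assumptions: the helper name is_unsorted_by_cols and the example data frame are hypothetical and not part of base R, and the sketch calls order() on the full columns, so it does not short-circuit on the first unsorted item the way is.unsorted() does on an atomic vector.

```r
# Hypothetical helper (not in base R): TRUE if the rows of a data frame
# are NOT already in ascending order by the given columns.
is_unsorted_by_cols <- function(df, cols = names(df)) {
  stopifnot(is.data.frame(df), length(cols) > 0, all(cols %in% names(df)))
  # unname() keeps column names from being matched to order()'s own
  # arguments (e.g. a column called "decreasing").
  ord <- do.call(order, unname(as.list(df[cols])))
  # order() leaves ties in their original positions, so rows that are
  # already sorted give 1, 2, ..., nrow(df), which is.unsorted() accepts.
  is.unsorted(ord)
}

# Made-up example data: sorted by a, then b.
DF <- data.frame(a = c(1, 1, 2), b = c(1, 2, 1))
is_unsorted_by_cols(DF)         # rows already sorted by a, then b
is_unsorted_by_cols(DF[3:1, ])  # same rows in reverse order
```

An efficient implementation along the lines argued for in the quoted text would instead compare adjacent rows and stop at the first pair that is out of order.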