[Rd] Expected behaviour of is.unsorted?

Duncan Murdoch murdoch.duncan at gmail.com
Thu May 24 20:42:52 CEST 2012


On 24/05/2012 1:33 PM, Matthew Dowle wrote:
> >  On 24/05/2012 11:10 AM, Matthew Dowle wrote:
> >>  >   On 24/05/2012 9:15 AM, Matthew Dowle wrote:
> >>  >>   Duncan Murdoch <murdoch.duncan at gmail.com> writes:
> >>  >>   >
> >>  >>   >    On 12-05-24 7:39 AM, Matthew Dowle wrote:
> >>  >>   >    >    Duncan Murdoch <murdoch.duncan at gmail.com> writes:
> >>  >>   >    >>
> >>  >>   >    >>    On 12-05-23 4:37 AM, Matthew Dowle wrote:
> >>  >>   >    >    Since it seems to have a bug anyway (and if so, can't be correct
> >>  >>   >    >    in anyone's use of it), could either is.unsorted on a data.frame
> >>  >>   >    >    return the error that's in the C code already: "only atomic
> >>  >>   >    >    vectors can be tested to be sorted", for safety and to lessen
> >>  >>   >    >    confusion, or be changed to return the natural expectation
> >>  >>   >    >    proposed above? The easiest quick fix would be to negate the
> >>  >>   >    >    result of the .gtn call of course, but then you could never
> >>  >>   >    >    go back.
> >>  >>   >
> >>  >>   >    I don't follow the last sentence.  If the .gtn call needs to be
> >>  >>   >    negated, why would you want to go back?
> >>  >>
> >>  >>   Because then is.unsorted(DF) would work, but go by row, which you
> >>  >>   guessed above wasn't intended and isn't sensible. But once it worked
> >>  >>   in that way, users might start to depend on it; e.g., by writing
> >>  >>   is.unsorted(t(DF)). If I came along in future and suggested that was
> >>  >>   inefficient and wouldn't it be more natural and efficient if
> >>  >>   is.unsorted(DF) went by column, returning the same as
> >>  >>   with(DF,is.unsorted(order(a,b))) but implemented efficiently, you
> >>  >>   would fear that user code now depended on it going by row and say it
> >>  >>   was too late. I'd persist and highlight that it didn't seem in keeping
> >>  >>   with the spirit of is.unsorted()'s speed since it short circuits on
> >>  >>   the first unsorted item, which is why we love it. You'd reply that's
> >>  >>   not documented. Which it isn't. And that would be the end of that.
> >>  >
> >>  >   Okay, I'm going to fix the handling of .gtn results, and document the
> >>  >   unsuitability of this function for dataframes and arrays.
> >>
> >>  But that leaves the door open to confusion later, whilst closing the
> >>  door to a better solution: making is.unsorted() work by column for
> >>  data.frame; i.e., making is.unsorted _suitable_ for data.frame. If you
> >>  just do the quick fix for .gtn result you can never go back. If making
> >>  is.unsorted(DF) work by column is too hard for now, then leaving the
> >>  door open would be better by returning the error message already in the
> >>  C code: "only atomic vectors can be tested to be sorted". That would be
> >>  a better quick fix since it leaves options for the future.
> >
> >  I don't see why saying this function is unsuitable for dataframes
> >  implies that it will never be made suitable for dataframes.
>
> If user code or packages start to depend on is.unsorted(t(DF)) it would be
> harder to change, no?

I don't see why.  t(DF) is not a dataframe, so it will give surprising 
answers in a different way.  If people rely on using code in ways that 
are documented to give unexpected results, they deserve what they get.
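
To illustrate the point about t(DF), here is a minimal sketch. The data frame DF and its columns a and b are hypothetical example values (the names follow the thread); the sketch only shows that transposing a data frame yields a matrix, so whatever is.unsorted(t(DF)) returns, it is a result about a matrix and not about the data frame.

```r
# Hypothetical example data; column names a and b follow the thread.
DF <- data.frame(a = c(1, 1, 2), b = c(3, 2, 1))

m <- t(DF)          # t() on a data frame converts it to a matrix first
is.data.frame(m)    # FALSE
is.matrix(m)        # TRUE
dim(m)              # 2 3: the columns a and b have become rows
```
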
>   Why provide something that is unsuitable and allow
> that possibility to happen? It's more user friendly to return "not
> implemented", "unsuitable", or the nicer message already in the C code,
> than leave the door open for confusion and errors. Or in other words, it's
> even more user friendly to return a warning or error to the user at the
> prompt, than the user friendliness of writing in the help file that it's
> unsuitable for data.frame.

I disagree.  I think it is most friendly to implement the function in 
the way it has been documented (even if it hasn't always been behaving 
as documented).
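
For reference, the by-column behaviour proposed earlier in the thread, with(DF,is.unsorted(order(a,b))), can be sketched as follows. The helper name is_unsorted_by_column and the example data frames are assumptions made for illustration; nothing like this helper exists in base R, and this is not how is.unsorted itself is implemented.

```r
# Hypothetical helper, not part of base R: treats the columns of a data
# frame as sort keys, first column most significant.
is_unsorted_by_column <- function(df) {
  stopifnot(is.data.frame(df), ncol(df) >= 1L)
  # unname() stops column names from matching order()'s own arguments
  # (for example a column called "decreasing").
  keys <- unname(as.list(df))
  # The rows are already sorted exactly when order() returns 1, 2, ..., n.
  is.unsorted(do.call(order, keys))
}

# Hypothetical example data.
sorted_df   <- data.frame(a = c(1, 1, 2), b = c(2, 3, 1))
unsorted_df <- data.frame(a = c(1, 1, 2), b = c(3, 2, 1))

is_unsorted_by_column(sorted_df)    # FALSE
is_unsorted_by_column(unsorted_df)  # TRUE
```

This sketch computes the full ordering before testing it, so it does not short circuit on the first unsorted item; that is the inefficiency the quoted proposal wanted an efficient implementation to avoid.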

Duncan Murdoch


