[Rd] Expected behaviour of is.unsorted?

Thu May 24 19:33:04 CEST 2012

> On 24/05/2012 11:10 AM, Matthew Dowle wrote:
>> >  On 24/05/2012 9:15 AM, Matthew Dowle wrote:
>> >>  Duncan Murdoch <murdoch.duncan <at> gmail.com> writes:
>> >>  >
>> >>  >   On 12-05-24 7:39 AM, Matthew Dowle wrote:
>> >>  >   >   Duncan Murdoch <murdoch.duncan <at> gmail.com> writes:
>> >>  >   >>
>> >>  >   >>   On 12-05-23 4:37 AM, Matthew Dowle wrote:
>> >>  >   >   Since it seems to have a bug anyway (and if so, can't be
>> >>  >   >   correct in anyone's use of it), could either is.unsorted on a
>> >>  >   >   data.frame return the error that's in the C code already: "only
>> >>  >   >   atomic vectors can be tested to be sorted", for safety and to
>> >>  >   >   lessen confusion, or be changed to return the natural expectation
>> >>  >   >   proposed above? The easiest quick fix would be to negate the
>> >>  >   >   result of the .gtn call of course, but then you could never go
>> >>  >   >   back.
>> >>  >
>> >>  >   I don't follow the last sentence.  If the .gtn call needs to be
>> >>  >   negated, why would you want to go back?
>> >>
>> >>  Because then is.unsorted(DF) would work, but go by row, which you
>> >>  guessed above wasn't intended and isn't sensible. But once it worked
>> >>  in that way, users might start to depend on it; e.g., by writing
>> >>  is.unsorted(t(DF)). If I came along in future and suggested that was
>> >>  inefficient, and that it would be more natural and efficient if
>> >>  is.unsorted(DF) went by column, returning the same as
>> >>  with(DF,is.unsorted(order(a,b))) but implemented efficiently, you would
>> >>  fear that user code now depended on it going by row and say it was too
>> >>  late. I'd persist and highlight that it didn't seem in keeping with the
>> >>  spirit of is.unsorted()'s speed, since it short-circuits on the first
>> >>  unsorted item, which is why we love it. You'd reply that's not
>> >>  documented. Which it isn't. And that would be the end of that.
>> >
>> >  Okay, I'm going to fix the handling of .gtn results, and document the
>> >  unsuitability of this function for dataframes and arrays.
>>
>> But that leaves the door open to confusion later, whilst closing the
>> door to a better solution: making is.unsorted() work by column for
>> data.frame; i.e., making is.unsorted _suitable_ for data.frame. If you
>> just do the quick fix for the .gtn result, you can never go back. If
>> making is.unsorted(DF) work by column is too hard for now, then it would
>> be better to leave the door open by returning the error message already
>> in the C code: "only atomic vectors can be tested to be sorted". That
>> would be a better quick fix, since it leaves options for the future.
>
> I don't see why saying this function is unsuitable for dataframes
> implies that it will never be made suitable for dataframes.

If user code or packages start to depend on is.unsorted(t(DF)), it would be
harder to change, no? Why provide something that is unsuitable and allow
that possibility to happen? It's more user-friendly to return "not
implemented", "unsuitable", or the nicer message already in the C code,
than to leave the door open for confusion and errors. In other words,
returning a warning or error to the user at the prompt is even more
user-friendly than writing in the help file that it's unsuitable for
data.frame.

Matthew
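A minimal sketch of the column-wise semantics discussed above, assuming a data.frame counts as sorted when its rows are in order by the first column, with ties broken by the second column, and so on. The helper name is_unsorted_df is hypothetical and not part of base R. The sketch assumes atomic numeric or character columns with no NA values, and it stops at the first out-of-order pair of rows, in the spirit of is.unsorted()'s short-circuiting.

```r
# Hypothetical helper (not part of base R): TRUE if the rows of DF are not
# in order by the first column, with ties broken by the second column, and
# so on. Assumes atomic numeric or character columns without NA values.
is_unsorted_df <- function(DF, strictly = FALSE) {
  n <- nrow(DF)
  if (n < 2L) return(FALSE)
  for (i in seq_len(n - 1L)) {
    cmp <- 0L                        # 0 means rows i and i+1 tie so far
    for (col in DF) {                # iterate over the columns in order
      if (col[i] < col[i + 1L]) { cmp <- -1L; break }
      if (col[i] > col[i + 1L]) { cmp <-  1L; break }
      # equal in this column: fall through to the next column
    }
    # Short-circuit: stop at the first pair of rows that is out of order
    # (or, when strictly = TRUE, at the first fully tied pair).
    if (cmp > 0L || (strictly && cmp == 0L)) return(TRUE)
  }
  FALSE
}

DF <- data.frame(a = c(1, 1, 2, 3), b = c(5, 7, 1, 0))
is_unsorted_df(DF)                      # FALSE
with(DF, is.unsorted(order(a, b)))      # FALSE

DF2 <- data.frame(a = c(1, 1, 2), b = c(7, 5, 1))
is_unsorted_df(DF2)                     # TRUE
with(DF2, is.unsorted(order(a, b)))     # TRUE
```

With strictly = FALSE, the order()-based check gives the same answers for these examples, but it computes the full ordering of every row before testing it, whereas the loop can return as soon as it meets the first unsorted pair.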