[Rd] Expected behaviour of is.unsorted?

Matthew Dowle mdowle at mdowle.plus.com
Thu May 24 13:39:10 CEST 2012


Duncan Murdoch <murdoch.duncan <at> gmail.com> writes:
> 
> On 12-05-23 4:37 AM, Matthew Dowle wrote:
> >
> > Hi,
> >
> > I've read ?is.unsorted and searched. Have found a few items but nothing
> > close, yet. Is the following expected?
> >
> >> is.unsorted(data.frame(1:2))
> > [1] FALSE
> >> is.unsorted(data.frame(2:1))
> > [1] FALSE
> >> is.unsorted(data.frame(1:2,3:4))
> > [1] TRUE
> >> is.unsorted(data.frame(2:1,4:3))
> > [1] TRUE
> >
> > IIUC, is.unsorted is intended for atomic vectors only (description of x in
> > ?is.unsorted). Indeed the C source (src/main/sort.c) contains an error
> > message "only atomic vectors can be tested to be sorted". So that is the
> > error message I expected to see in all cases above, since I know that
> > data.frame is not an atomic vector. But there is also this in
> > ?is.unsorted: "except for atomic vectors and objects with a class (where
> > the >= or > method is used)" which I don't understand. Where >= or > is
> > used by what, and where?
> 
> If you look at the source, you will see that the basic test for classed 
> objects is
> 
> all(x[-1L] >= x[-length(x)])
> 
> (in the function base:::.gtn).
> 
> This comparison doesn't really make sense for dataframes, but it does 
> seem to be backwards:  that tests that x[2] >= x[1], x[3] >= x[2], etc., 
> returning TRUE if all comparisons are TRUE:  but that sounds like it 
> should be is.sorted(), not is.unsorted().  Or is it my brain that is 
> backwards?

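Thanks. Yes, you're right. On a data.frame, x[-1L] and x[-length(x)] drop the 
first and last columns, so the comparison runs across adjacent columns within 
each row. So is.unsorted() on a data.frame is trying to tell us if there exists 
any unsorted row, it seems.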

> DF = data.frame(a=c(1,3,5),b=c(1,3,5))
> DF
  a b
1 1 1               # this row is sorted
2 3 3               # this row is sorted
3 5 5               # this row is sorted
> is.unsorted(DF)   # going by row but should be !.gtn
[1] TRUE
> with(DF,is.unsorted(order(a,b)))  # most people's natural expectation I guess
[1] FALSE
> DF[2,2]=2
> DF
  a b
1 1 1               # this row is sorted
2 3 2               # this row isn't sorted
3 5 5               # this row is sorted
> is.unsorted(DF)   # going by row but should be !.gtn
[1] FALSE
> with(DF,is.unsorted(order(a,b)))  # most people's natural expectation I guess
[1] FALSE
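
To see the by-row behaviour directly, here is a minimal sketch that evaluates the comparison Duncan quoted by hand on the two data frames above, without calling is.unsorted() on a data.frame at all (its result there may differ between R versions). The names cmp1, rows_sorted1, DF2 and rows_sorted2 are mine, for illustration only.

```r
# Evaluate the comparison used for classed objects by hand on a data.frame.
DF <- data.frame(a = c(1, 3, 5), b = c(1, 3, 5))
n <- length(DF)                  # length of a data.frame is its column count (2)
cmp1 <- DF[-1L] >= DF[-n]        # column b compared with column a, row by row
rows_sorted1 <- all(cmp1)        # TRUE when every row is non-decreasing

DF2 <- DF
DF2[2, 2] <- 2                   # row 2 becomes (3, 2), which is decreasing
rows_sorted2 <- all(DF2[-1L] >= DF2[-n])
```

The all() of the comparison is TRUE exactly when every row is non-decreasing across columns, which is the opposite sense of "unsorted".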

Since it seems to have a bug anyway (and if so, can't be correct in anyone's 
use of it), could is.unsorted on a data.frame either return the error that's 
already in the C code, "only atomic vectors can be tested to be sorted", for 
safety and to lessen confusion, or be changed to return the natural expectation 
proposed above? The easiest quick fix would of course be to negate the result 
of the .gtn call, but then you could never go back.
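
For concreteness, the natural expectation could be sketched roughly as below. This is only an illustration under my own assumptions: df_is_unsorted is a hypothetical name, not anything in base R, and it treats the columns left to right as sort keys.

```r
# Hypothetical helper (not part of base R): is a data.frame unsorted
# with respect to its columns, taken left to right as sort keys?
df_is_unsorted <- function(df) {
  stopifnot(is.data.frame(df))
  if (nrow(df) <= 1L) return(FALSE)   # 0 or 1 rows are always sorted
  # order() returns the row permutation that sorts df; ties keep their
  # original order, so the permutation is 1, 2, ..., n exactly when the
  # rows are already sorted. unname() avoids clashes between column names
  # and order()'s own argument names.
  ord <- do.call(order, unname(as.list(df)))
  is.unsorted(ord)
}

df_is_unsorted(data.frame(a = c(1, 3, 5), b = c(1, 2, 5)))  # rows already in order
df_is_unsorted(data.frame(a = 2:1, b = 4:3))                # rows out of order
```

Calling is.unsorted() only on the integer vector from order() stays within the documented atomic-vector case, so it does not depend on how classed objects are handled.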

Matthew

> Duncan Murdoch
> 
> >
> > I understand why the first two are FALSE (1 item of anything must be
> > sorted). I don't understand the 3rd and 4th cases where length is 2:
> > do_isunsorted seems to call lang3(install(".gtn"), x, CADR(args)). Does
> > that fall back to TRUE for some reason?
> >
> > Matthew
> >
> >> sessionInfo()
> > R version 2.15.0 (2012-03-30)
> > Platform: x86_64-pc-mingw32/x64 (64-bit)
> >
> > locale:
> > [1] LC_COLLATE=English_United Kingdom.1252  LC_CTYPE=English_United
> > Kingdom.1252
> > [3] LC_MONETARY=English_United Kingdom.1252 LC_NUMERIC=C
> > [5] LC_TIME=English_United Kingdom.1252
> >
> > attached base packages:
> > [1] stats     graphics  grDevices utils     datasets  methods   base
> >
> > other attached packages:
> > [1] data.table_1.8.0
> >
> > loaded via a namespace (and not attached):
> > [1] tools_2.15.0
> >