[Rd] vector labels are not permuted properly in a call to sort() (R 2.1)

Prof Brian Ripley ripley at stats.ox.ac.uk
Wed Oct 5 16:05:12 CEST 2005


The main problem is that R is inconsistent here.  There are lots of 
branches through the sort() code.  Greg showed one.  Here are four more:

> sort(y, method="quick")
   [,1] [,2]
A    1    5
B    2    6
C    3    7
D    4    8
> names(y) <- letters[1:8]
> sort(y)
h g f e d c b a
1 2 3 4 5 6 7 8
> sort(y, method="quick")
   [,1] [,2]
A    1    5
B    2    6
C    3    7
D    4    8
attr(,"names")
[1] "h" "g" "f" "e" "d" "c" "b" "a"
> sort(y, partial=4)
   [,1] [,2]
A    1    5
B    2    6
C    3    7
D    4    8
attr(,"names")
[1] "a" "b" "c" "d" "e" "f" "g" "h"
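
The four calls differ in whether the names attribute is permuted along with the values: the "quick" branch permutes it, the partial=4 branch leaves it as it was. One way to check a particular call is to compare the names of the result with names(x)[order(x)]. The helper names_follow_values() below is hypothetical (not part of R), and the comparison assumes a full sort of a vector without ties or NAs, since order() and sort() need not break ties the same way.

```r
# Hypothetical helper, not part of base R.
# TRUE if the names on 'sorted' are the names of 'x' permuted in the same
# way as the values (assumes a full sort, no ties, no NAs).
names_follow_values <- function(x, sorted) {
  identical(names(sorted), names(x)[order(x)])
}

x <- c(a = 3, b = 1, c = 2)
names_follow_values(x, sort(x))   # TRUE: names b, c, a travel with 1, 2, 3

bad <- sort(unname(x))            # values sorted ...
names(bad) <- names(x)            # ... but names copied back unpermuted
names_follow_values(x, bad)       # FALSE
```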

I believe Svr4 does keep names but does not allow names on matrices.

There are other problems.  Should sorting a time series preserve the ts 
properties?  (Probably not, but it does.)  Should (S3 or S4) class 
information be preserved?  (It seems inappropriate for a time series, for 
example.)
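
Andy Liaw's explanation in the quoted message below (sort the values as a plain vector, then copy the input's attributes back unchanged) accounts for both the unpermuted dimnames and the surviving ts properties. sort_copy_attrs() is a hypothetical, simplified model of that behaviour, not R's actual sort() source.

```r
# Hypothetical model of "sort as a vector, then copy attributes back".
# This is NOT the real sort() code, only an illustration.
# Assumes no NAs: sort() drops them, and the attribute copy would then fail.
sort_copy_attrs <- function(x) {
  out <- sort(as.vector(x))         # as.vector() drops all attributes
  attributes(out) <- attributes(x)  # dim, dimnames, tsp, class restored as-is
  out
}

y <- matrix(8:1, 4, 2, dimnames = list(LETTERS[1:4], NULL))
sort_copy_attrs(y)   # values 1..8, row names still A..D in their old order

z <- ts(c(3, 1, 2), start = 2000)
sort_copy_attrs(z)   # still a "ts" starting in 2000, though the time index is now meaningless
```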

The path of least resistance here is to always preserve attributes and 
to document that we do so.  Probably the most S-compliant solution is to 
preserve only names (and sort them as now).
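
At user level, the "preserve only names" option could look roughly like the sketch below. sort_keep_names() is a hypothetical helper, not a proposed patch to R: it returns a bare vector whose names, if any, are permuted with the values, and drops dim, dimnames, tsp and class. It is meant for atomic, non-factor vectors, and unlike sort() it keeps NAs (order() places them last).

```r
# Hypothetical helper (not part of base R): keep only the names,
# permuted along with the values; every other attribute is dropped.
# Intended for atomic, non-factor vectors (as.vector() turns a factor
# into character).
sort_keep_names <- function(x, decreasing = FALSE) {
  ord <- order(x, decreasing = decreasing)
  out <- as.vector(x)[ord]          # bare vector: no dim, dimnames, tsp, class
  nms <- names(x)
  if (!is.null(nms)) names(out) <- nms[ord]
  out
}

Y <- 8:1
names(Y) <- LETTERS[1:8]
sort_keep_names(Y)   # values 1..8 labelled H, G, ..., A

m <- matrix(8:1, 4, 2, dimnames = list(LETTERS[1:4], NULL))
sort_keep_names(m)   # plain integer vector 1..8, no dim or dimnames
```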

David James quotes the Blue Book, but note that S itself no longer follows 
the principle stated there.


On Wed, 5 Oct 2005, Martin Maechler wrote:

>>>>>> "AndyL" == Liaw, Andy <andy_liaw at merck.com>
>>>>>>     on Tue, 4 Oct 2005 13:51:11 -0400 writes:
>
>    AndyL> The `problem' is that sort() does not do anything special when given
>    AndyL> a matrix: it only treats it as a vector.  After sorting, it copies
>    AndyL> attributes of the original input to the output.  Since dimnames are
>    AndyL> attributes, they get copied as is.
>
> exactly. Thanks Andy.
>
> And I think users would want this (copying of attributes) in
> many cases, in particular for user-created attributes.
>
> ?sort  really talks about sorting of vectors and factors;
>       and it doesn't mention attributes explicitly at all
>       {which should probably be improved}.
>
> One could wonder if R should keep the dim & dimnames
> attributes for arrays and matrices.
> S-plus (6.2) simply drops them {returning a bare unnamed vector}
> and that seems pretty reasonable to me.
>
> At least the user would never make the wrong assumptions that
> Greg made about ``matrix sorting''.
>
>
>    AndyL> Try:
>
>    >> y <- matrix(8:1, 4, 2, dimnames=list(LETTERS[1:4], NULL))
>    >> y
>    AndyL> [,1] [,2]
>    AndyL> A    8    4
>    AndyL> B    7    3
>    AndyL> C    6    2
>    AndyL> D    5    1
>    >> sort(y)
>    AndyL> [,1] [,2]
>    AndyL> A    1    5
>    AndyL> B    2    6
>    AndyL> C    3    7
>    AndyL> D    4    8
>
>    AndyL> Notice the row names stay the same.  I'd argue that this is the correct
>    AndyL> behavior.
>
>    AndyL> Andy
>
>
>    >> From: Greg Finak
>    >>
>    >> Not sure if this is the correct forum for this,
>
> yes, R-devel is the proper forum.
> {also since this is really a proposal for a change in R ...}
>
>    >> but I've found what I
>    >> would consider to be a potentially serious bug to the
>    >> unsuspecting user.
>    >> Given a numeric vector V with names (class labels) in R, the following calls
>    >>
>    >> 1.
>    >> > sort(as.matrix(V))
>    >>
>    >> and
>    >>
>    >> 2.
>    >> > as.matrix(sort(V))
>    >>
>    >> produce different output. The values are sorted properly in
>    >> both cases,
>    >> but only 2. produces the correct labeling of the vector. The call to
>    >> 1. produces a result with incorrect labels (the labels are not reordered).
>    >>
>    >> Code:
>    >> > X <- c("A","B","C","D","E","F","G","H")
>    >> > Y <- rev(1:8)
>    >> > names(Y) <- X
>    >> > Y
>    >> A B C D E F G H
>    >> 8 7 6 5 4 3 2 1
>    >> > sort(as.matrix(Y))
>    >> [,1]
>    >> A    1
>    >> B    2
>    >> C    3
>    >> D    4
>    >> E    5
>    >> F    6
>    >> G    7
>    >> H    8
>    >> > as.matrix(sort(Y))
>    >> [,1]
>    >> H    1
>    >> G    2
>    >> F    3
>    >> E    4
>    >> D    5
>    >> C    6
>    >> B    7
>    >> A    8
>    >>
>
>
>
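
If what Greg wanted was a one-column matrix whose rows are reordered by value, the usual idiom is to index the rows with order(), so that each row name travels with its value (the same labelling his call 2. produces). A minimal sketch using the Y from his example:

```r
Y <- rev(1:8)
names(Y) <- LETTERS[1:8]
M <- as.matrix(Y)                        # 8 x 1 matrix, row names A..H

# Reorder whole rows by the first column; drop = FALSE keeps it a matrix.
M_sorted <- M[order(M[, 1]), , drop = FALSE]
M_sorted                                 # rows H, G, ..., A with values 1..8
```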

-- 
Brian D. Ripley,                  ripley at stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595


