[Rd] Suggestion: Help users sort data frames

kwright at eskimo.com
Fri Jul 8 21:15:15 CEST 2005


I've noticed that a frequently asked question on R-help is how to sort a
data frame by multiple columns.  Since this question is asked so often,
making this task easier for users seems a worthwhile goal.

At a minimum, the following documentation changes would help users and
should reduce the number of such postings to R-help:

1. The "See Also" section of the help page for 'sort' currently says:
"order for sorting on or reordering multiple variables".
Appending the phrase "(including data frames)" would point users to the
right place.

2. Include a simple example of sorting a data frame on the 'sort' help page.
Ideally this example belongs on the help page for 'order', but people are
going to read the 'sort' page first.  Here's one example:

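# Sort a data frame by multiple columns: first by the ordered factor b
# (Low < Med < Hi), then by z, then by y
d <- data.frame(b = factor(c("Hi", "Med", "Hi", "Low"),
                           levels = c("Low", "Med", "Hi"), ordered = TRUE),
                x = c("A", "D", "A", "C"), y = c(8, 3, 9, 9), z = c(1, 1, 1, 2))
d[order(d$b, d$z, d$y), ]   # rows come out in the order 4, 2, 1, 3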

3. The help page for 'order' appears to show only the sorting of matrices
(and row-wise sorting at that!).  Since column-wise sorting may be more
common, the example above could usefully be included on the 'order' help
page as well.
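The example in item 2 sorts every key in ascending order, but R-help questions often ask for a mix of directions. A minimal sketch of three ways to do that, rebuilding the same data frame 'd' so the block runs on its own: negate a numeric key; negate xtfrm() of a factor or character key (xtfrm() is available in current versions of R, not necessarily in the version current when this was written); or, assuming R 3.3.0 or later, pass a vector to 'decreasing' together with method = "radix".

```r
# Same data frame as in item 2
d <- data.frame(b = factor(c("Hi", "Med", "Hi", "Low"),
                           levels = c("Low", "Med", "Hi"), ordered = TRUE),
                x = c("A", "D", "A", "C"), y = c(8, 3, 9, 9), z = c(1, 1, 1, 2))

# Numeric key descending: negate it (ties on y broken by b ascending)
by_y_desc <- d[order(-d$y, d$b), ]

# Factor (or character) key descending: negate xtfrm(), which maps the
# values to numbers that sort the same way
by_b_desc <- d[order(-xtfrm(d$b), d$y), ]

# Per-key directions in one call (assumes R >= 3.3.0, where 'decreasing'
# may be a vector when method = "radix"): b descending, y ascending
by_b_desc2 <- d[order(d$b, d$y, decreasing = c(TRUE, FALSE),
                      method = "radix"), ]
```

The last two calls give the same row order; the radix form avoids the negation trick but depends on a newer R.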

A minor bug report: the help page for 'order' includes this example:
  ## For character vectors we can make use of rank:
  cy <- as.character(y)
  rbind(x,y,z)[, order(x, -rank(y), z)]
I'm not sure what the intention was, but 'cy' is never used, so something
seems to be amiss; presumably the last line was meant to use -rank(cy).
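Assuming the intent of that help-page example was to sort descending on a character vector by negating its ranks, a corrected version might look like the sketch below. The vectors x, y and z are made up for illustration and are not the ones on the 'order' help page.

```r
# Made-up vectors for illustration (not those on the 'order' help page)
x  <- c(2, 1, 2, 1)
y  <- c(2, 1, 3, 1)
z  <- c(1, 2, 3, 4)
cy <- as.character(y)   # a character copy of y

# A character vector cannot be negated, but its ranks can: this sorts by
# x ascending, then cy descending, then z ascending to break ties
ii <- order(x, -rank(cy), z)
rbind(x, y, z)[, ii]
```

Here the ranks of cy agree with those of y only because every value is a single digit; for multi-digit numbers the character ordering differs ("10" sorts before "9"), so a numeric y is better negated directly.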



Alternatively, I wrote a formula-based function for sorting data frames
via calls like: sort.data.frame(~ -x +y +z, dat)
The function can be found here:
  http://tolstoy.newcastle.edu.au/R/help/04/09/4300.html
When people ask on R-help how to sort data frames, responses often point to
this function, so it seems to have helped many people.
I showed the function to Thomas Lumley, and he said it looked nice but that
using a minus sign for reverse sorting is a bit contrary to the spirit of R,
since in a formula a minus sign usually means omitting a term.
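The function linked above is not reproduced here. As a hypothetical illustration of the same interface only, the sketch below (the name sort_by_formula is made up) walks the right-hand side of a one-sided formula, treats each bare column name as an ascending key and each minus-prefixed name as a descending one, and passes the keys to order(). It relies on xtfrm(), available in current R, so that negation also works for factor and character columns.

```r
# Hypothetical sketch: not the sort.data.frame() function linked above.
# Sort 'data' by the columns named in a one-sided formula such as
# ~ -x + y + z; a leading minus sign means descending order.
sort_by_formula <- function(formula, data) {
  # Name of the operator at the head of a call, or "" for anything else
  op <- function(e) if (is.call(e) && is.name(e[[1]])) as.character(e[[1]]) else ""

  # Recursively collect one numeric sort key per column name, left to right
  walk <- function(e, sign) {
    if (op(e) %in% c("+", "-") && length(e) == 3) {          # binary a + b, a - b
      c(walk(e[[2]], sign),
        walk(e[[3]], if (op(e) == "-") -sign else sign))
    } else if (op(e) %in% c("+", "-") && length(e) == 2) {   # unary +a, -a
      walk(e[[2]], if (op(e) == "-") -sign else sign)
    } else if (is.name(e)) {
      col <- data[[as.character(e)]]
      if (is.null(col)) stop("no column named '", as.character(e), "'")
      # xtfrm() turns numeric, factor or character columns into numbers
      # that sort the same way, so multiplying by -1 reverses the order
      list(sign * xtfrm(col))
    } else {
      stop("unsupported term: ", deparse(e))
    }
  }

  keys <- walk(formula[[length(formula)]], 1)  # right-hand side of the formula
  data[do.call(order, keys), , drop = FALSE]
}

# Usage with the data frame from item 2: x descending, then y, then z
d <- data.frame(b = factor(c("Hi", "Med", "Hi", "Low"),
                           levels = c("Low", "Med", "Hi"), ordered = TRUE),
                x = c("A", "D", "A", "C"), y = c(8, 3, 9, 9), z = c(1, 1, 1, 2))
sort_by_formula(~ -x + y + z, d)
```

The sketch keeps the minus-sign syntax of the call above, so Thomas Lumley's objection applies to it as well; only the parsing of the formula would change if a different marker for descending order were chosen.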


Sincerely,

Kevin Wright


