[Rd] bug in rank(), order(), is.unsorted() on character vector

Hervé Pagès hpages at fhcrc.org
Thu Dec 8 19:26:48 CET 2011


Hi Barry,

Hope you don't mind if I put this back on the list.

On 11-12-08 05:50 AM, Barry Rowlingson wrote:
> 2011/12/8 Hervé Pagès<hpages at fhcrc.org>:
>
>> A naive question: wouldn't everything be simpler if LC_COLLATE=C
>> was the default for everybody?
>
>   Yet when we Brits suggest everything would be simpler if the whole
> world spoke the Queen's English it causes all sorts of trouble...

:-) Sure, I see your point.

But this is a programming language, used by a lot of researchers.
And having the result of an analysis depend on a crazy collating
sequence is causing all sorts of trouble too.

Note that trying to strike back at the Empire is a lost battle anyway.
When you use R (as a user or a developer), any function name you
type (sort, rank, print, summary, etc.) is in the Queen's English.
And so are their man pages.

Also note that I was just talking about the *default*. AFAIK other
very serious projects like Python or SQLite *by default* use a
collating sequence that behaves like LC_COLLATE=C on strings
that contain ASCII chars only. And they let you change that if you
want. Are they being imperialist? Most R users/developers are in
research or academia, where I suspect consistency and reproducibility
are an even bigger deal than in the Python or SQLite communities.
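
To illustrate the Python case, here is a minimal sketch (the word list
is made up for the example). Python compares strings by Unicode code
point by default, so on ASCII-only strings the result matches what
LC_COLLATE=C would give, and locale-aware ordering has to be requested
explicitly:

```python
import locale

# Made-up sample data: ASCII-only strings with mixed case.
words = ["banana", "Apple", "apple", "Banana"]

# Default: str comparison is by code point, regardless of the
# environment's LC_COLLATE. Uppercase ASCII letters (65-90) come
# before lowercase ones (97-122).
default_order = sorted(words)
print(default_order)  # ['Apple', 'Banana', 'apple', 'banana']

# Opt-in: collate through the C library's locale machinery. With the
# "C" locale this should agree with the default on ASCII strings.
locale.setlocale(locale.LC_COLLATE, "C")
c_locale_order = sorted(words, key=locale.strxfrm)
print(c_locale_order == default_order)
```

Passing a different locale name to locale.setlocale() (if it is
installed on the system) is how a user would change the ordering, which
is the "default plus opt-in" arrangement described above.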

Cheers,
H.


>
> Barry


-- 
Hervé Pagès

Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M1-B514
P.O. Box 19024
Seattle, WA 98109-1024

E-mail: hpages at fhcrc.org
Phone:  (206) 667-5791
Fax:    (206) 667-1319


