[Rd] Incorrect Kendall's tau for ordered variables (PR#14207)

Brian D. Ripley ripley at stats.ox.ac.uk
Mon Feb 8 18:11:30 CET 2010


On Mon, 8 Feb 2010, Peter Dalgaard wrote:

> msa at biostat.mgh.harvard.edu wrote:
>> Full_Name: Marek Ancukiewicz
>> Version: 2.10.1
>> OS: Linux
>> Submission from: (NULL) (74.0.49.2)
>>
>>
>> Both cor() and cor.test() incorrectly handle ordered variables with
>> method="kendall", and cor() also handles them incorrectly with
>> method="spearman". (method="pearson" always works correctly, and
>> method="spearman" works for cor.test(), but not for cor().)
>>
>> In erroneous calculations these functions ignore the inherent ordering
>> of the ordered variable (e.g., '9'<'10'<'11') and instead seem to assume
>> an alphabetic ordering ('10'<'11'<'9').
>
> Strictly speaking, not a bug, since the documentation has
>
>       x: a numeric vector, matrix or data frame.
>
> respectively
>
>    x, y: numeric vectors of data values.  ‘x’ and ‘y’ must have the
>          same length.
>
> so no one ever claimed that class "ordered" variables should work.
>
> However, the root cause is that as.vector on a factor variable (ordered
> or not) converts it to a character vector, hence
>
>> rank(as.vector(as.ordered(9:11)))
> [1] 3 1 2
>
> Looks like a simple fix would be to use as.vector(x, "numeric") inside
> the definition of cor().

That would fix this particular case, but the problem is that it relies on
the underlying representation.  I think a better fix would be to do either
of

- test for numeric and throw an error otherwise, or
- use xtfrm, which has the advantage of being more general and
   allowing methods to be written (S3 or S4 methods in R-devel).
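
To illustrate the two options, here is a minimal sketch, not the actual
change to cor().  The helper names rank_cor_strict() and rank_cor_xtfrm()
are hypothetical and exist only for this example; the only assumption about
xtfrm() is its documented behaviour of returning a numeric vector that
sorts in the same order as its argument.

```r
# Ordered factor whose level order (9 < 10 < 11) is not alphabetical.
x <- as.ordered(9:11)
y <- 1:3

# as.vector() on a factor returns the labels as character strings, so
# rank() then follows string collation (3 1 2 in the session quoted above).
labels_chr <- as.vector(x)

# xtfrm() gives a numeric vector that sorts like x; for a factor these are
# the integer codes, which respect the level order.
r_xt <- rank(xtfrm(x))

# Option 1 (hypothetical helper): accept numeric input only, error otherwise.
rank_cor_strict <- function(x, y, method = c("kendall", "spearman")) {
  method <- match.arg(method)
  if (!is.numeric(x) || !is.numeric(y))
    stop("'x' and 'y' must be numeric")
  cor(x, y, method = method)
}

# Option 2 (hypothetical helper): map both arguments through xtfrm() first.
rank_cor_xtfrm <- function(x, y, method = c("kendall", "spearman")) {
  method <- match.arg(method)
  cor(xtfrm(x), xtfrm(y), method = method)
}

tau <- rank_cor_xtfrm(x, y, "kendall")
rho <- rank_cor_xtfrm(x, y, "spearman")

# The strict variant rejects the ordered factor instead of mis-ranking it.
strict_result <- tryCatch(rank_cor_strict(x, y, "kendall"),
                          error = function(e) "error")
```

With the xtfrm() variant the ordered factor is ranked by its level order,
so both tau and rho agree with the results for the plain integers 9:11.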

>
>
>>> cor(9:11,1:3,method="k")
>> [1] 1
>>> cor(as.ordered(9:11),1:3,method="k")
>> [1] -0.3333333
>>> cor.test(as.ordered(9:11),1:3,method="k")
>>
>> 	Kendall's rank correlation tau
>>
>> data:  as.ordered(9:11) and 1:3
>> T = 1, p-value = 1
>> alternative hypothesis: true tau is not equal to 0
>> sample estimates:
>>        tau
>> -0.3333333
>>
>>> cor(9:11,1:3,method="s")
>> [1] 1
>>> cor(as.ordered(9:11),1:3,method="s")
>> [1] -0.5
>>> cor.test(as.ordered(9:11),1:3,method="s")
>>
>> 	Spearman's rank correlation rho
>>
>> data:  as.ordered(9:11) and 1:3
>> S = 0, p-value = 0.3333
>> alternative hypothesis: true rho is not equal to 0
>> sample estimates:
>> rho
>>   1
>>
>> ______________________________________________
>> R-devel at r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-devel
>
>
> --
>   O__  ---- Peter Dalgaard             Øster Farimagsgade 5, Entr.B
>  c/ /'_ --- Dept. of Biostatistics     PO Box 2099, 1014 Cph. K
> (*) \(*) -- University of Copenhagen   Denmark      Ph:  (+45) 35327918
> ~~~~~~~~~~ - (p.dalgaard at biostat.ku.dk)              FAX: (+45) 35327907
>

-- 
Brian D. Ripley,                  ripley at stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595