[Rd] Incorrect Kendall's tau for ordered variables (PR#14207)

Marek Ancukiewicz msa at biostat.mgh.harvard.edu
Mon Feb 8 15:07:01 CET 2010


Dear Peter,

Thank you. Although the documentation does mention numeric variables,
one would intuitively expect cor() and cor.test() to work for ordered
factors with methods "kendall" and "spearman". After all, these are
nonparametric procedures, defined for ordinal scales, and the only
information they need is the ranks (the same should be true for
wilcox.test()).

So even if this is, strictly speaking, not a bug, I would strongly
suggest extending cor() and cor.test() to work with ordered factors for
Kendall's and Spearman's correlations (although this would not make much
sense for Pearson's correlation). It looks like the change should be
very easy.
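
In the meantime, a workaround along the following lines seems to be
enough. This is only a sketch, not a proposed patch: ordinal_codes() and
cor_ordinal() are names I made up for illustration and are not part of
R. The idea is to replace an ordered factor by its integer codes, which
follow the level order, before the data reach cor().

```r
# Hypothetical helper (not part of R): replace an ordered factor by its
# integer codes, which follow the order of the levels; leave anything
# else untouched.
ordinal_codes <- function(v) {
  if (is.ordered(v)) as.integer(v) else v
}

# Hypothetical wrapper around stats::cor() for the two rank-based methods.
cor_ordinal <- function(x, y, method = c("kendall", "spearman")) {
  method <- match.arg(method)
  cor(ordinal_codes(x), ordinal_codes(y), method = method)
}

x <- as.ordered(9:11)   # levels are ordered 9 < 10 < 11
y <- 1:3

tau <- cor_ordinal(x, y, method = "kendall")
rho <- cor_ordinal(x, y, method = "spearman")
print(c(tau = tau, rho = rho))
```

For this toy example both values come out as 1, the same as
cor(9:11, 1:3) gives for the plain numeric vectors. The integer codes
are only meaningful as ranks, so the wrapper deliberately does not offer
method "pearson".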

Marek Ancukiewicz

> Date: Mon, 08 Feb 2010 14:23:08 +0100
> From: Peter Dalgaard <P.Dalgaard at biostat.ku.dk>
> Cc: r-devel at stat.math.ethz.ch, R-bugs at r-project.org
> 
> msa at biostat.mgh.harvard.edu wrote:
> > Full_Name: Marek Ancukiewicz
> > Version: 2.10.1
> > OS: Linux
> > Submission from: (NULL) (74.0.49.2)
> > 
> > 
> > Both cor() and cor.test() incorrectly handle ordered variables with 
> > method="kendall", cor() incorrectly handles ordered variables for 
> > method="spearman" (method="pearson" always works correctly, while 
> > method="spearman" works for cor.test, but not for cor()).
> > 
> > In erroneous calculations these functions ignore the inherent ordering
> > of the ordered variable (e.g., '9'<'10'<'11') and instead seem to assume 
> > an alphabetic ordering ('10'<'11'<'9'). 
> 
> Strictly speaking, not a bug, since the documentation has
> 
>        x: a numeric vector, matrix or data frame.
> 
> respectively
> 
>     x, y: numeric vectors of data values.  ‘x’ and ‘y’ must have the
>           same length.
> 
> so no one ever claimed that class "ordered" variables should work.
> 
> However, the root cause is that as.vector on a factor variable (ordered
> or not) converts it to a character vector, hence
> 
> > rank(as.vector(as.ordered(9:11)))
> [1] 3 1 2
> 
> Looks like a simple fix would be to use as.vector(x, "numeric") inside
> the definition of cor().
> 
> 
> >> cor(9:11,1:3,method="k")
> > [1] 1
> >> cor(as.ordered(9:11),1:3,method="k")
> > [1] -0.3333333
> >> cor.test(as.ordered(9:11),1:3,method="k")
> > 
> > 	Kendall's rank correlation tau
> > 
> > data:  as.ordered(9:11) and 1:3 
> > T = 1, p-value = 1
> > alternative hypothesis: true tau is not equal to 0 
> > sample estimates:
> >        tau 
> > -0.3333333 
> > 
> >> cor(9:11,1:3,method="s")
> > [1] 1
> >> cor(as.ordered(9:11),1:3,method="s")
> > [1] -0.5
> >> cor.test(as.ordered(9:11),1:3,method="s")
> > 
> > 	Spearman's rank correlation rho
> > 
> > data:  as.ordered(9:11) and 1:3 
> > S = 0, p-value = 0.3333
> > alternative hypothesis: true rho is not equal to 0 
> > sample estimates:
> > rho 
> >   1
> > 
> > ______________________________________________
> > R-devel at r-project.org mailing list
> > https://stat.ethz.ch/mailman/listinfo/r-devel
> 
> 
> -- 
>    O__  ---- Peter Dalgaard             Øster Farimagsgade 5, Entr.B
>   c/ /'_ --- Dept. of Biostatistics     PO Box 2099, 1014 Cph. K
>  (*) \(*) -- University of Copenhagen   Denmark      Ph:  (+45) 35327918
> ~~~~~~~~~~ - (p.dalgaard at biostat.ku.dk)              FAX: (+45) 35327907
> 