[R] Wrong result with cor(x, y, method="spearman", use="complete.obs") with NA's???

Thomas Lumley tlumley at u.washington.edu
Mon Aug 30 23:09:26 CEST 2004


On Mon, 30 Aug 2004, Karl Knoblick wrote:

> Hello!
>
> Is there an error in cor to calculate Spearman
> correlation with cor if there are NA's? cor.test gives
> the correct result. At least there is a difference.
>
> Or am I doing something wrong???

The help for cor() says

     Notice also that the ranking is (currently) done
     removing only cases that are missing on the variable itself, which
     may not be what you expect if you let 'use' be '"complete.obs"' or
     '"pairwise.complete.obs"'.

>
> Does anybody know something about this?
>
> a<-c(2,4,3,NA)
> b<-c(4,1,2,3)
> cor(a, b, method="spearman", use="complete.obs")
> # -0.9819805

That is, when b is converted to ranks the ranks are c(4,1,2,3), not
c(3,1,2), because b has no missing data. cor() then drops the fourth case
and takes the correlation of the ranks c(1,3,2) and c(4,1,2), which is
-0.98.
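
To see where the -0.98 comes from, the rank-then-drop order can be
reproduced by hand. This is only a sketch of the behaviour the help page
describes, not cor()'s actual code; the names ra, rb, ok and
rank_then_drop are mine, and other versions of R may rank differently
inside cor() itself.

```r
a <- c(2, 4, 3, NA)
b <- c(4, 1, 2, 3)

# Rank each variable on its own, leaving the NA in place.
ra <- rank(a, na.last = "keep")  # 1 3 2 NA
rb <- rank(b, na.last = "keep")  # 4 1 2 3

# Only now drop the incomplete case and take the ordinary correlation.
ok <- complete.cases(ra, rb)
rank_then_drop <- cor(ra[ok], rb[ok])
print(rank_then_drop)  # about -0.98
```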


> cor.test(a, b, method="spearman")
> # -1

cor.test does it the other way around. It first drops all the observations
with NAs on any variable, then does the ranking.
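
The opposite order, drop first and rank afterwards, can be sketched the
same way. Again this is an illustration rather than the internals of
cor.test(); ok and drop_then_rank are names I made up.

```r
a <- c(2, 4, 3, NA)
b <- c(4, 1, 2, 3)

# Drop every case with an NA on either variable first ...
ok <- complete.cases(a, b)

# ... then rank what is left and correlate the ranks.
drop_then_rank <- cor(rank(a[ok]), rank(b[ok]))
print(drop_then_rank)  # -1

# The estimate from cor.test() should agree with the hand calculation.
rho <- unname(cor.test(a, b, method = "spearman")$estimate)
print(rho)
```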

>
> Without the NA both methods give -1
> cor(a[1:3], b[1:3], method="s", use="c")
> # -1
>
> Is there another method to calculate a nice table with
> correlations like cor(data.frame) is doing? Perhaps
> even with p-values or "stars"?

You could use cor(na.omit(data.frame)) to get the same NA behaviour as
cor.test(). No pretty stars, though.
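
A rough sketch of that, with p-values added by calling cor.test() once per
pair of columns. The data frame df and the names complete, rho, vars and p
are hypothetical, made up for illustration. The ranking is done by hand
after na.omit() so the result does not depend on how cor() handles NAs.

```r
# Hypothetical data, for illustration only.
df <- data.frame(a = c(2, 4, 3, NA, 5),
                 b = c(4, 1, 2, 3, 5),
                 c = c(1, 2, NA, 4, 3))

# Drop every row that has an NA in any column.
complete <- na.omit(df)

# Rank within the complete rows, then correlate: drop-then-rank for all pairs.
rho <- cor(sapply(complete, rank))

# Matrix of p-values, filled in pair by pair (diagonal left as NA).
vars <- names(complete)
p <- matrix(NA_real_, length(vars), length(vars), dimnames = list(vars, vars))
for (i in seq_along(vars)) {
  for (j in seq_along(vars)) {
    if (i < j) {
      test <- cor.test(complete[[i]], complete[[j]], method = "spearman")
      p[i, j] <- p[j, i] <- test$p.value
    }
  }
}

print(round(rho, 2))
print(round(p, 3))
```

Note that na.omit() on the whole data frame discards a row for every pair
of variables, even pairs where both values were present, so with many
columns and scattered NAs you can lose a lot of data.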


	-thomas



