[R] Wrong result with cor(x, y, method="spearman", use="complete.obs") with NA's???
Thomas Lumley
tlumley at u.washington.edu
Mon Aug 30 23:09:26 CEST 2004
On Mon, 30 Aug 2004, Karl Knoblick wrote:
> Hello!
>
> Is there an error in cor to calculate Spearman
> correlation with cor if there are NA's? cor.test gives
> the correct result. At least there is a difference.
>
> Or am I doing something wrong???
The help for cor() says
Notice also that the ranking is (currently) done
removing only cases that are missing on the variable itself, which
may not be what you expect if you let 'use' be '"complete.obs"' or
'"pairwise.complete.obs"'.
>
> Does anybody know something about this?
>
> a<-c(2,4,3,NA)
> b<-c(4,1,2,3)
> cor(a, b, method="spearman", use="complete.obs")
> # -0.9819805
That is, when b is converted to ranks the ranks are c(4,1,2,3), not
c(3,1,2), because b has no missing data. cor() then drops the fourth case
and takes the correlation of c(1,3,2) (the ranks of a) and c(4,1,2), which
is about -0.98.
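The arithmetic can be checked by hand, which does not depend on how a
particular version of cor() treats NAs. A minimal sketch, assuming
rank(..., na.last = "keep") as a stand-in for the per-variable ranking the
help page describes (the names ra, rb and ok are only for illustration):

```r
a <- c(2, 4, 3, NA)
b <- c(4, 1, 2, 3)

# Rank each variable on its own; the NA in a is kept as NA
ra <- rank(a, na.last = "keep")   # 1 3 2 NA
rb <- rank(b)                     # 4 1 2 3

# Only now drop the incomplete case, then take the Pearson correlation
# of the ranks that are left
ok <- complete.cases(a, b)
rank_then_drop <- cor(ra[ok], rb[ok])
rank_then_drop                    # -0.9819805
```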
> cor.test(a, b, method="spearman")
> # -1
cor.test does it the other way around. It first drops all the observations
with NAs on any variable, then does the ranking.
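The opposite order, dropping incomplete cases first and ranking afterwards,
can be sketched the same way. This only illustrates the order of operations
and is not the code cor.test() actually runs:

```r
a <- c(2, 4, 3, NA)
b <- c(4, 1, 2, 3)

# Drop the incomplete case first ...
ok <- complete.cases(a, b)

# ... then rank what is left: a -> 1 3 2, b -> 3 1 2
drop_then_rank <- cor(rank(a[ok]), rank(b[ok]))
drop_then_rank                    # -1, up to floating point

# The estimate reported by cor.test(), for comparison
unname(cor.test(a, b, method = "spearman")$estimate)
```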
>
> Without the NA both methods give -1
> cor(a[1:3], b[1:3], method="s", use="c")
> # -1
>
> Is there another method to calculate a nice table with
> correlations like cor(data.frame) is doing? Perhaps
> even with p-values or "stars"?
You could use cor(na.omit(data.frame)) to get the same NA behaviour as
cor.test(). No pretty stars, though.
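For the p-values, one possibility is to loop cor.test() over the column
pairs of the NA-free data. This is only a rough sketch: the data frame d
and its columns x, y, z are made up for illustration, and the loop is just
one way to fill the table.

```r
# Hypothetical example data; d stands in for the poster's data frame
d <- data.frame(x = c(2, 4, 3, NA, 5, 1),
                y = c(4, 1, 2, 3, 6, 5),
                z = c(1, 3, 2, 6, 4, 5))

cc  <- na.omit(d)                      # drop incomplete rows first
rho <- cor(cc, method = "spearman")    # ranking then sees complete cases only

# Matching matrix of p-values from pairwise cor.test() calls
p <- matrix(NA_real_, ncol(cc), ncol(cc), dimnames = dimnames(rho))
for (i in seq_len(ncol(cc))) {
  for (j in seq_len(ncol(cc))) {
    if (i != j) {
      p[i, j] <- cor.test(cc[[i]], cc[[j]], method = "spearman")$p.value
    }
  }
}

round(rho, 2)
round(p, 3)
```

The diagonal of p is left as NA on purpose, since a variable's correlation
with itself needs no test.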
-thomas