[R] bug? in stats::cor for use=complete.obs with NAs

hugh.genin at thomsonreuters.com hugh.genin at thomsonreuters.com
Wed Jun 9 19:36:22 CEST 2010


Arrrrr,

I think I've found a bug in the behavior of the stats::cor function when
NAs are present, but in case I'm missing something, could you look over
this example and let me know what you think:


> a = c(1,3,NA,1,2)
> b = c(1,2,1,1,4)
> cor(a,b,method="spearman", use="complete.obs")
[1] 0.8164966
> cor(a,b,method="spearman", use="pairwise.complete.obs")
[1] 0.7777778

My understanding is that, when the inputs are vectors (but not
necessarily when they're matrices), the "complete.obs" and
"pairwise.complete.obs" arguments should give identical spearman
correlations.  The above example clearly shows they do not in my version
of R (2.11.1).  However, in cor.test, they do:


> cor.test(a,b,method="spearman", use="complete.obs")

        Spearman's rank correlation rho

data:  a and b 
S = 2.2222, p-value = 0.2222
alternative hypothesis: true rho is not equal to 0 
sample estimates:
      rho 
0.7777778 


So cor and cor.test do not agree, which seems very likely to be a bug.
When calculating by hand, I also get 0.7777778.  Additionally, when
using an old version of R (2.5.0), both the complete.obs and
pairwise.complete.obs versions give 0.7777778.  Which strongly suggests
either 2.5.0 or 2.11.1 has a bug in it.  Is this a bug?  If so, has it
already been reported?  (I found a related but confusing email thread
from 2004 in the R archives, but I did not find the resolution to that
bug report).


Additional info:
Platform = Windows XP
> sessionInfo()
R version 2.11.1 (2010-05-31) 
i386-pc-mingw32 

locale:
[1] LC_COLLATE=English_United States.1252  LC_CTYPE=English_United
States.1252    LC_MONETARY=English_United States.1252
[4] LC_NUMERIC=C                           LC_TIME=English_United
States.1252    

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base


loaded via a namespace (and not attached):
[1] tools_2.11.1
> Sys.getlocale()
[1] "LC_COLLATE=English_United States.1252;LC_CTYPE=English_United
States.1252;LC_MONETARY=English_United
States.1252;LC_NUMERIC=C;LC_TIME=English_United States.1252"

Thanks,

--Hugh



More information about the R-help mailing list