[R] bug? in stats::cor for use=complete.obs with NAs

Peter Ehlers ehlers at ucalgary.ca
Fri Jun 11 01:02:36 CEST 2010

I don't think that this would be considered a bug. The
reason for the discrepancy between use="complete.obs"
and use="pairwise.complete.obs" for the case of the
Spearman correlation of two vectors x, y is this:

"pairwise" does complete.cases(x,y) and then ranks;
this is also what's done in cor.test().

"complete" ranks first (keeping NAs via the
na.last="keep" argument to rank()) and then does
complete.cases(ranked.x,ranked.y) on the ranked data.
This can obviously lead to a different set of
ranks being correlated than those for "pairwise".

I must admit that I wasn't aware that R does this
and I don't know the rationale for it. The help page

    If use is "complete.obs" then missing values are
    handled by casewise deletion ...

which is not clear on the order of ranking and
deletion, but further down the page:

    Note that "spearman" basically computes cor(R(x), R(y))
    (or cov(.,.)) where R(u) := rank(u, na.last="keep").
    In the case of missing values, the ranks are calculated
    depending on the value of use, either based on complete
    observations, or based on pairwise completeness with
    reranking for each pair.

I guess that this implies that, for "complete", the ranking
occurs before the casewise deletion (else why the

If anyone knows the rationale and/or can give a reference,
I'd be glad to receive such.

   -Peter Ehlers

On 2010-06-09 11:36, hugh.genin at thomsonreuters.com wrote:
> Arrrrr,
> I think I've found a bug in the behavior of the stats::cor function when
> NAs are present, but in case I'm missing something, could you look over
> this example and let me know what you think:
>> a = c(1,3,NA,1,2)
>> b = c(1,2,1,1,4)
>> cor(a,b,method="spearman", use="complete.obs")
> [1] 0.8164966
>> cor(a,b,method="spearman", use="pairwise.complete.obs")
> [1] 0.7777778
> My understanding is that, when the inputs are vectors (but not
> necessarily when they're matrices), the "complete.obs" and
> "pairwise.complete.obs" arguments should give identical spearman
> correlations.  The above example clearly shows they do not in my version
> of R (2.11.1).  However, in cor.test, they do:
>> cor.test(a,b,method="spearman", use="complete.obs")
>          Spearman's rank correlation rho
> data:  a and b
> S = 2.2222, p-value = 0.2222
> alternative hypothesis: true rho is not equal to 0
> sample estimates:
>        rho
> 0.7777778
> So cor and cor.test do not agree, which seems very likely to be a bug.
> When calculating by hand, I also get 0.7777778.  Additionally, when
> using an old version of R (2.5.0), both the complete.obs and
> pairwise.complete.obs versions give 0.7777778.  Which strongly suggests
> either 2.5.0 or 2.11.1 has a bug in it.  Is this a bug?  If so, has it
> already been reported?  (I found a related but confusing email thread
> from 2004 in the R archives, but I did not find the resolution to that
> bug report).
> Additional info:
> Platform = Windows XP
>> sessionInfo()
> R version 2.11.1 (2010-05-31)
> i386-pc-mingw32
> locale:
> [1] LC_COLLATE=English_United States.1252  LC_CTYPE=English_United
> States.1252    LC_MONETARY=English_United States.1252
> [4] LC_NUMERIC=C                           LC_TIME=English_United
> States.1252
> attached base packages:
> [1] stats     graphics  grDevices utils     datasets  methods   base
> loaded via a namespace (and not attached):
> [1] tools_2.11.1
>> Sys.getlocale()
> [1] "LC_COLLATE=English_United States.1252;LC_CTYPE=English_United
> States.1252;LC_MONETARY=English_United
> States.1252;LC_NUMERIC=C;LC_TIME=English_United States.1252"
> Thanks,
> --Hugh

More information about the R-help mailing list