[Rd] Incorrect handling of NA's in cor() (PR#6750)

Thomas Lumley tlumley at u.washington.edu
Fri Apr 9 19:42:59 CEST 2004


On Fri, 9 Apr 2004 msa at biostat.mgh.harvard.edu wrote:

>
> Dear Uwe,
>
> You are wrong. First, I've read the help file before
> submitting the report. For two variables,
> use="pairwise.complete.obs" and use="complete.obs" should be
> equivalent, shouldn't they? Of course, the results will be
> different when we have more than 2 variables. Second, with the
> call you proposed I am also getting an incorrect result:
>

I think it's more complicated than either of you is considering.

For the Pearson correlation everything is straightforward, and
pairwise.complete is the same as complete, which is the same as dropping
the NAs manually.
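
A minimal sketch of that equivalence for two variables, using made-up
data (the vectors x and y are purely illustrative):

```r
# Illustrative data (made up for this sketch); NAs in different positions.
x <- c(1, 2, NA, 4, 5, 6)
y <- c(2, NA, 3, 1, 6, 5)

ok <- complete.cases(x, y)   # TRUE where both values are present

r_complete <- cor(x, y, use = "complete.obs")
r_pairwise <- cor(x, y, use = "pairwise.complete.obs")
r_manual   <- cor(x[ok], y[ok])   # drop the NAs by hand

# With only two variables all three Pearson results agree.
stopifnot(isTRUE(all.equal(r_complete, r_pairwise)),
          isTRUE(all.equal(r_complete, r_manual)))
```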

For the rank correlations the question is when the ranking should be done.
The cor() function ranks the observations and then drops missing values,
whereas the manual approach drops missing values and then ranks.
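
A rough sketch of why the order matters, done by hand with rank() on
made-up data rather than through cor()'s own rank methods. Treating NAs
with na.last = "keep" during ranking is an assumption made for
illustration; it is not taken from the cor() source.

```r
# Illustrative data (made up for this sketch).
x <- c(1, 2, NA, 4, 5, 6)
y <- c(2, NA, 3, 1, 6, 5)
ok <- complete.cases(x, y)

# (a) Rank first, then drop incomplete pairs.
# Assumption: NAs keep an NA rank (na.last = "keep").
rx <- rank(x, na.last = "keep")
ry <- rank(y, na.last = "keep")
rank_then_drop <- cor(rx, ry, use = "complete.obs")

# (b) Drop incomplete pairs first, then rank what is left.
drop_then_rank <- cor(rank(x[ok]), rank(y[ok]))

# The two orders give different rank correlations here.
print(c(rank_then_drop = rank_then_drop, drop_then_rank = drop_then_rank))
```

In (a) the surviving ranks are no longer consecutive (x keeps ranks
1, 3, 4, 5), so the Pearson correlation of the ranks differs from (b),
where both variables are re-ranked 1 to 4.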

I'm not convinced that it is obvious which of these is right, though
certainly the help page should document whichever is being done.


	-thomas