[R] Spearman correlation and missing observations

Peter Dalgaard p.dalgaard at biostat.ku.dk
Wed Nov 26 15:04:10 CET 2003


Nicolas STRANSKY <Nicolas.Stransky at curie.fr> writes:

> Hi,
> 
> I am using R 1.8.1 on WinXP. I encounter a problem when trying to
> compute a Spearman correlation under certain conditions (at least I
> think there is a problem, but maybe this is the normal behavior).
> 
> > X<-array(0,c(20,2))
> >
> > X[,1]<-c(runif(10),rep(NA,10))
> > X[,2]<-c(runif(10),rep(NA,10))
> >
> > Y<-X[1:10,]
> >
> > cor(Y,method="s",use="complete.obs")
>           [,1]      [,2]
> [1,] 1.0000000 0.3939394
> [2,] 0.3939394 1.0000000
> > cor(X,method="s",use="complete.obs")
>          [,1]     [,2]
> [1,] 1.000000 0.924812
> [2,] 0.924812 1.000000
> 
> 
> The problem is that I do not get the same results depending on whether
> there are NAs in the dataset or not. Perhaps I misunderstand the use of
> "complete.obs" and "pairwise.complete.obs" for dealing with missing
> data; if so, please tell me how I can get the same result in the
> end.
> 
> On the other hand, the same type of commands with a Pearson correlation
> gives exactly the same result for X and Y :
> 
> > cor(Y,method="p",use="complete.obs")
>           [,1]      [,2]
> [1,] 1.0000000 0.3109109
> [2,] 0.3109109 1.0000000
> > cor(X,method="p",use="complete.obs")
>           [,1]      [,2]
> [1,] 1.0000000 0.3109109
> [2,] 0.3109109 1.0000000
> 
> Thanks for your help

Oh, d*mn....

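The problem is that rank() by default puts the NAs last and assigns
them ranks, so there is nothing left for "complete.obs" to remove: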

> rank(c(runif(10),rep(NA,10)))
 [1]  4  8  6  5  9 10  2  3  7  1 11 12 13 14 15 16 17 18 19 20

and we want 

> rank(c(runif(10),rep(NA,10)),na.last="keep")
 [1]  6  2  9  5  8  3  7  1 10  4 NA NA NA NA NA NA NA NA NA NA

so inside cor, we need to add na.last="keep" in two places:

    if (method != "pearson") {
        Rank <- function(u) if (is.matrix(u))
            apply(u, 2, rank, na.last="keep")
        else rank(u, na.last="keep")
        x <- Rank(x)
        if (!is.null(y))
            y <- Rank(y)
    }
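
Until cor itself is changed, a workaround along the following lines may
help. It is only a sketch: spearman_keep_na is a made-up helper name (not
part of R), and it assumes a numeric matrix as input. It ranks each column
with na.last="keep" and then takes the Pearson correlation of the ranks
over the complete rows, which is what the patched code above is meant to do.

```r
# Default rank() gives NAs a rank; na.last = "keep" leaves them as NA.
rank(c(0.3, 0.1, NA, 0.2))                    # 3 1 4 2
rank(c(0.3, 0.1, NA, 0.2), na.last = "keep")  # 3 1 NA 2

# Hypothetical helper (name invented for this example, not an R function):
# Spearman correlation computed as Pearson correlation of ranks, with NAs
# kept as NA during ranking so that "complete.obs" can drop those rows.
spearman_keep_na <- function(m) {
  stopifnot(is.matrix(m), is.numeric(m))
  r <- apply(m, 2, rank, na.last = "keep")
  cor(r, method = "pearson", use = "complete.obs")
}

# Same layout as in the question: 10 observed rows followed by 10 NA rows.
set.seed(1)
X <- array(0, c(20, 2))
X[, 1] <- c(runif(10), rep(NA, 10))
X[, 2] <- c(runif(10), rep(NA, 10))
Y <- X[1:10, ]

res_X <- spearman_keep_na(X)          # NA rows are dropped after ranking
res_Y <- cor(Y, method = "spearman")  # no NAs involved at all
print(res_X)
print(res_Y)
```

With the NA rows kept as NA, the two results should agree. Note that when
the NAs sit in different rows in different columns, each column is ranked
over its own non-missing values before the incomplete rows are dropped,
which is not quite the same as ranking after deletion.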


-- 
   O__  ---- Peter Dalgaard             Blegdamsvej 3  
  c/ /'_ --- Dept. of Biostatistics     2200 Cph. N   
 (*) \(*) -- University of Copenhagen   Denmark      Ph: (+45) 35327918
~~~~~~~~~~ - (p.dalgaard at biostat.ku.dk)             FAX: (+45) 35327907



