[R] correlation with missing values.. different answers

arun smartpink111 at yahoo.com
Mon Apr 14 03:36:04 CEST 2014




Hi,

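I think in this case, when you use "na.or.complete" on the full dataset, every row that has an NA in any column (rows 1, 7 and 25) is removed before the correlations are computed. With only the first two columns, just row 1 is removed, which gives the same result as: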
cor(swM[-1,1:2])
#          Frtlty    Agrclt
#Frtlty 1.0000000 0.3920289
#Agrclt 0.3920289 1.0000000

cor(swM[-1,])[1:2,1:2]
#          Frtlty    Agrclt
#Frtlty 1.0000000 0.3920289
#Agrclt 0.3920289 1.0000000
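
The row-dropping behaviour described above can be checked directly. This is only a minimal sketch: it rebuilds swM as in the ?cor example, and the helper names (full_rows, sub_rows, r_full, r_sub) are my own, not from the original post. It assumes that "na.or.complete" does casewise deletion over whatever columns are passed to cor().

```r
# Rebuild the example data from ?cor
swM <- swiss
colnames(swM) <- abbreviate(colnames(swiss), minlength = 6)
swM[1, 2] <- swM[7, 3] <- swM[25, 5] <- NA  # create 3 "missing"

# Which rows survive casewise deletion depends on the columns passed in
full_rows <- complete.cases(swM)          # all six columns
sub_rows  <- complete.cases(swM[, 1:2])   # first two columns only

which(!full_rows)   # rows 1, 7 and 25 are dropped
which(!sub_rows)    # only row 1 is dropped

# Correlations as computed in the original question
r_full <- cor(swM, use = "na.or.complete")[1:2, 1:2]
r_sub  <- cor(swM[, 1:2], use = "na.or.complete")

# Each should match a plain cor() on the rows that survived
all.equal(r_full, cor(swM[full_rows, 1:2]))
all.equal(r_sub,  cor(swM[sub_rows, 1:2]))
```

If both all.equal() calls return TRUE, the two answers differ only because a different set of rows goes into each calculation.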

Maybe you can try "pairwise.complete.obs" instead:
cor(swM, use = "pairwise.complete.obs") 
#           Frtlty      Agrclt     Exmntn      Eductn     Cathlc      Infn.M
#Frtlty  1.0000000  0.39202893 -0.6531492 -0.66378886  0.4723129  0.41655603
#Agrclt  0.3920289  1.00000000 -0.7150561 -0.65221506  0.4152007 -0.03648427
#Exmntn -0.6531492 -0.71505612  1.0000000  0.69921153 -0.6003402 -0.11433546
#Eductn -0.6637889 -0.65221506  0.6992115  1.00000000 -0.1791334 -0.09932185
#Cathlc  0.4723129  0.41520069 -0.6003402 -0.17913339  1.0000000  0.18503786
#Infn.M  0.4165560 -0.03648427 -0.1143355 -0.09932185  0.1850379  1.00000000

cor(swM[,1:2],use="pairwise.complete.obs")
#          Frtlty    Agrclt
#Frtlty 1.0000000 0.3920289
#Agrclt 0.3920289 1.0000000
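
With "pairwise.complete.obs" each pair of columns uses its own set of complete rows, so the Frtlty/Agrclt entry no longer depends on NAs in other columns. As a rough sketch (the names ok and n_pair are mine, not from the original post), the number of rows that go into each pair can be counted like this:

```r
# Rebuild the example data from ?cor
swM <- swiss
colnames(swM) <- abbreviate(colnames(swiss), minlength = 6)
swM[1, 2] <- swM[7, 3] <- swM[25, 5] <- NA  # create 3 "missing"

# 1 where a value is present, 0 where it is NA
ok <- 1 - is.na(as.matrix(swM))

# Entry [i, j] counts the rows where columns i and j are both non-missing,
# i.e. the rows available to the pairwise-complete correlation of i and j
n_pair <- crossprod(ok)
n_pair

# For comparison: rows left after casewise deletion over all columns
sum(complete.cases(swM))
```

The counts vary from pair to pair, whereas "na.or.complete" uses the same reduced set of rows for every entry of the matrix.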

A.K.

On Sunday, April 13, 2014 9:11 PM, Paul Tanger <paul.tanger at colostate.edu> wrote:
Hi,
I can't seem to figure out why this gives me different answers.  Probably
something obvious, but I thought they would be the same.
This is a minimal example from the help page of cor():

> ## swM := "swiss" with  3 "missing"s :
> swM <- swiss
> colnames(swM) <- abbreviate(colnames(swiss), min=6)
> swM[1,2] <- swM[7,3] <- swM[25,5] <- NA # create 3 "missing"
> cor(swM, use = "na.or.complete")
           Frtlty      Agrclt     Exmntn      Eductn     Cathlc      Infn.M
Frtlty  1.0000000  0.37821953 -0.6548306 -0.67421581  0.4772298  0.38781500
Agrclt  0.3782195  1.00000000 -0.7127078 -0.64337782  0.4014837 -0.07168223
Exmntn -0.6548306 -0.71270778  1.0000000  0.69776906 -0.6079436 -0.10710047
Eductn -0.6742158 -0.64337782  0.6977691  1.00000000 -0.1701445 -0.08343279
Cathlc  0.4772298  0.40148365 -0.6079436 -0.17014449  1.0000000  0.17221594
Infn.M  0.3878150 -0.07168223 -0.1071005 -0.08343279  0.1722159  1.00000000
> # why isn't this the same?
> cor(swM[,c(1:2)], use = "na.or.complete")
          Frtlty    Agrclt
Frtlty 1.0000000 0.3920289
Agrclt 0.3920289 1.0000000





