[R] cor(data.frame) infelicities

Michael Friendly friendly at yorku.ca
Mon Dec 3 15:27:07 CET 2007


When using cor() on a data.frame, it is annoying that you have to filter out
non-numeric columns explicitly, and when you don't, the error message
is misleading:

 > cor(iris)
Error in cor(iris) : missing observations in cov/cor
In addition: Warning message:
In cor(iris) : NAs introduced by coercion
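
Presumably the message arises because cor() converts the data frame with
as.matrix(). With the factor column Species present, that gives a character
matrix, and coercing it back to numbers turns the Species labels into NAs,
which use="all.obs" then rejects as "missing observations". A minimal sketch
of just that coercion step (this mimics the conversion; it is not the actual
internal code of cor()):

```r
# as.matrix() on a data frame with a factor column yields a character matrix
m <- as.matrix(iris)
typeof(m)                 # "character"

# Coerce back to double, roughly what happens before the correlation is
# computed; "setosa", "versicolor", ... cannot be parsed, so they become NA
num <- m
suppressWarnings(storage.mode(num) <- "double")
colSums(is.na(num))       # 0 for the four measurements, 150 for Species
```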

It would be nicer if, for a data.frame, stats::cor() itself did the
equivalent of the following:
 > cor(iris[,sapply(iris, is.numeric)])
              Sepal.Length Sepal.Width Petal.Length Petal.Width
Sepal.Length    1.0000000  -0.1175698    0.8717538   0.8179411
Sepal.Width    -0.1175698   1.0000000   -0.4284401  -0.3661259
Petal.Length    0.8717538  -0.4284401    1.0000000   0.9628654
Petal.Width     0.8179411  -0.3661259    0.9628654   1.0000000
 >

The change could be made at the point in cor() where a data frame is
currently converted to a matrix:
     if (is.data.frame(x))
         x <- as.matrix(x)
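
Until something like that is in stats, a user-level wrapper can do the
filtering. A minimal sketch; cor_numeric() is a hypothetical name, not part
of R, and it simply drops non-numeric columns (with a message) before calling
stats::cor():

```r
# Hypothetical wrapper (not part of R): for a data frame, keep only the
# numeric columns, then pass everything else through to stats::cor().
cor_numeric <- function(x, ...) {
  if (is.data.frame(x)) {
    keep <- vapply(x, is.numeric, logical(1))
    if (!all(keep))
      message("cor_numeric: dropping non-numeric column(s): ",
              paste(names(x)[!keep], collapse = ", "))
    x <- as.matrix(x[keep])
  }
  stats::cor(x, ...)
}

r <- cor_numeric(iris)                          # Species dropped, with a message
round(r["Sepal.Length", "Petal.Length"], 4)     # 0.8718
```

Reporting the dropped columns with a message keeps the behaviour visible
rather than silently discarding data.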

Second, the default, use="all.obs", throws an error if there are any
NAs.  It would be nicer if the default were use="complete.obs", with
a warning rather than an error when incomplete cases are dropped.  Most
other statistical software is more tolerant of missing data.

 > library(corrgram)
 > data(auto)
 > cor(auto[,sapply(auto, is.numeric)])
Error in cor(auto[, sapply(auto, is.numeric)]) :
   missing observations in cov/cor
 > cor(auto[,sapply(auto, is.numeric)],use="complete")
# works; output elided
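
The proposed default can be approximated with a wrapper. A minimal sketch,
assuming a hypothetical cor_complete() (not part of R) that uses
use="complete.obs" and warns when rows are dropped; it only checks x for
missing values, so passing y through ... is not accounted for. The built-in
airquality data stand in for auto here:

```r
# Hypothetical wrapper (not part of R): defaults to use = "complete.obs"
# and warns, rather than stopping, when incomplete rows are discarded.
cor_complete <- function(x, use = "complete.obs", ...) {
  if (identical(use, "complete.obs")) {
    n_bad <- sum(!complete.cases(x))
    if (n_bad > 0)
      warning(sprintf("%d of %d rows with missing values were dropped",
                      n_bad, NROW(x)))
  }
  stats::cor(x, use = use, ...)
}

r <- cor_complete(airquality)   # warns about dropped rows instead of failing
```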

-Michael

-- 
Michael Friendly     Email: friendly AT yorku DOT ca
Professor, Psychology Dept.
York University      Voice: 416 736-5115 x66249 Fax: 416 736-5814
4700 Keele Street    http://www.math.yorku.ca/SCS/friendly.html
Toronto, ONT  M3J 1P3 CANADA


