[Rd] cor() fails with big dataframe

Martin Maechler maechler at stat.math.ethz.ch
Thu Sep 16 12:44:09 CEST 2004


>>>>> "Mayeul" == Mayeul KAUFFMANN <mayeul.kauffmann at tiscali.fr>
>>>>>     on Thu, 16 Sep 2004 01:23:09 +0200 writes:

    Mayeul> Hello,
    Mayeul> I have a big dataframe with *NO* na's (9 columns, 293380 rows).

    Mayeul> # doing
    Mayeul> memory.limit(size = 1000000000)
    Mayeul> cor(x)
    Mayeul> #gives
    Mayeul> Error in cor(x) : missing observations in cov/cor
    Mayeul> In addition: Warning message:
    Mayeul> NAs introduced by coercion

"by coercion" means there were other things *coerced* to NAs!
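
To illustrate what that warning means in general, here is a minimal sketch
with made-up values (the vector `raw` is hypothetical and not taken from the
original data): a string that cannot be parsed as a number is not NA
itself, but becomes NA once it is coerced to numeric.

```r
# Hypothetical values, for illustration only.
raw <- c("1.5", "2", "abc")

is.na(raw)                # no NAs in the character vector itself

# as.numeric() cannot parse "abc"; it returns NA for that element and
# signals the warning "NAs introduced by coercion".
num <- suppressWarnings(as.numeric(raw))

is.na(num)                # the NA only appears after coercion
```

So data can pass an is.na() check before coercion and still contain
missing values afterwards.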

One of the biggest problems with R users (and other S users, for
that matter) is that when they get an error, they throw their hands up
and ask for help, assuming the error message to be
unintelligible.  Whereas it *is* intelligible (slightly ? ;-)
more often than not ...


    Mayeul> #I found the obvious workaround:
    Mayeul> COR <- matrix(rep(0, 81),9,9)
    Mayeul> for (i in 1:9) for (j in 1:9) {if (i>j) COR[i,j] <- cor (x[,i],x[,j])}
    Mayeul> #which works fine, with no warning

    Mayeul> #looks like a "cor()" bug.

Quite improbable.

The following works flawlessly for me,
and the only thing that takes a bit of time is the construction of
x, not cor():

  > n <- 300000
  > set.seed(1)
  > x <- as.data.frame(matrix(rnorm(n*9), n,9))
  > cx <- cor(x)
  > str(cx)
   num [1:9, 1:9]  1.00000 -0.00039  0.00113  0.00134 -0.00228 ...
   - attr(*, "dimnames")=List of 2
    ..$ : chr [1:9] "V1" "V2" "V3" "V4" ...
    ..$ : chr [1:9] "V1" "V2" "V3" "V4" ...


    Mayeul> #I checked absence of NA's by
    Mayeul> x <- x[complete.cases(x),]
    Mayeul> summary(x)
    Mayeul> apply(x,2, function (x) (sum(is.na(x))))

    Mayeul> #I use R 1.9.1

What does
    sapply(x, function(u)all(is.finite(u)))
return ?
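
As a sketch of why a per-column check can be more telling than the apply()
check above, consider a small made-up data frame (the columns `a` and `b`
are hypothetical, and this is not a claim about what the original data
contain). apply() converts the data frame to a matrix first, so with a
text column present everything becomes character and is.na() finds
nothing, while sapply() looks at each column in its own type.

```r
# Hypothetical data frame with one column that was read as text.
x <- data.frame(a = c(1, 2, 3),
                b = c("4", "five", "6"),
                stringsAsFactors = FALSE)

# Storage class of each column.
sapply(x, class)

# Per-column check: is.finite() is FALSE for every element of a
# character vector, so the text column is flagged.
finite <- sapply(x, function(u) all(is.finite(u)))
finite

# apply() works on as.matrix(x), a character matrix here, so the
# unparseable entry "five" is not counted as NA.
na_count <- apply(x, 2, function(u) sum(is.na(u)))
na_count
```

In this toy case `na_count` is zero for both columns, although `b` would
produce an NA as soon as it is coerced to numeric.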


