[Rd] cor() fails with big dataframe
Martin Maechler
maechler at stat.math.ethz.ch
Thu Sep 16 12:44:09 CEST 2004
>>>>> "Mayeul" == Mayeul KAUFFMANN <mayeul.kauffmann at tiscali.fr>
>>>>> on Thu, 16 Sep 2004 01:23:09 +0200 writes:
Mayeul> Hello,
Mayeul> I have a big dataframe with *NO* na's (9 columns, 293380 rows).
Mayeul> # doing
Mayeul> memory.limit(size = 1000000000)
Mayeul> cor(x)
Mayeul> #gives
Mayeul> Error in cor(x) : missing observations in cov/cor
Mayeul> In addition: Warning message:
Mayeul> NAs introduced by coercion
"by coercion" means there were other things *coerced* to NAs!
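To illustrate where such a warning can come from, here is a minimal sketch. It assumes (this is only a guess about the poster's data) that one column holds text rather than numbers; the data frame and its column names are made up for the illustration.

```r
# Hypothetical data: column "b" is character and contains a non-number.
x <- data.frame(a = c(1.5, 2, 3),
                b = c("4", "5", "n/a"),
                stringsAsFactors = FALSE)

# One non-numeric column turns the whole matrix into character ...
m <- as.matrix(x)
is.character(m)

# ... and converting text back to numbers yields NA for "n/a",
# normally with the warning "NAs introduced by coercion"
# (suppressed here so the sketch runs quietly).
v <- suppressWarnings(as.numeric(m[, "b"]))
is.na(v)
```

Note that recent versions of R may refuse a non-numeric data frame in cor() outright instead of coercing it, so the sketch shows the coercion step itself rather than calling cor().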
One of the biggest problems with R users (and other S users, for
that matter) is that when they get an error, they throw their hands
up and ask for help, assuming the error message to be
unintelligible. Whereas it *is* intelligible (slightly ? ;-)
more often than not ...
Mayeul> #I found the obvious workaround:
Mayeul> COR <- matrix(rep(0, 81),9,9)
Mayeul> for (i in 1:9) for (j in 1:9) {if (i>j) COR[i,j] <- cor (x[,i],x[,j])}
Mayeul> #which works fine, with no warning
Mayeul> #looks like a "cor()" bug.
Quite improbable.
The following works flawlessly for me,
and the only thing that takes a bit of time is the construction of
x, not cor():
> n <- 300000
> set.seed(1)
> x <- as.data.frame(matrix(rnorm(n*9), n,9))
> cx <- cor(x)
> str(cx)
 num [1:9, 1:9]  1.00000 -0.00039  0.00113  0.00134 -0.00228 ...
 - attr(*, "dimnames")=List of 2
  ..$ : chr [1:9] "V1" "V2" "V3" "V4" ...
  ..$ : chr [1:9] "V1" "V2" "V3" "V4" ...
Mayeul> #I checked absence of NA's by
Mayeul> x <- x[complete.cases(x),]
Mayeul> summary(x)
Mayeul> apply(x,2, function (x) (sum(is.na(x))))
Mayeul> #I use R 1.9.1
What does
sapply(x, function(u)all(is.finite(u)))
return ?
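As a sketch of what that check can reveal, the following uses a small made-up data frame (not the poster's data) with one clean column, one containing Inf, and one stored as text. The is.numeric() line is an extra check added here as an assumption about what might be going on.

```r
# Hypothetical data frame: "a" is clean, "b" contains Inf, "c" is character.
x <- data.frame(a = c(1, 2, 3),
                b = c(0.5, Inf, 2),
                c = c("7", "8", "n/a"),
                stringsAsFactors = FALSE)

# The check suggested above: TRUE only for columns whose values are all finite.
finite_ok <- sapply(x, function(u) all(is.finite(u)))

# Complementary check: which columns are numeric at all?
numeric_ok <- sapply(x, is.numeric)

print(finite_ok)
print(numeric_ok)
```

A column that is numeric but not all finite points to Inf or NaN values, whereas a column that is not numeric would have to be coerced before cor() could use it.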