[R] Pairwise n for large correlation tables?

Adam D. I. Kramer adik at ilovebacon.org
Tue Aug 8 04:03:41 CEST 2006


Hello,

I'm using a very large data set (n > 100,000 for 7 columns), for which I'm
pretty happy dealing with pairwise-deleted correlations to populate my
correlation table. E.g.,

a <- cor(cbind(col1, col2, col3),use="pairwise.complete.obs")

...however, I am interested in the number of cases used to compute each
cell of the correlation table. I was unable to find such a function via
Google searches, so I wrote one of my own. It turns out to be highly
inefficient (it takes much, MUCH longer than the correlations themselves).
Any hints, regarding other functions to use or ways to make this speedier,
would be much appreciated!

pairwise.n <- function(df=stop("Must provide data frame!")) {
   if (!is.data.frame(df)) {
     df <- as.data.frame(df)
   }
   colNum <- ncol(df)
   result <- matrix(data=NA,nrow=colNum,ncol=colNum,dimnames=list(colnames(df),colnames(df)))
   for(i in 1:colNum) {
     for (j in i:colNum) {
       result[i,j] <- length(df[!is.na(df[i])&!is.na(df[j])])/colNum
     }
   }
   result
}
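
One possible way to speed this up, offered only as a sketch: the number of
jointly non-missing cases for every pair of columns equals the cross-product
of a 0/1 "is observed" indicator matrix, so base R's crossprod() can replace
the double loop. The name pairwise.n.fast is hypothetical, and the sketch
assumes every column can be tested with is.na(). Unlike the loop above, it
fills the whole symmetric matrix rather than only the upper triangle.

```r
# Hypothetical helper (not from any package): pairwise complete-case counts.
pairwise.n.fast <- function(df) {
  # 1 where a value is observed, 0 where it is NA; column names are kept.
  ok <- !is.na(as.matrix(df))
  storage.mode(ok) <- "double"
  # Entry [i, j] is sum(ok[, i] * ok[, j]): the rows observed in both columns.
  crossprod(ok)
}

# Small made-up example, only to show the shape of the result.
toy <- data.frame(col1 = c(1, NA, 3, 4),
                  col2 = c(1, 2, NA, 4),
                  col3 = c(NA, NA, 3, 4))
n.table <- pairwise.n.fast(toy)
print(n.table)
```

The diagonal holds the number of non-missing values in each single column,
and the off-diagonal cells hold the pairwise n that
cor(..., use="pairwise.complete.obs") would draw on for the matching cell.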

--
Adam D. I. Kramer
University of Oregon


