# [R] Pairwise n for large correlation tables?

Gabor Grothendieck ggrothendieck at gmail.com
Tue Aug 8 04:40:19 CEST 2006

```
Try this:

# mat is test matrix
mat <- matrix(1:25, 5)
mat[2,2] <- mat[3,4] <- NA
crossprod(!is.na(mat))
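
# Why this works (a hedged sketch, not part of the original reply):
# !is.na(x) is a logical matrix that crossprod() treats as 0/1, so entry
# [i, j] of t(X) %*% X counts the rows where columns i and j are both
# observed; the diagonal holds each column's own non-missing count.
# The names dat, obs, n_pairs and brute are illustrative assumptions.
dat <- data.frame(col1 = c(1, NA, 3, 4),
                  col2 = c(NA, 2, 3, 4),
                  col3 = c(1, 2, 3, NA))
obs <- !is.na(dat)          # is.na() on a data frame returns a logical matrix
n_pairs <- crossprod(obs)   # pairwise complete-case counts, named by column

# Cross-check against an explicit loop over every pair of columns
brute <- matrix(0, ncol(dat), ncol(dat))
for (i in seq_len(ncol(dat))) {
  for (j in seq_len(ncol(dat))) {
    brute[i, j] <- sum(obs[, i] & obs[, j])
  }
}
stopifnot(all(n_pairs == brute))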

> Hello,
>
> I'm using a very large data set (n > 100,000 for 7 columns), for which I'm
> pretty happy dealing with pairwise-deleted correlations to populate my
> correlation table. E.g.,
>
> a <- cor(cbind(col1, col2, col3),use="pairwise.complete.obs")
>
> ...however, I am interested in the number of cases used to compute each
> cell of the correlation table. I am unable to find such a function via
> google searches, so I wrote one of my own. This turns out to be highly
> inefficient (e.g., it takes much, MUCH longer than the correlations do). Any
> hints, regarding other functions to use or ways to make this speedier, would
> be much appreciated!
>
> pairwise.n <- function(df=stop("Must provide data frame!")) {
>   if (!is.data.frame(df)) {
>     df <- as.data.frame(df)
>   }
>   colNum <- ncol(df)
>   result <- matrix(data=NA,nrow=colNum,ncol=colNum,dimnames=list(colnames(df),colnames(df)))
>   for(i in 1:colNum) {
>     for (j in i:colNum) {
>       result[i,j] <- length(df[!is.na(df[i])&!is.na(df[j])])/colNum
>     }
>   }
>   result
> }