[R] Pearson corelation and p-value for matrix

John Fox jfox at mcmaster.ca
Sat Apr 16 14:00:55 CEST 2005


Dear Andy,

That's clearly much better -- and illustrates an effective strategy for
vectorizing (or "matricizing") a computation. I think I'll add this to my
list of programming examples. It might be a little dangerous to pass ...
through to cor(), since someone could specify type="spearman", for example.

Thanks,
 John

--------------------------------
John Fox
Department of Sociology
McMaster University
Hamilton, Ontario
Canada L8S 4M4
905-525-9140x23604
http://socserv.mcmaster.ca/jfox 
-------------------------------- 

> -----Original Message-----
> From: Liaw, Andy [mailto:andy_liaw at merck.com] 
> Sent: Friday, April 15, 2005 9:51 PM
> To: 'John Fox'; MSchwartz at medanalytics.com
> Cc: 'R-Help'; 'Dren Scott'
> Subject: RE: [R] Pearson corelation and p-value for matrix
> 
> We can be a bit sneaky and `borrow' code from cor.test.default:
> 
> cor.pval <- function(x,  alternative="two-sided", ...) {
>     corMat <- cor(x, ...)
>     n <- nrow(x)
>     df <- n - 2
>     STATISTIC <- sqrt(df) * corMat / sqrt(1 - corMat^2)
>     p <- pt(STATISTIC, df)
>     p <- if (alternative == "less") {
>         p
>     } else if (alternative == "greater") {
>         1 - p
>     } else 2 * pmin(p, 1 - p)
>     p
> }
> 
> The test:
> 
> > system.time(c1 <- cor.pvals(X), gcFirst=TRUE)
> [1] 13.19  0.01 13.58    NA    NA
> > system.time(c2 <- cor.pvalues(X), gcFirst=TRUE)
> [1] 6.22 0.00 6.42   NA   NA
> > system.time(c3 <- cor.pval(X), gcFirst=TRUE)
> [1] 0.07 0.00 0.07   NA   NA
> 
> Cheers,
> Andy
> 
> > From: John Fox
> > 
> > Dear Mark,
> > 
> > I think that the reflex of trying to avoid loops in R is often 
> > mistaken, and so I decided to try to time the two approaches (on a 
> > 3GHz Windows XP system).
> > 
> > I discovered, first, that there is a bug in your function -- you 
> > appear to have indexed rows instead of columns; fixing that:
> > 
> > cor.pvals <- function(mat)
> > {
> >   cols <- expand.grid(1:ncol(mat), 1:ncol(mat))
> >   matrix(apply(cols, 1,
> >                function(x) cor.test(mat[, x[1]], mat[, 
> > x[2]])$p.value),
> >          ncol = ncol(mat))
> > }
> > 
> > 
> > My function is cor.pvalues and yours cor.pvals. This is for a data 
> > matrix with 1000 observations on 100 variables:
> > 
> > > R <- diag(100)
> > > R[upper.tri(R)] <- R[lower.tri(R)] <- .5
> > > library(mvtnorm)
> > > X <- rmvnorm(1000, sigma=R)
> > > dim(X)
> > [1] 1000  100
> > > 
> > > system.time(cor.pvalues(X))
> > [1] 5.53 0.00 5.53   NA   NA
> > > 
> > > system.time(cor.pvals(X))
> > [1] 12.66  0.00 12.66    NA    NA
> > > 
> > 
> > I frankly didn't expect the advantage of my approach to be 
> this large, 
> > but there it is.
> > 
> > Regards,
> >  John
> > 
> > --------------------------------
> > John Fox
> > Department of Sociology
> > McMaster University
> > Hamilton, Ontario
> > Canada L8S 4M4
> > 905-525-9140x23604
> > http://socserv.mcmaster.ca/jfox
> > --------------------------------
> > 
> > > -----Original Message-----
> > > From: Marc Schwartz [mailto:MSchwartz at MedAnalytics.com]
> > > Sent: Friday, April 15, 2005 7:08 PM
> > > To: John Fox
> > > Cc: 'Dren Scott'; R-Help
> > > Subject: RE: [R] Pearson corelation and p-value for matrix
> > > 
> > > Here is what might be a slightly more efficient way to 
> get to John's
> > > question:
> > > 
> > > cor.pvals <- function(mat)
> > > {
> > >   rows <- expand.grid(1:nrow(mat), 1:nrow(mat))
> > >   matrix(apply(rows, 1,
> > >                function(x) cor.test(mat[x[1], ], mat[x[2], 
> > > ])$p.value),
> > >          ncol = nrow(mat))
> > > }
> > > 
> > > HTH,
> > > 
> > > Marc Schwartz
> > > 
> > > On Fri, 2005-04-15 at 18:26 -0400, John Fox wrote:
> > > > Dear Dren,
> > > > 
> > > > How about the following?
> > > > 
> > > >  cor.pvalues <- function(X){
> > > >     nc <- ncol(X)
> > > >     res <- matrix(0, nc, nc)
> > > >     for (i in 2:nc){
> > > >         for (j in 1:(i - 1)){
> > > >             res[i, j] <- res[j, i] <- cor.test(X[,i],
> > X[,j])$p.value
> > > >             }
> > > >         }
> > > >     res
> > > >     }
> > > > 
> > > > What one then does with all of those non-independent test
> > > is another
> > > > question, I guess.
> > > > 
> > > > I hope this helps,
> > > >  John
> > > 
> > > > > -----Original Message-----
> > > > > From: r-help-bounces at stat.math.ethz.ch 
> > > > > [mailto:r-help-bounces at stat.math.ethz.ch] On Behalf Of
> > Dren Scott
> > > > > Sent: Friday, April 15, 2005 4:33 PM
> > > > > To: r-help at stat.math.ethz.ch
> > > > > Subject: [R] Pearson corelation and p-value for matrix
> > > > > 
> > > > > Hi,
> > > > >  
> > > > > I was trying to evaluate the pearson correlation and
> > the p-values
> > > > > for an nxm matrix, where each row represents a vector. 
> > > One way to do
> > > > > it would be to iterate through each row, and find its
> > correlation
> > > > > value( and the p-value) with respect to the other rows. 
> > Is there
> > > > > some function by which I can use the matrix as input? 
> > > Ideally, the
> > > > > output would be an nxn matrix, containing the p-values
> > > between the
> > > > > respective vectors.
> > > > >  
> > > > > I have tried cor.test for the iterations, but couldn't find a 
> > > > > function that would take the matrix as input.
> > > > >  
> > > > > Thanks for the help.
> > > > >  
> > > > > Dren
> > > 
> > >
> > 
> > ______________________________________________
> > R-help at stat.math.ethz.ch mailing list
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide! 
> > http://www.R-project.org/posting-guide.html
> > 
> > 
> > 
> 
> 
> 
> --------------------------------------------------------------
> ----------------
> Notice:  This e-mail message, together with any attachments, 
> contains information of Merck & Co., Inc. (One Merck Drive, 
> Whitehouse Station, New Jersey, USA 08889), and/or its 
> affiliates (which may be known outside the United States as 
> Merck Frosst, Merck Sharp & Dohme or MSD and in Japan, as 
> Banyu) that may be confidential, proprietary copyrighted 
> and/or legally privileged. It is intended solely for the use 
> of the individual or entity named on this message.  If you 
> are not the intended recipient, and have received this 
> message in error, please notify us immediately by reply 
> e-mail and then delete it from your system.
> --------------------------------------------------------------
> ----------------




More information about the R-help mailing list