[R] Pearson corelation and p-value for matrix

Liaw, Andy andy_liaw at merck.com
Sat Apr 16 14:51:47 CEST 2005


> From: John Fox 
> 
> Dear Andy,
> 
> That's clearly much better -- and illustrates an effective 
> strategy for
> vectorizing (or "matricizing") a computation. I think I'll 
> add this to my
> list of programming examples. It might be a little dangerous 
> to pass ...
> through to cor(), since someone could specify 
> type="spearman", for example.

Ah, yes, the "..." isn't likely to help here!  Also, it will 
only work correctly if there are no NA's, for example (or 
else the degree of freedom would be wrong).  

Best,
Andy
 
> Thanks,
>  John
> 
> --------------------------------
> John Fox
> Department of Sociology
> McMaster University
> Hamilton, Ontario
> Canada L8S 4M4
> 905-525-9140x23604
> http://socserv.mcmaster.ca/jfox 
> -------------------------------- 
> 
> > -----Original Message-----
> > From: Liaw, Andy [mailto:andy_liaw at merck.com] 
> > Sent: Friday, April 15, 2005 9:51 PM
> > To: 'John Fox'; MSchwartz at medanalytics.com
> > Cc: 'R-Help'; 'Dren Scott'
> > Subject: RE: [R] Pearson corelation and p-value for matrix
> > 
> > We can be a bit sneaky and `borrow' code from cor.test.default:
> > 
> > cor.pval <- function(x,  alternative="two-sided", ...) {
> >     corMat <- cor(x, ...)
> >     n <- nrow(x)
> >     df <- n - 2
> >     STATISTIC <- sqrt(df) * corMat / sqrt(1 - corMat^2)
> >     p <- pt(STATISTIC, df)
> >     p <- if (alternative == "less") {
> >         p
> >     } else if (alternative == "greater") {
> >         1 - p
> >     } else 2 * pmin(p, 1 - p)
> >     p
> > }
> > 
> > The test:
> > 
> > > system.time(c1 <- cor.pvals(X), gcFirst=TRUE)
> > [1] 13.19  0.01 13.58    NA    NA
> > > system.time(c2 <- cor.pvalues(X), gcFirst=TRUE)
> > [1] 6.22 0.00 6.42   NA   NA
> > > system.time(c3 <- cor.pval(X), gcFirst=TRUE)
> > [1] 0.07 0.00 0.07   NA   NA
> > 
> > Cheers,
> > Andy
> > 
> > > From: John Fox
> > > 
> > > Dear Mark,
> > > 
> > > I think that the reflex of trying to avoid loops in R is often 
> > > mistaken, and so I decided to try to time the two 
> approaches (on a 
> > > 3GHz Windows XP system).
> > > 
> > > I discovered, first, that there is a bug in your function -- you 
> > > appear to have indexed rows instead of columns; fixing that:
> > > 
> > > cor.pvals <- function(mat)
> > > {
> > >   cols <- expand.grid(1:ncol(mat), 1:ncol(mat))
> > >   matrix(apply(cols, 1,
> > >                function(x) cor.test(mat[, x[1]], mat[, 
> > > x[2]])$p.value),
> > >          ncol = ncol(mat))
> > > }
> > > 
> > > 
> > > My function is cor.pvalues and yours cor.pvals. This is 
> for a data 
> > > matrix with 1000 observations on 100 variables:
> > > 
> > > > R <- diag(100)
> > > > R[upper.tri(R)] <- R[lower.tri(R)] <- .5
> > > > library(mvtnorm)
> > > > X <- rmvnorm(1000, sigma=R)
> > > > dim(X)
> > > [1] 1000  100
> > > > 
> > > > system.time(cor.pvalues(X))
> > > [1] 5.53 0.00 5.53   NA   NA
> > > > 
> > > > system.time(cor.pvals(X))
> > > [1] 12.66  0.00 12.66    NA    NA
> > > > 
> > > 
> > > I frankly didn't expect the advantage of my approach to be 
> > this large, 
> > > but there it is.
> > > 
> > > Regards,
> > >  John
> > > 
> > > --------------------------------
> > > John Fox
> > > Department of Sociology
> > > McMaster University
> > > Hamilton, Ontario
> > > Canada L8S 4M4
> > > 905-525-9140x23604
> > > http://socserv.mcmaster.ca/jfox
> > > --------------------------------
> > > 
> > > > -----Original Message-----
> > > > From: Marc Schwartz [mailto:MSchwartz at MedAnalytics.com]
> > > > Sent: Friday, April 15, 2005 7:08 PM
> > > > To: John Fox
> > > > Cc: 'Dren Scott'; R-Help
> > > > Subject: RE: [R] Pearson corelation and p-value for matrix
> > > > 
> > > > Here is what might be a slightly more efficient way to 
> > get to John's
> > > > question:
> > > > 
> > > > cor.pvals <- function(mat)
> > > > {
> > > >   rows <- expand.grid(1:nrow(mat), 1:nrow(mat))
> > > >   matrix(apply(rows, 1,
> > > >                function(x) cor.test(mat[x[1], ], mat[x[2], 
> > > > ])$p.value),
> > > >          ncol = nrow(mat))
> > > > }
> > > > 
> > > > HTH,
> > > > 
> > > > Marc Schwartz
> > > > 
> > > > On Fri, 2005-04-15 at 18:26 -0400, John Fox wrote:
> > > > > Dear Dren,
> > > > > 
> > > > > How about the following?
> > > > > 
> > > > >  cor.pvalues <- function(X){
> > > > >     nc <- ncol(X)
> > > > >     res <- matrix(0, nc, nc)
> > > > >     for (i in 2:nc){
> > > > >         for (j in 1:(i - 1)){
> > > > >             res[i, j] <- res[j, i] <- cor.test(X[,i],
> > > X[,j])$p.value
> > > > >             }
> > > > >         }
> > > > >     res
> > > > >     }
> > > > > 
> > > > > What one then does with all of those non-independent test
> > > > is another
> > > > > question, I guess.
> > > > > 
> > > > > I hope this helps,
> > > > >  John
> > > > 
> > > > > > -----Original Message-----
> > > > > > From: r-help-bounces at stat.math.ethz.ch 
> > > > > > [mailto:r-help-bounces at stat.math.ethz.ch] On Behalf Of
> > > Dren Scott
> > > > > > Sent: Friday, April 15, 2005 4:33 PM
> > > > > > To: r-help at stat.math.ethz.ch
> > > > > > Subject: [R] Pearson corelation and p-value for matrix
> > > > > > 
> > > > > > Hi,
> > > > > >  
> > > > > > I was trying to evaluate the pearson correlation and
> > > the p-values
> > > > > > for an nxm matrix, where each row represents a vector. 
> > > > One way to do
> > > > > > it would be to iterate through each row, and find its
> > > correlation
> > > > > > value( and the p-value) with respect to the other rows. 
> > > Is there
> > > > > > some function by which I can use the matrix as input? 
> > > > Ideally, the
> > > > > > output would be an nxn matrix, containing the p-values
> > > > between the
> > > > > > respective vectors.
> > > > > >  
> > > > > > I have tried cor.test for the iterations, but 
> couldn't find a 
> > > > > > function that would take the matrix as input.
> > > > > >  
> > > > > > Thanks for the help.
> > > > > >  
> > > > > > Dren
> > > > 
> > > >
> > > 
> > > ______________________________________________
> > > R-help at stat.math.ethz.ch mailing list
> > > https://stat.ethz.ch/mailman/listinfo/r-help
> > > PLEASE do read the posting guide! 
> > > http://www.R-project.org/posting-guide.html
> > > 
> > > 
> > > 
> > 
> > 
> > 
> > --------------------------------------------------------------
> > ----------------
> > Notice:  This e-mail message, together with any attachments, 
> > contains information of Merck & Co., Inc. (One Merck Drive, 
> > Whitehouse Station, New Jersey, USA 08889), and/or its 
> > affiliates (which may be known outside the United States as 
> > Merck Frosst, Merck Sharp & Dohme or MSD and in Japan, as 
> > Banyu) that may be confidential, proprietary copyrighted 
> > and/or legally privileged. It is intended solely for the use 
> > of the individual or entity named on this message.  If you 
> > are not the intended recipient, and have received this 
> > message in error, please notify us immediately by reply 
> > e-mail and then delete it from your system.
> > --------------------------------------------------------------
> > ----------------
> 
> 
>




More information about the R-help mailing list