[BioC] genereating correlation matrix from gene expression data

Robert Castelo robert.castelo at upf.edu
Thu Nov 19 10:03:34 CET 2009


Pankaj,

in addition to what Steve provided below, if you are dealing with
expression data, it might be useful for you the function qpPCC from the
bioconductor package 'qpgraph'. it performs the same calculation as the
'cor' function illustrated in Steve's message but in addition it accepts
as input parameter not only a matrix or a data frame but also an
ExpressionSet object (not a big deal) and, more interestingly, it
calculates for you the P-values of a two sided test for the null
hypothesis of zero Pearson correlation. this latter feature might be
useful since as far as i know if one wants to test every pair of genes
and get the P-values one needs to call 'cor.test' for each pair of genes
within a two nested for loops or some 'apply' call and this function
'qpPCC' does it using matrix calculations and thus is faster for that
purpose. of course, if you are interested in determining significance
through the P-values you will have to deal with the multiple testing
problem.

library(qpgraph)

set.seed(123)
x <- matrix(rnorm(20), 4, 5)

round(x, digits=3)
       [,1]   [,2]   [,3]   [,4]   [,5]
[1,] -0.560  0.129 -0.687  0.401  0.498
[2,] -0.230  1.715 -0.446  0.111 -1.967
[3,]  1.559  0.461  1.224 -0.556  0.701
[4,]  0.071 -1.265  0.360  1.787 -0.473

lapply(qpPCC(x), round, digits=3)
$R
       1      2      3      4      5
1  1.000 -0.016  0.958 -0.490  0.437
2 -0.016  1.000 -0.271 -0.753 -0.462
3  0.958 -0.271  1.000 -0.218  0.431
4 -0.490 -0.753 -0.218  1.000 -0.198
5  0.437 -0.462  0.431 -0.198  1.000

$P
      1     2     3     4     5
1 0.000 0.984 0.042 0.510 0.563
2 0.984 0.000 0.729 0.247 0.538
3 0.042 0.729 0.000 0.782 0.569
4 0.510 0.247 0.782 0.000 0.802
5 0.563 0.538 0.569 0.802 0.000

cor.test(x[,1], x[,2])

        Pearson's product-moment correlation

data:  x[, 1] and x[, 2] 
t = -0.0231, df = 2, p-value = 0.9837
alternative hypothesis: true correlation is not equal to 0 
95 percent confidence interval:
 -0.962  0.960 
sample estimates:
    cor 
-0.0163 

as a final shameless remark :) if you are interested in trying to sort
out genuine from spurious correlations you might want to calculate the
so-called "partial correlations" (see
http://en.wikipedia.org/wiki/Partial_correlation) which in fact one
cannot calculate with gene expression data because typically you will
many more genes than samples. instead, as a sort of surrogate for
partial correlations, you might find useful to calculate the so-called
non-rejection rates through the functions 'qpNrr' or 'qpAvgNrr' from the
'qpgraph' package (look at the corresponding help pages if you're
interested).

best,
robert.

On Wed, 2009-11-18 at 14:05 -0500, Steve Lianoglou wrote:
> Hi Pankaj,
> 
> On Nov 18, 2009, at 1:30 PM, pankaj borah wrote:
> 
> > Hi all,
> >
> > i am a new user of R and Bioconductor package. i deal with gene  
> > expression data. i was wondering if there is a way to generate a  
> > correlation matrix (all to all square symmetric matrix) for  a set  
> > of genes and their expression values. is there any available function?
> 
> Try ?cor to get the pairwise correlation between columns of your  
> matrix, eg:
> 
> R> x <- matrix(rnorm(20), 4, 5)
> R> x
>             [,1]        [,2]       [,3]       [,4]       [,5]
> [1,]  0.5818808  0.58781709  2.5511157 -1.5332180  0.6905731
> [2,] -0.7640311  0.25960974  0.7246655 -1.5539226 -0.5459625
> [3,] -1.7141619 -0.25808091 -0.0868366 -0.6547804  0.4629494
> [4,] -2.2906217  0.04932864  0.5694895  0.7736206  0.4078665
> 
> R> cor(x)
>              [,1]         [,2]       [,3]       [,4]        [,5]
> [1,]  1.00000000  0.851982381  0.8715055 -0.8404848 0.074770380
> [2,]  0.85198238  1.000000000  0.9429553 -0.5338964 0.004627258
> [3,]  0.87150545  0.942955253  1.0000000 -0.4719936 0.325584972
> [4,] -0.84048477 -0.533896414 -0.4719936  1.0000000 0.309414781
> [5,]  0.07477038  0.004627258  0.3255850  0.3094148 1.000000000
> 
> If your genes are in rows, you'll just need to pass in the transpose  
> of your matrix.
> 
> HTH,
> 
> -steve
> 
> --
> Steve Lianoglou
> Graduate Student: Computational Systems Biology
>    |  Memorial Sloan-Kettering Cancer Center
>    |  Weill Medical College of Cornell University
> Contact Info: http://cbio.mskcc.org/~lianos/contact
> 
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at stat.math.ethz.ch
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor
>



More information about the Bioconductor mailing list