[R] In need of help with correlations

Peter Langfelder peter.langfelder at gmail.com
Mon Apr 11 10:25:40 CEST 2011


On Sat, Apr 9, 2011 at 10:24 AM, Sean Farris <farrissp2 at vcu.edu> wrote:
> I am in need of someone's help in correlating gene expression. I'm somewhat
> new to R, and can't seem to find anyone local to help me with what I think
> is a simple problem.
>
> I need to obtain pearson and spearman correlation coefficients, and
> corresponding p-values for all of the genes in my dataset that correlate to
> one specific gene of interest. I'm working with mouse Affymetrix Mouse 430
> 2.0 arrays, so I've got about 45,000 probesets (rows; with 1st column
> containing identifiers) and 30 biological replicates (columns; with the top
> row containing the header information).

Sean,

I'm the maintainer of the package WGCNA that does correlation network
analysis of gene expression data. I recommend you check out the
package and the tutorials at

http://www.genetics.ucla.edu/labs/horvath/CoexpressionNetwork/Rpackages/WGCNA/Tutorials/index.html

The package contains a couple of useful functions for calculating
correlation p-values. Unlike cor.test, which only takes two vectors
(not matrices), the function corAndPvalue calculates Pearson
correlations and the corresponding p-values for matrices. If you
already have the correlation matrix pre-calculated AND you have no
missing data (i.e., a constant number of observations), you can also
use corPvalueStudent to calculate the p-values.
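
A minimal sketch of the one-gene-versus-all case, written in base R so
it runs without extra packages. The simulated matrix, the probe names
and the choice of "probe17" as the gene of interest are assumptions
made up for illustration. The p-values come from the usual Student t
statistic for a Pearson correlation, which is only appropriate here
because the toy data have no missing values. The optional last step
calls corAndPvalue and is skipped if WGCNA is not installed.

```r
set.seed(1)

# Simulated stand-in for the real data: 200 probe sets (rows) x 30 samples (columns).
expr <- matrix(rnorm(200 * 30), nrow = 200,
               dimnames = list(paste0("probe", 1:200), paste0("sample", 1:30)))

datExpr <- t(expr)     # samples in rows, probe sets in columns
target  <- "probe17"   # hypothetical gene of interest
n       <- nrow(datExpr)

# Pearson correlation of the target with every probe set (a 200 x 1 matrix).
r <- cor(datExpr, datExpr[, target])

# Two-sided Student t-test p-values with n - 2 degrees of freedom.
tStat <- r * sqrt((n - 2) / (1 - r^2))
p <- 2 * pt(abs(tStat), df = n - 2, lower.tail = FALSE)

# Collect the results, drop the target itself, and sort by p-value.
result <- data.frame(probe = rownames(r), cor = r[, 1], p = p[, 1])
result <- result[result$probe != target, ]
result <- result[order(result$p), ]
head(result)

# Optional: the same calculation with WGCNA, if the package is available.
if (requireNamespace("WGCNA", quietly = TRUE)) {
  res <- WGCNA::corAndPvalue(datExpr, datExpr[, target, drop = FALSE])
  head(res$p)
}
```

With 45,000 probe sets you would want to adjust these p-values for
multiple testing, for example with p.adjust(result$p, method = "BH").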

We don't use Spearman correlations much (we prefer the biweight
midcorrelation, functions bicor and bicorAndPvalue, as a robust
alternative to Pearson correlation), but you can approximate the
Spearman p-values by the Student p-values (the ones used for Pearson
correlations). Statisticians who read this, please don't execute me
for this suggestion :)
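
As a rough sketch of that approximation: compute the Spearman
correlations with cor(..., method = "spearman") and feed them through
the same Student t formula. The simulated data and the probe names
are again assumptions for illustration. The last lines compare the
approximate p-value for one probe with the exact one from cor.test,
so you can judge for yourself how close they are at 30 samples.

```r
set.seed(2)

# Simulated stand-in: 100 probe sets (rows) x 30 samples (columns), no ties, no NAs.
expr <- matrix(rnorm(100 * 30), nrow = 100,
               dimnames = list(paste0("probe", 1:100), NULL))

datExpr <- t(expr)    # samples in rows, probe sets in columns
target  <- "probe5"   # hypothetical gene of interest
n       <- nrow(datExpr)

# Spearman correlation of the target with every probe set.
rho <- cor(datExpr, datExpr[, target], method = "spearman")

# Student t approximation to the Spearman p-values.
tStat   <- rho * sqrt((n - 2) / (1 - rho^2))
pApprox <- 2 * pt(abs(tStat), df = n - 2, lower.tail = FALSE)

# Compare with the exact test for a single probe.
pExact <- cor.test(datExpr[, "probe1"], datExpr[, target],
                   method = "spearman")$p.value
c(approx = pApprox["probe1", 1], exact = pExact)
```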

To use the function cor(), you need to transpose the data so that
genes are in columns and samples in rows.
Just be aware that to correlate all probe sets at once you need a
40k+ by 40k+ matrix to hold the result (about 16GB for 45,000 probe
sets stored as doubles). Only a large computer (at least 32GB of
memory, possibly 64GB) will be able to handle such a matrix and the
necessary manipulations. The WGCNA package contains methods to
construct co-expression networks on such big sets if necessary.
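
For a sense of scale, here is a back-of-the-envelope calculation of
the memory needed, assuming 45,000 probe sets and 8 bytes per stored
value (R's double precision). It also shows that correlating all
probe sets against a single gene needs only one column of results,
which is tiny by comparison.

```r
nProbes        <- 45000
bytesPerDouble <- 8

# Full probe-by-probe correlation matrix, in GiB.
fullGiB <- nProbes^2 * bytesPerDouble / 2^30

# One gene against all probe sets: a single column, in KiB.
oneGeneKiB <- nProbes * bytesPerDouble / 2^10

round(fullGiB, 1)   # about 15.1 GiB, before any copies made during manipulation
round(oneGeneKiB)   # a few hundred KiB
```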

HTH,

Peter


