[BioC] Which package for gene expression correlation analysis

Steve Lianoglou mailinglist.honeypot at gmail.com
Mon Jun 28 18:02:14 CEST 2010


On Mon, Jun 28, 2010 at 10:53 AM, Yuan Hao <yuan.hao at cantab.net> wrote:
> Dear List,
> I would like to ask if there is such a bioconductor package available that
> can help to achieve the following purpose. Thank you very much in advance!
> I got 16 Affy chips corresponding to 4 samples: wild-type treated,
> wild-type untreated, knocked-down treated, and knocked-down untreated,
> i.e. 4 replicates for each sample.
> I want to look at the expression correlations between genes. Say, my gene
> of interest is gene X. I would like to find the other genes on the chip
> which have expression profiles similar to gene X across samples. In
> other words, if expression levels of gene X increased from wild-type
> treated to knocked-down treated, I would like to find all the other genes
> that have the same trend.

Given the size of the bioconductor universe, it's hard to say with any
certainty that a certain function does NOT exist, but I'd be somewhat
surprised if this function is actually there, since it's relatively
easy for you to implement yourself.

You are essentially performing the same test against each row of your
expression matrix, so think "loops" or some incantation of *apply.

Here's an easy one. Let's assume:
 * `exprs` is a (gene x experiment) matrix with your expression values.
 * the value `x` holds the row index of the gene you are interested in.

R> set.seed(123)
R> exprs <- matrix(rnorm(100), 5)
R> x <- 1

Now you want to test the correlation of row `x` against all the other rows.

R> cors <- apply(exprs[-x,], 1, cor.test, exprs[x,])

This gives you a list of correlation tests, from which you can (i) pull
out the correlation estimates; and (ii) order them:

R> cors.estimate <- sapply(cors, '[[', 'estimate')  ## (i)
R> alike <- order(cors.estimate, decreasing=TRUE) ## (ii)

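`alike` now orders the genes from "most + correlated" to "most -
correlated" with gene `x`. Note that these are row indices into
`exprs[-x,]`, not `exprs`, so they are shifted by one for any gene that
comes after row `x`.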
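
If you'd rather have the original row numbers of `exprs` (and the
p-values) in one table, here's a minimal sketch. The names `others`,
`result` and `ranked` are just made up for illustration, and it assumes
the same toy `exprs` and `x` as above:

```r
set.seed(123)
exprs <- matrix(rnorm(100), 5)   # 5 genes x 20 experiments, as above
x <- 1

## original row numbers of every gene except gene x
others <- setdiff(seq_len(nrow(exprs)), x)

## one cor.test per remaining gene, each against gene x
cors <- apply(exprs[others, , drop=FALSE], 1, cor.test, exprs[x, ])

## collect row number, correlation estimate and p-value per gene
result <- data.frame(
  row      = others,
  estimate = unname(sapply(cors, function(ct) ct$estimate)),
  p.value  = sapply(cors, function(ct) ct$p.value)
)

## most positively correlated first, most negatively correlated last
ranked <- result[order(result$estimate, decreasing=TRUE), ]
```

`ranked$row` can then be used directly to index `exprs`.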


If you're a bit more familiar with R functions, you might know that
there is a function named "cor" that creates a correlation matrix out
of a matrix. This function works column-wise, so you first have to
transpose your matrix. Row `x` of the result then holds the same
estimates as before (plus the 1 for gene `x` against itself):

R> all.cors <- cor(t(exprs))

R> cors.estimate
        cor         cor         cor         cor
-0.01971735 -0.26353249  0.03361119 -0.11578081

R> all.cors[1,]
[1]  1.00000000 -0.01971735 -0.26353249  0.03361119 -0.11578081
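
On a real chip with tens of thousands of probesets the full gene-by-gene
matrix from `cor(t(exprs))` gets large, and you only need one row of it.
A sketch that correlates just gene `x` against everything, relying on
`cor` accepting a matrix and a vector (the names `one.vs.all` and
`ranked` are illustrative, not from any package):

```r
set.seed(123)
exprs <- matrix(rnorm(100), 5)
x <- 1

## cor(matrix, vector) correlates each column of the matrix with the
## vector, so this is an (n genes x 1) result, not (n genes x n genes)
one.vs.all <- cor(t(exprs), exprs[x, ])[, 1]

## rank from most positively to most negatively correlated, dropping
## gene x itself; these are row indices into exprs
ranked <- setdiff(order(one.vs.all, decreasing=TRUE), x)
```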


The various cluster/heatmap functions will also group your genes
row-wise (and column-wise) for you. Note that `heatmap` clusters on
Euclidean distance (`dist`) by default, so if you want correlation-based
clustering you have to pass your own `distfun`.
Look at ?heatmap and check what that function returns to you in the
"Value" section.
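
A rough sketch of clustering on correlation rather than on `heatmap`'s
default `dist`, using 1 - Pearson correlation as the distance. The helper
name `cor.dist` is made up here, and this is only one of several
reasonable correlation-based distances:

```r
set.seed(123)
exprs <- matrix(rnorm(100), 5)

## heatmap calls distfun on the matrix for the row dendrogram and on its
## transpose for the column dendrogram, so correlate the rows of
## whatever is handed in
cor.dist <- function(m) as.dist(1 - cor(t(m)))

hm <- heatmap(exprs, distfun=cor.dist)

hm$rowInd   ## row order of exprs as drawn in the heatmap
hm$colInd   ## column order of exprs as drawn in the heatmap
```

Genes that end up next to each other in `hm$rowInd` are the ones the
dendrogram considers most alike.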


Steve Lianoglou
Graduate Student: Computational Systems Biology
 | Memorial Sloan-Kettering Cancer Center
 | Weill Medical College of Cornell University
Contact Info: http://cbio.mskcc.org/~lianos/contact
