[BioC] Which package for gene expression correlation analysis

Naomi Altman naomi at stat.psu.edu
Tue Jun 29 04:18:25 CEST 2010

When you do thousands of correlations with only 4 samples, you can 
expect a lot of very high correlations just by chance.  So, you 
should filter your genes by some criterion of "interestingness" 
before performing correlation analysis.

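To make the point above concrete, here is a small simulation sketch. It is not from the original thread, and the gene count and seed are arbitrary assumptions. It correlates one row of pure noise against thousands of other pure-noise rows over 4 samples. For independent normal data with n = 4, the sample correlation is uniformly distributed on (-1, 1), so roughly 10% of completely unrelated genes should show |r| > 0.9.

```r
set.seed(1)            # arbitrary seed, for reproducibility only
n.genes   <- 5000      # hypothetical number of probesets
n.samples <- 4         # the four sample groups from the question

# Pure noise: by construction, no gene is related to any other
noise <- matrix(rnorm(n.genes * n.samples), nrow = n.genes)

# Correlate gene 1 with every other gene across the 4 samples
null.cors <- cor(t(noise[-1, ]), noise[1, ])[, 1]

# Fraction of unrelated genes that still look "highly correlated"
frac.high <- mean(abs(null.cors) > 0.9)
frac.high
```

Using all 16 chips rather than 4 group means narrows the null distribution considerably, but with thousands of genes some large correlations will still arise by chance.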

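The filtering step could look something like the sketch below. The message above does not say which criterion of "interestingness" to use; per-gene variance and the 90% cutoff are assumptions chosen only for illustration, and `expr.mat` and `gene1` are hypothetical stand-ins for the real expression matrix and the gene of interest.

```r
set.seed(2)            # arbitrary seed, for reproducibility only

# Hypothetical data: 1000 genes x 16 chips of random values
expr.mat <- matrix(rnorm(1000 * 16), nrow = 1000,
                   dimnames = list(paste0("gene", 1:1000), NULL))

# Assumed criterion: keep the 10% of genes with the largest variance
gene.var <- apply(expr.mat, 1, var)
keep <- gene.var >= quantile(gene.var, 0.90)
keep["gene1"] <- TRUE  # always retain the gene of interest

filtered <- expr.mat[keep, , drop = FALSE]

# Correlate the gene of interest with the genes that passed the filter
cors <- cor(t(filtered), filtered["gene1", ])[, 1]
head(sort(cors[names(cors) != "gene1"], decreasing = TRUE))
```

A differential-expression test between the treatment groups would be another reasonable filter; the point is only to shrink the set of genes before looking at correlations.
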
At 12:02 PM 6/28/2010, Steve Lianoglou wrote:
>On Mon, Jun 28, 2010 at 10:53 AM, Yuan Hao <yuan.hao at cantab.net> wrote:
> > Dear List,
> >
> > I would like to ask if there is such a bioconductor package available that
> > can help to achieve the following purpose. Thank you very much in advance!
> >
> > I got 16 Affy chips corresponding to 4 samples: wild-type treated,
> > wild-type untreated, knocked-down treated, and knocked-down untreated,
> > i.e. 4 replicates for each sample.
> >
> > I want to look at the expression correlations between genes. Say, my gene
> > of interest is gene X. I would like to find other genes on the chip
> > that have expression profiles similar to gene X across samples. In
> > other words, if expression levels of gene X increased from wild-type
> > treated to knocked-down treated, I would like to find all the other genes
> > that have the same trend.
>
>Given the size of the Bioconductor universe, it's hard to say with any
>certainty that a certain function does NOT exist, but I'd be somewhat
>surprised if this function is actually there, since it's relatively
>easy to implement yourself.
>
>You are essentially repeatedly performing a test against each row of
>your expression matrix, so think "loops" or some incantation of *apply.
>
>Here's an easy one. Let's assume:
>
>  * `exprs` is a (gene x experiment) matrix with your expression values.
>  * the value `x` holds the row index of the gene you are interested in.
>
>R> set.seed(123)
>R> exprs <- matrix(rnorm(100), 5)
>R> x <- 1
>Now you want to test the correlation of row `x` with all the other rows.
>R> cors <- apply(exprs[-x,], 1, cor.test, exprs[x,])
>This gives you a list of correlation tests from which you can (i) pull
>out the correlation estimates; and (ii) order them
>R> cors.estimate <- sapply(cors, '[[', 'estimate')  ## (i)
>R> alike <- order(cors.estimate, decreasing=TRUE) ## (ii)
>`alike` now has the indices of genes ordered from "most + correlated" to
>"most - correlated" with gene `x`. Note that these indices refer to the
>rows of `exprs[-x,]`, not of `exprs` itself.
>If you're a bit more familiar with R functions, you might know
>that there is a function named "cor" that creates a correlation matrix
>out of a matrix. This function works column-wise, so you first have to
>transpose your matrix:
>R> all.cors <- cor(t(exprs))
>R> cors.estimate
>         cor         cor         cor         cor
>-0.01971735 -0.26353249  0.03361119 -0.11578081
>R> all.cors[1,]
>[1]  1.00000000 -0.01971735 -0.26353249  0.03361119 -0.11578081
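>The various cluster/heatmap functions will group your genes row-wise
>(and column-wise) for you. Note that heatmap clusters on Euclidean
>distance by default (distfun = dist); for correlation-based clustering
>you can supply your own distance function, e.g.
>distfun = function(m) as.dist(1 - cor(t(m))).
>Look at ?heatmap and check what that function returns to you in the
>"Value" section.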
>Steve Lianoglou
>Graduate Student: Computational Systems Biology
>  | Memorial Sloan-Kettering Cancer Center
>  | Weill Medical College of Cornell University
>Contact Info: http://cbio.mskcc.org/~lianos/contact

Naomi S. Altman                                814-865-3791 (voice)
Associate Professor
Dept. of Statistics                              814-863-7114 (fax)
Penn State University                         814-865-1348 (Statistics)
University Park, PA 16802-2111
