[BioC] gene enrichment analysis without a control sample

Wu, Di dwu at fas.harvard.edu
Sat Nov 5 00:58:37 CET 2011


Hi Wendy, 

Here is the reference to that article.

Lim E, Vaillant F, Wu D, Forrest NC, Pal B, Hart AH, Asselin-Labat M-L,
Gyorki DE, Ward T, Partanen A, et al. 2009. Aberrant luminal progenitors
as the candidate target population for basal tumor development in BRCA1
mutation carriers. Nat Med 15: 907–913.

Now I understand your concern about the different platforms. That is always a problem.

I would consider two strategies.

First, I would check whether all the cell types of interest are available on a single platform. If so, I would first analyze the data from that platform alone to see what the results look like. If this is possible for both platforms, gene set tests can be used to check the reproducibility of the results across platforms.
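
As a rough illustration of the cross-platform check, the sketch below takes the top genes from one platform as a gene set and asks whether they also rank highly on the other platform, using a simple permutation test on mean ranks. This is only a toy version of a rank-based gene set test written in plain Python, not the limma implementation; the function name mean_rank_test, the gene names and all statistics are made up for the example.

```python
import random

def mean_rank_test(scores, gene_set, n_perm=2000, seed=0):
    """Permutation test: do genes in gene_set rank higher than random sets?

    scores: dict mapping gene symbol -> statistic on the second platform.
    Returns (observed mean rank, one-sided permutation p-value).
    """
    ordered = sorted(scores, key=scores.get)           # ascending by statistic
    rank = {g: i + 1 for i, g in enumerate(ordered)}   # rank 1 = lowest statistic
    members = [g for g in gene_set if g in rank]       # keep matched symbols only
    if not members:
        raise ValueError("gene set shares no symbols with the scored genes")
    observed = sum(rank[g] for g in members) / len(members)
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_perm):
        draw = rng.sample(ordered, len(members))       # random set of the same size
        if sum(rank[g] for g in draw) / len(draw) >= observed:
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)

# Hypothetical statistics for the same contrast on two platforms.
stats_a = {f"g{i}": float(i) for i in range(50)}
stats_b = {f"g{i}": float(i + 3 if i % 2 else i - 3) for i in range(50)}

# Gene set = top five genes on platform A, tested on platform B.
top_a = sorted(stats_a, key=stats_a.get, reverse=True)[:5]
observed, p_value = mean_rank_test(stats_b, top_a)
print(top_a, observed, p_value)
```

A small p-value here would suggest that the genes found on the first platform also tend to sit near the top of the ranking on the second one. For real data, the gene set tests in limma are the better choice.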

Second, if some cell types are present on both platforms, we might be able to use them to remove the batch (platform) effects after matching the gene symbols across platforms. The R function "removeBatchEffect" in the limma package may work.
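
To make the idea concrete, here is a minimal sketch of how shared cell types could be used to estimate a per-gene platform offset and subtract it. It is a simplification written in plain Python for illustration and is not what removeBatchEffect does internally (that function fits a linear model). It assumes log2-scale values already matched by gene symbol; the function names, gene names, cell type labels and numbers are all hypothetical.

```python
from statistics import mean

def platform_offsets(expr_a, expr_b, shared_cell_types):
    """Per-gene offset (platform B minus platform A) from the shared cell types.

    expr_a, expr_b: dict mapping gene -> {cell type: log2 expression}.
    """
    offsets = {}
    for gene in expr_a.keys() & expr_b.keys():   # genes matched by symbol
        diffs = [expr_b[gene][ct] - expr_a[gene][ct] for ct in shared_cell_types]
        offsets[gene] = mean(diffs)
    return offsets

def adjust_platform_b(expr_b, offsets):
    """Subtract the estimated offset from every platform B value of each gene."""
    return {gene: {ct: value - offsets[gene] for ct, value in expr_b[gene].items()}
            for gene in offsets}

# Toy data: "luminal" and "basal" were measured on both platforms,
# "stromal" only on platform B. GeneZ has no match on platform A.
expr_a = {
    "GeneX": {"luminal": 8.0, "basal": 6.0},
    "GeneY": {"luminal": 5.0, "basal": 9.0},
}
expr_b = {
    "GeneX": {"luminal": 10.0, "basal": 8.0, "stromal": 7.0},
    "GeneY": {"luminal": 4.0, "basal": 8.0, "stromal": 6.0},
    "GeneZ": {"luminal": 3.0, "basal": 3.0, "stromal": 3.0},
}

offsets = platform_offsets(expr_a, expr_b, ["luminal", "basal"])
adjusted_b = adjust_platform_b(expr_b, offsets)
print(offsets)
print(adjusted_b)
```

The cell type measured only on platform B is shifted by the offset learned from the shared cell types, which is what makes it roughly comparable with the platform A data. Genes without a matching symbol on both platforms are dropped.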

I am not sure, but the "MergeMaid" package may also help to merge the data from the two platforms.

After all this, you can do the routine differential expression analysis.
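
One form this analysis could take is the overlap approach described in the earlier message quoted below: compare one cell type with each of the others and keep the genes that change in the same direction in every comparison. The sketch below shows only the overlap step in plain Python, using a fold-change cutoff as a stand-in for a proper statistical test. The function name signature_genes, the cutoff, the comparison labels and all values are assumptions for illustration.

```python
def signature_genes(comparisons, lfc_cutoff=1.0):
    """Genes changed in the same direction in every pairwise comparison.

    comparisons: dict mapping a comparison label -> {gene: log2 fold change},
    where each fold change is (cell type of interest) minus (other cell type).
    Returns (up-regulated signature, down-regulated signature) as sorted lists.
    """
    if not comparisons:
        raise ValueError("need at least one pairwise comparison")
    up_sets = [{g for g, lfc in res.items() if lfc >= lfc_cutoff}
               for res in comparisons.values()]
    down_sets = [{g for g, lfc in res.items() if lfc <= -lfc_cutoff}
                 for res in comparisons.values()]
    return sorted(set.intersection(*up_sets)), sorted(set.intersection(*down_sets))

# Hypothetical log2 fold changes for cell type A against three other cell types.
pairwise = {
    "A_vs_B": {"Gene1": 2.1, "Gene2": 1.5, "Gene3": -1.8, "Gene4": 0.2},
    "A_vs_C": {"Gene1": 1.7, "Gene2": 0.4, "Gene3": -2.2, "Gene4": -1.3},
    "A_vs_D": {"Gene1": 3.0, "Gene2": 1.9, "Gene3": -1.1, "Gene4": 0.1},
}

up, down = signature_genes(pairwise)
print(up, down)
```

In a real analysis the per-comparison gene lists would come from the differential expression results (for example, adjusted p-values from limma) rather than from a raw fold-change cutoff.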

On the other hand, I think comparing the presence/absence of genes across cell types is not very reliable, as you noticed. The fact that the expression value on the array is higher for one gene (A) than for another gene (B) does not necessarily mean that gene A is really expressed at a higher level; the difference may be due to differences between the probes on the array.

Good luck,
Di  



----
Di Wu
Postdoctoral fellow
Harvard University, Statistics Department
Harvard Medical School
Science Center, 1 Oxford Street, Cambridge, MA 02138-2901 USA

________________________________________
From: bioconductor-bounces at r-project.org [bioconductor-bounces at r-project.org] On Behalf Of Wendy Qiao [wendy2.qiao at gmail.com]
Sent: Friday, November 04, 2011 6:34 PM
To: bioconductor at r-project.org
Subject: Re: [BioC] gene enrichment analysis without a control sample

Hi Di,

Thank you very much for your email.

My major challenge with identifying differentially expressed genes is that
the microarray data are from different platforms (Illumina and Affymetrix),
and those are the only data available for my project. In addition, my
question is not necessarily to find the differentially expressed genes of
each cell type; I am more interested in the *expressed genes* of each cell
type. I hope to find a way that avoids direct comparison between cell
types. I tried to rank the gene expression for each cell type and set a
cutoff for expressed and unexpressed genes, but the cutoff is arbitrary and
affects the downstream analysis. In this case, would you have any
suggestions? Any advice on obtaining differentially expressed genes for
microarray data from different platforms is also appreciated.

By the way, would you mind sending me the title of the paper that you
mentioned?

Thank you very much,
Wendy




On 4 November 2011 17:21, Wu, Di <dwu at fas.harvard.edu> wrote:

> Hi Wendy,
>
> I am not sure whether choosing the right gene set test is your current
> problem. It seems you want to find the signature genes for each of the
> cell types. Therefore, this looks like a differential expression problem
> to me.
>
> I understand that, when you have data from several cell types, you
> probably don't have one particular cell type as a control group for all
> the other cell types. I had a similar problem with the mammary gland cell
> type data (Lim et al. 2009, Nature Medicine). What I did was to compare
> cell type A to each of the other three cell types, and then take the genes
> that were up- (or down-) regulated in all three comparisons. These genes
> are the signature genes (expressed genes or lower-expressed genes) for
> cell type A. The same thing can be done for the other cell types.
>
> Regarding gene set tests, that is, testing which pathways, GO terms or
> other gene lists are enriched in your gene list, there are different
> approaches. Some require the raw data (our "roast" and "romer" functions
> in limma, among others). The geneSetTest function in limma uses only the
> ranks of the genes.
>
> I will be happy to discuss gene set tests with you further if that is
> actually the problem you are facing, or if you need to use them later.
>
> Hope this helps,
> Di
>
>
> ----
> Di Wu
> Postdoctoral fellow
> Harvard University, Statistics Department
> Harvard Medical School
> Science Center, 1 Oxford Street, Cambridge, MA 02138-2901 USA
>
> ________________________________________
> From: bioconductor-bounces at r-project.org [
> bioconductor-bounces at r-project.org] On Behalf Of Wendy Qiao [
> wendy2.qiao at gmail.com]
> Sent: Friday, November 04, 2011 5:01 PM
> To: bioconductor at r-project.org
> Subject: [BioC] gene enrichment analysis without a control sample
>
> Hi all,
>
> I have a microarray dataset compiled from several sources, so I am facing
> some challenges with identifying the expressed genes of each cell type. I
> am thinking to use the enriched gene sets of each cell type as the
> expressed genes of that cell type. However, the gene set enrichment
> analysis (http://www.broadinstitute.org/gsea/index.jsp) needs both control
> and sample data. I am wondering if there is gene set enrichment tool for
> the analysis of one cell type only.
>
> Thank you, Wendy
>
>         [[alternative HTML version deleted]]
>
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at r-project.org
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives:
> http://news.gmane.org/gmane.science.biology.informatics.conductor
>
