[BioC] gene set enrichment

Gordon K Smyth smyth at wehi.EDU.AU
Tue Dec 4 00:42:58 CET 2012


Hi Steve,

Thanks for correcting me.

I said that GSEA requires full data because this is true of the published 
GSEA algorithm (Subramanian et al 2005).  The published GSEA approach 
permutes arrays and therefore requires all the data.  I just forgot that 
the GSEA software provides an alternative short-cut approach (permuting 
genes) that can be used when there are no replicates or one just has a 
ranked gene list.
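
To make the difference between the two permutation schemes concrete,
here is a minimal Python sketch. It is not the GSEA algorithm: GSEA uses
a weighted Kolmogorov-Smirnov-like running-sum statistic, whereas this
sketch assumes a much simpler set score (the mean, over the set, of the
difference in group means). All function names are hypothetical and
chosen for illustration only.

```python
import random
from statistics import mean


def gene_stats(expr, labels):
    """Per-gene statistic: mean of group 1 minus mean of group 0.

    expr is a list of rows (one per gene), labels is a 0/1 list
    with one entry per array.
    """
    stats = []
    for row in expr:
        g1 = [x for x, lab in zip(row, labels) if lab == 1]
        g0 = [x for x, lab in zip(row, labels) if lab == 0]
        stats.append(mean(g1) - mean(g0))
    return stats


def set_score(stats, gene_set):
    """Simplified set score (an assumption, not the GSEA statistic)."""
    return mean(stats[i] for i in gene_set)


def array_permutation_pvalue(expr, labels, gene_set, n_perm=999, seed=0):
    """Permute array labels; needs the full expression matrix."""
    rng = random.Random(seed)
    observed = abs(set_score(gene_stats(expr, labels), gene_set))
    shuffled = list(labels)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if abs(set_score(gene_stats(expr, shuffled), gene_set)) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)


def gene_permutation_pvalue(stats, gene_set, n_perm=999, seed=0):
    """Permute genes; needs only the per-gene statistics (a ranked list)."""
    rng = random.Random(seed)
    observed = abs(set_score(stats, gene_set))
    hits = 0
    for _ in range(n_perm):
        random_set = rng.sample(range(len(stats)), len(gene_set))
        if abs(set_score(stats, random_set)) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)
```

Array permutation recomputes every gene statistic under each relabelling,
so it needs the complete data and enough replicates to produce distinct
relabellings. Gene permutation only compares the set against random sets
of the same size, which is why it still works on a bare ranked list.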

The GSEA ranked gene list approach is similar in principle to the 
geneSetTest() function in the limma package.  This approach has the 
disadvantage that it does not correct for inter-gene correlations, as we 
pointed out in our recent camera paper (thanks to Tim Triche for giving 
the reference).
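
A short sketch of why ignoring correlation between genes matters. For m
statistics with common variance and average pairwise correlation rho,
the variance of their mean is inflated by the factor 1 + (m - 1) * rho
relative to the independent case. This is a textbook identity; the camera
paper describes an inflation factor of this kind, but the function name
and the example numbers below are assumptions made for illustration.

```python
def variance_of_set_mean(set_size, mean_correlation, gene_variance=1.0):
    """Variance of the mean of equally correlated gene-level statistics.

    Returns (variance, inflation factor). The inflation factor is the
    ratio to the variance that a test assuming independent genes
    would use.
    """
    independent_variance = gene_variance / set_size
    inflation = 1 + (set_size - 1) * mean_correlation
    return independent_variance * inflation, inflation


# Hypothetical example: 100 genes, average correlation 0.05.
variance, inflation = variance_of_set_mean(100, 0.05)
print(inflation)  # about 5.95
```

Even a small average correlation gives a large inflation factor for a big
set, so a test that treats genes as independent understates the
variability of the set statistic and returns p-values that are too small.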

However, the same criticism (that inter-gene correlation is ignored) can be 
made of all GO overlap analysis software as well, including goseq.  So the 
only clear advantage of goseq over GSEA here is the adjustment for gene 
length.  As compensation, GSEA-ranked-list uses the rankings of the DE 
genes that goseq ignores.
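
For comparison, a minimal sketch of the plain overlap test that this kind
of analysis is built on. It assumes an ordinary one-sided hypergeometric
test with no gene length adjustment, so it is not what goseq computes;
the function name and arguments are hypothetical.

```python
from math import comb


def overlap_pvalue(n_genes, set_size, n_de, overlap):
    """P(X >= overlap) when n_de genes are drawn at random from n_genes
    and set_size of the n_genes belong to the gene set."""
    total = comb(n_genes, n_de)
    upper = min(set_size, n_de)
    favourable = sum(
        comb(set_size, i) * comb(n_genes - set_size, n_de - i)
        for i in range(overlap, upper + 1)
    )
    return favourable / total
```

The only inputs are counts: each gene is either on the DE list or not.
How strongly a gene is DE, or where it sits in the ranking, never enters
the calculation, which is the information a ranked-list method retains.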

As you probably know, gene set testing is an active area of research, and 
the inter-relationships between the many different approaches are still 
imperfectly understood.  Methods like geneSetTest and GSEA-ranked-list are 
anti-conservative.  Methods like roast, camera or classic GSEA are 
conservative and safe.  GO overlap analyses like goseq, GOStat, DAVID etc 
are anti-conservative in principle but, in practice, the conservatism of 
the multiple testing adjustment tends to make them conservative overall.  
Different approaches test different hypotheses and emphasise different 
aspects of the data.

Best wishes
Gordon

On Sun, 2 Dec 2012, Steve Lianoglou wrote:

> Hi Gordon,
>
> When an expert comments on a topic I'm interested in, it's hard for me
> not to press for more insight so I hope you don't mind, but also ...
> you know .. take your time :-)
>
> On Sat, Dec 1, 2012 at 8:39 PM, Gordon K Smyth <smyth at wehi.edu.au> wrote:
> [snip]
>> The term "gene set enrichment analysis" was coined by the Broad Institute:
>>
>>   http://www.broadinstitute.org/gsea/
>>
>> but you certainly can't simply give a list of genes to GSEA.  It requires
>> complete data and is designed for microarrays rather than RNA-Seq anyway.
>
> I'm curious if you say so because GSEA doesn't account for something
> like length bias? The GSEA folks seem to suggest that one could do
> this like any other "pre-processed" GSEA analysis by simply providing
> a ranked list of genes (presumably by fold-change):
>
> http://www.broadinstitute.org/cancer/software/gsea/wiki/index.php/FAQ#Can_I_use_GSEA_to_analyze_SNP.2C_SAGE.2C_ChIP-Seq_or_RNA-Seq_data.3F
>
> Would you mind (briefly) elaborating a bit on why you disagree?
>
> Thanks,
> -steve
>
> -- 
> Steve Lianoglou
> Graduate Student: Computational Systems Biology
> | Memorial Sloan-Kettering Cancer Center
> | Weill Medical College of Cornell University
> Contact Info: http://cbio.mskcc.org/~lianos/contact
>
