[BioC] geneSetTest() / GESA

Gordon Smyth smyth at wehi.EDU.AU
Tue Mar 6 00:38:08 CET 2007


Dear Simon,

geneSetTest() is very fast if you use the default settings. In that 
case it's a closed-form calculation. It's intended for use with 
individual gene sets and has no problem with small gene sets. It's 
usable down to size=1.

GSEA and especially GSA are very sophisticated methods which use 
permutation over arrays as well as standardization over genes to 
control for possible dependence between the genes in the test set. 
I'm not an expert on either method, but they seem intended for 
two-sample situations with at least half a dozen arrays in each 
group, many gene sets, and many genes in each set.

geneSetTest() is a far simpler (hence more flexible) approach which 
is aimed at a class of problems that we see regularly at the WEHI. 
Here the aim is to relate a gene ranking, usually obtained by fitting 
a linear model, to a prior set of genes of special interest. It's 
based on permuting the genes, not the arrays. The default method is 
simply a Wilcoxon test using the ranks of the genes. The caveat with 
geneSetTest() is that significance can, in principle, arise from high 
correlations between genes in the test set rather than from a shift in 
the mean, so this possibility should ideally be checked or ruled out 
separately.
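
To make the default method concrete, here is a minimal Python sketch of a 
rank-based gene set test of the kind described above. It is not limma's 
code: the function name rank_sum_test, its arguments, and the use of a 
normal approximation are assumptions made for illustration only. It ranks 
all genes by their statistic and compares the rank sum of the set with its 
closed-form null mean and variance, so no permutation is needed.

```python
import math


def rank_sum_test(stats, in_set):
    """Two-sided Wilcoxon rank-sum test of set genes against all other genes.

    stats  : one statistic per gene (for example moderated t-statistics)
    in_set : one boolean per gene, True if the gene belongs to the set

    Illustrative sketch only: uses the normal approximation with no
    continuity correction and no tie correction in the variance.
    """
    n = len(stats)
    n1 = sum(1 for s in in_set if s)
    n2 = n - n1
    if n1 < 1 or n2 < 1:
        raise ValueError("need at least one gene inside and outside the set")

    # Assign midranks (rank 1 = smallest statistic), averaging over ties.
    order = sorted(range(n), key=lambda i: stats[i])
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and stats[order[j + 1]] == stats[order[i]]:
            j += 1
        midrank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = midrank
        i = j + 1

    # Rank sum of the set and its null mean and variance, which hold
    # if the set is a random draw of n1 genes from the n genes.
    w = sum(r for r, s in zip(ranks, in_set) if s)
    mean = n1 * (n + 1) / 2
    var = n1 * n2 * (n + 1) / 12
    z = (w - mean) / math.sqrt(var)
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail area
    return w, z, p


# Toy data: 20 genes with statistics 1..20; the set is the top five genes.
toy_stats = [float(i) for i in range(1, 21)]
toy_set = [i >= 15 for i in range(20)]
w, z, p = rank_sum_test(toy_stats, toy_set)
print(w, z, p)
```

The null variance used here assumes the genes are exchangeable, which is 
exactly where the correlation caveat comes in: positively correlated genes 
in the set make the true variance of the rank sum larger than this formula 
says, so the p-value can come out too small.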

Best wishes
Gordon

At 10:00 PM 5/03/2007, bioconductor-request at stat.math.ethz.ch wrote:
>Date: Sun, 4 Mar 2007 12:46:19 -0600
>From: "Simon Lin" <simonlin at duke.edu>
>Subject: Re: [BioC] geneSetTest() / GESA
>To: <bioconductor at stat.math.ethz.ch>
>
>Dear Gordon,
>
>Is geneSetTest() fast to calculate? I'm not sure whether you use a
>permutation test under the hood.
>
>For GSEA and GSA, sometimes we see artifacts when the size of the set is too
>small. Is the same true for geneSetTest?
>
>Thanks!
>
>Simon
>
>
>Date: Sun, 04 Mar 2007 18:51:00 +1100
>From: Gordon Smyth <smyth at wehi.EDU.AU>
>Subject: [BioC]  GSEA with one class metaanalysis
>To: Mark W Kimpel <mwkimpel at gmail.com>
>Cc: bioconductor at stat.math.ethz.ch
>Message-ID: <6.2.5.6.1.20070304184303.0242d7a0 at wehi.edu.au>
>Content-Type: text/plain; charset="us-ascii"; format=flowed
>
>Dear Mark,
>
>If I understand your problem correctly, neither GSEA nor GSA will
>accommodate it. The only option I know of is geneSetTest() in the
>limma package. This generally works well, although it will give you
>somewhat over-optimistic p-values if there are strong positive
>correlations between the genes in your gene sets.
>
>Best wishes
>Gordon


