[BioC] gene set enrichment analysis of RNA-Seq data

Gordon K Smyth smyth at wehi.EDU.AU
Fri Apr 13 09:20:21 CEST 2012


Dear Julie,

A good question.  As far as I know, there is as yet no such method.  What 
I am doing for this purpose for the time being is to use voom() in the 
limma package to transform the RNA-Seq counts to a scale on which 
microarray methods can be used, and then to apply roast().  See page 104 
of the limma User's Guide for examples of this:

http://bioconductor.org/packages/2.10/bioc/vignettes/limma/inst/doc/usersguide.pdf

Note that roast() is a self-contained gene set test with the ability to 
use linear models and weights:

   http://www.ncbi.nlm.nih.gov/pubmed/20610611

Another gene set enrichment option that works fine with RNA-Seq data is 
camera().  This is a competitive test, but without the usual disadvantage 
of gene sampling in that it estimates and adjusts for inter-gene 
correlation.  camera() is currently set up to automatically use the weights 
that come out of voom(), meaning that camera() respects the mean-variance 
relationship of RNA-Seq data.  We have used it successfully on RNA-Seq 
data.
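
As a rough illustration of the voom() then roast() or camera() workflow 
described above, here is a minimal sketch.  The counts are simulated, and 
the gene set (the first 50 rows) and the two-group design are hypothetical 
placeholders rather than anything from a real analysis.  The argument names 
(index, design, contrast, inter.gene.cor) follow recent limma releases and 
may differ in older versions, so please check ?roast and ?camera for the 
version you have installed.

```r
library(limma)

set.seed(1)

# Simulated RNA-Seq counts: 1000 genes x 6 samples (illustration only)
ngenes   <- 1000
nsamples <- 6
counts <- matrix(rnbinom(ngenes * nsamples, mu = 50, size = 5),
                 nrow = ngenes, ncol = nsamples)
rownames(counts) <- paste0("gene", seq_len(ngenes))

# Hypothetical two-group design, three samples per group
group  <- factor(rep(c("A", "B"), each = 3))
design <- model.matrix(~ group)

# voom() converts counts to log2 counts-per-million and attaches
# observation-level precision weights
v <- voom(counts, design)

# Hypothetical gene set: the first 50 rows of the count matrix
index <- 1:50

# Self-contained rotation test for the group effect (coefficient 2);
# the voom weights stored in v are used
r <- roast(v, index = index, design = design, contrast = 2, nrot = 999)
print(r)

# Competitive test; inter.gene.cor = NA asks camera() to estimate the
# inter-gene correlation for the set instead of using a preset value
cam <- camera(v, index = index, design = design, contrast = 2,
              inter.gene.cor = NA)
print(cam)
```

Because the data are simulated with no true group effect, neither test 
should be expected to report a significant result here.  With real data, 
index would normally be a list of gene sets, matched to the rows of the 
count matrix.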

Best wishes
Gordon

------------ original message ------------------
[BioC] gene set enrichment analysis of RNA-Seq data
Julie Leonard julie.leonard at syngenta.com
Thu Apr 12 23:06:54 CEST 2012

I was wondering if anyone is aware of a gene
set enrichment algorithm for RNA-Seq data that:

1) does not require a specification of differentially
expressed (DE) genes (i.e. no need to use a hard
p-value threshold cutoff for determining the DE gene
list)

2) uses subject sampling instead of gene sampling
to obtain the p-value (i.e. this would maintain
gene-gene correlations)

Basically, I'm looking for a
self-contained/subject sampling method (e.g.
SAM-GS for microarray data) or a "hybrid" method
(e.g. GSEA for microarray data).  The only gene set
enrichment algorithm that I am aware of for RNA-Seq
data is GOSeq, but it uses a competitive/gene
sampling method (i.e. Fisher's Exact Test).
Note, the ideas of self-contained vs competitive and
subject sampling vs gene sampling come from the
following paper:  Goeman JJ, Bühlmann P. Analyzing
gene expression data in terms of gene sets:
methodological issues. Bioinformatics. 2007 Apr 15;23(8)

Something like GSEA-SNP is close to what I want.
It uses a test-statistic that is suitable for discrete data
and uses subject sampling to calculate the p-values.

Thanks,
Julie

