[BioC] GSEA and edgeR

Sean Davis sdavis2 at mail.nih.gov
Thu Sep 1 17:11:42 CEST 2011


Hi, Iain.

Before proceeding, you might want to take a look at:

http://bioconductor.org/packages/release/bioc/html/goseq.html

This paper and others like it describe the problem and another solution:

http://www.ncbi.nlm.nih.gov/pubmed/21252076

Sean


On Thu, Sep 1, 2011 at 9:13 AM, Iain Gallagher
<iaingallagher at btopenworld.com> wrote:
> Dear List / Gordon
>
> I would like to carry out a GSEA analysis on some RNA-Seq data. I have analysed the data using edgeR and have a list of regulated genes. The data is paired - 6 biological samples, cells from each are infected or uninfected.
>
> Looking around there is little advice on the use of GSEA in RNA-Seq data. Using edgeR (and having a relatively small sample size) I was hoping to make use of the romer algorithm which is implemented in limma. However I am having a conceptual problem setting up my data for this.
>
> As the study uses paired sample data I have followed the guidance for this type of analysis in the marvellous edgeR manual. Searching for advice on how to conduct GSEA I came across this post (http://www.mail-archive.com/bioc-sig-sequencing@r-project.org/msg02020.html) by Gordon Smyth (author of edgeR / limma) wherein he describes a suitable strategy for edgeR and the rotational GSEA algorithms.
>
> These algorithms require the specification of a contrast (in my case between infected and uninfected animals).
>
> e.g. test <- romer(geneIndex, countData, design=design, contrast=contr[,1], nrot=9999)
>
> My design matrix looks like this:
>
> library(limma)
>
> samples <- rep(c('s1', 's2', 's3', 's4', 's5', 's6'),2)
> infection <- c(rep(1,6), rep(2,6))
>
> s <- as.factor(samples)
> i <- as.factor(infection)
>
> design <- model.matrix(~s+i)
> colnames(design)[7] <- 'infection'
>
>> design
>   (Intercept) ss2 ss3 ss4 ss5 ss6 infection
> 1            1   0   0   0   0   0         0
> 2            1   1   0   0   0   0         0
> 3            1   0   1   0   0   0         0
> 4            1   0   0   1   0   0         0
> 5            1   0   0   0   1   0         0
> 6            1   0   0   0   0   1         0
> 7            1   0   0   0   0   0         1
> 8            1   1   0   0   0   0         1
> 9            1   0   1   0   0   0         1
> 10           1   0   0   1   0   0         1
> 11           1   0   0   0   1   0         1
> 12           1   0   0   0   0   1         1
> attr(,"assign")
> [1] 0 1 1 1 1 1 2
> attr(,"contrasts")
> attr(,"contrasts")$s
> [1] "contr.treatment"
>
> attr(,"contrasts")$i
> [1] "contr.treatment"
>
> and in edgeR I carry out the following fit of a GLM to get the genes regulated between infected and uninfected animals i.e. fit a GLM for infection and no infection then edgeR carries out likeliehood tests to find the genes where the two models differ (I think!).
>
> #dispersion estimate
> d <- estimateGLMCommonDisp(d, design)
>
> #fit the NB GLM for each gene
> fitFiltered <- glmFit(d, design, dispersion = d$common.dispersion)
>
> #carry out the likliehood ratio test
> lrtFiltered <- glmLRT(d, fitFiltered, coef = 7)
>
> #how many genes are DE?
> summary(decideTestsDGE(lrtFiltered))#DE genes
>
> So (finally) my question is how do I set up a contrast (whilst keeping the pairing) for romer?
>
> Thanks
>
> Best
>
> Iain
>
>> sessionInfo()
> R version 2.13.1 (2011-07-08)
> Platform: x86_64-pc-linux-gnu (64-bit)
>
> locale:
>  [1] LC_CTYPE=en_GB.utf8       LC_NUMERIC=C
>  [3] LC_TIME=en_GB.utf8        LC_COLLATE=en_GB.utf8
>  [5] LC_MONETARY=C             LC_MESSAGES=en_GB.utf8
>  [7] LC_PAPER=en_GB.utf8       LC_NAME=C
>  [9] LC_ADDRESS=C              LC_TELEPHONE=C
> [11] LC_MEASUREMENT=en_GB.utf8 LC_IDENTIFICATION=C
>
> attached base packages:
> [1] stats     graphics  grDevices utils     datasets  methods   base
>
> other attached packages:
>  [1] annotate_1.30.0        GO.db_2.5.0            goseq_1.4.0
>  [4] geneLenDataBase_0.99.7 BiasedUrn_1.04         org.Bt.eg.db_2.5.0
>  [7] RSQLite_0.9-4          DBI_0.2-5              AnnotationDbi_1.14.1
> [10] Biobase_2.12.2         edgeR_2.2.5            limma_3.8.3
>
> loaded via a namespace (and not attached):
>  [1] biomaRt_2.8.1         Biostrings_2.20.3     BSgenome_1.20.0
>  [4] GenomicFeatures_1.4.4 GenomicRanges_1.4.8   grid_2.13.1
>  [7] IRanges_1.10.6        lattice_0.19-33       Matrix_0.9996875-3
> [10] mgcv_1.7-6            nlme_3.1-102          RCurl_1.6-9
> [13] rtracklayer_1.12.4    tools_2.13.1          XML_3.4-2
> [16] xtable_1.5-6
>>
>
>
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at r-project.org
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor
>



More information about the Bioconductor mailing list