[BioC] GSEA and Broad gene sets

Martin Morgan mtmorgan at fhcrc.org
Wed Dec 5 22:44:19 CET 2007


Hi Brian --

Depends a bit on what you mean by GSEA. As Duane mentions, there's an
R script for one version at the Broad. But there are also at least the
PGSEA package, the geneSetTest function in limma, and the GlobalAncova
package for doing conceptually similar things. And of course in R you
can write your own variant.
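
As one illustration of rolling your own variant, here is a minimal
sketch of an unweighted, Kolmogorov-Smirnov-style running-sum
enrichment score in base R. The function name ksEnrichmentScore is
hypothetical, and this is a simplification: the Broad's GSEA weights
hits by the gene-level statistic and assesses significance by
permutation, and neither is done here.

```r
## Hypothetical helper: unweighted KS-style running-sum enrichment score.
## 'stats' is a named numeric vector of gene-level statistics (names are
## gene ids); 'geneSet' is a character vector of gene ids.
ksEnrichmentScore <- function(stats, geneSet) {
    ## rank genes from most positive to most negative statistic
    ranked <- names(sort(stats, decreasing = TRUE))
    hit <- ranked %in% geneSet
    nHit <- sum(hit)
    nMiss <- length(ranked) - nHit
    if (nHit == 0L || nMiss == 0L)
        stop("'geneSet' must overlap, but not cover, names(stats)")
    ## step up by 1/nHit at each set member, down by 1/nMiss otherwise
    runningSum <- cumsum(ifelse(hit, 1 / nHit, -1 / nMiss))
    ## enrichment score: the signed maximum deviation from zero
    runningSum[which.max(abs(runningSum))]
}

## toy example with made-up statistics
stats <- c(g1 = 3, g2 = 2, g3 = 1, g4 = -1, g5 = -2)
ksEnrichmentScore(stats, c("g1", "g2"))  # set at the top: positive score
ksEnrichmentScore(stats, c("g4", "g5"))  # set at the bottom: negative score
```

Significance would then have to come from recomputing the score after
permuting sample (or gene) labels; limma's geneSetTest takes a
different, rank-based route.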

In terms of Broad sets, PGSEA will read .gmt files (see the PGSEA
vignette). There's also a function in GSEABase called getBroadSets
that, when working (see below), retrieves pre-defined gene sets from
the Broad. These sets can then be used in PGSEA, or directly in R.
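
For reference, a .gmt file is plain tab-delimited text with one gene
set per line: the set name, a description (often a URL), then the
member gene identifiers. A minimal base-R reader, assuming that layout,
might look like the following. readGmt is a hypothetical name; current
GSEABase releases also provide getGmt, which returns a
GeneSetCollection and is the better choice when available.

```r
## Hypothetical reader: returns a named list of character vectors, one
## per gene set; the description column (field 2) is discarded.
readGmt <- function(path) {
    lines <- readLines(path, warn = FALSE)
    lines <- lines[nzchar(lines)]                 # skip blank lines
    fields <- strsplit(lines, "\t", fixed = TRUE)
    sets <- lapply(fields, function(f) {
        genes <- f[-(1:2)]                        # drop name and description
        unique(genes[nzchar(genes)])              # drop empty fields, duplicates
    })
    names(sets) <- vapply(fields, function(f) f[1], character(1))
    sets
}

## usage with a small, made-up file
tmp <- tempfile(fileext = ".gmt")
writeLines(c("SET_A\tdescription of A\tg1\tg2",
             "SET_B\tdescription of B\tg3\tg4\tg5"), tmp)
sets <- readGmt(tmp)
lengths(sets)
```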

A recent resource for performing GSEA-style analyses in R is from the
just-concluded Bioconductor course (http://bioconductor.org ->
Workshops -> 2007 -> Introduction to Bioconductor -> GSEA ->
GSEA_Lecture.pdf). The main difference is that at step 3, non-specific
filtering, you'll use your Broad collections:

> library(GSEABase)
> fl <- system.file("extdata", "Broad.xml", package="GSEABase")
> gsc <- getBroadSets(fl)

(Not a very big collection, just two sets!)

If you have identified the particular collections you'd like to work
with, then in an ideal world you'd be able to do something like

> gsc <- getBroadSets(asBroadUri(c('chr16q', 'GNF2_ZAP70')))

to retrieve these sets from the Broad website. Unfortunately, the
Broad changed their database access so that it no longer exports the
XML required by getBroadSets; they are in the process of re-enabling
that export service, and getBroadSets will work again once it is
restored. In the meantime, you can visit the Broad site, register, and
then download and extract

http://www.broad.mit.edu/gsea/resources/files_to_download_locally_on_firewall_issues.zip

You'll then be able to run

> gsc <- getBroadSets("path/to/msigdb_v2.xml")

This will get you all the gene sets defined at the Broad. You'll be
able to (actually, you'll want to) subset gsc as desired. That is
useful anyway, since you can, for instance, use grep and lapply to
select gene sets based on regular expressions or other criteria (e.g.,
Broad collection category). The GSEABase vignette gives some
additional information.
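
To make the grep / lapply idea concrete, the sketch below works on a
plain named list of gene sets (such as the output of a simple .gmt
reader). The set contents and the 'universe' of measured genes are
made-up placeholders. With a GSEABase GeneSetCollection the same
pattern should carry over via names(gsc) and single-bracket
subsetting, though that is not exercised here.

```r
## toy collection: two set names echo those mentioned above,
## but all members are placeholders, not real set contents
geneSets <- list(
    chr16q     = c("g1", "g2", "g3"),
    chr16p     = c("g4", "g5"),
    GNF2_ZAP70 = c("g6", "g7", "g8", "g9")
)

## select sets whose names match a regular expression
chr16Sets <- geneSets[grep("^chr16", names(geneSets))]

## select sets by another criterion, here at least 3 members
bigEnough <- vapply(geneSets, function(g) length(g) >= 3L, logical(1))
bigSets <- geneSets[bigEnough]

## lapply applies a function to every set, e.g. restricting members
## to genes actually measured on the array (hypothetical 'universe')
universe <- c("g1", "g2", "g6", "g7")
restricted <- lapply(geneSets, intersect, universe)
```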

Martin

"Hassane, Duane" <Duane_Hassane at urmc.rochester.edu> writes:

> The Broad Institute put out a R script, GSEA.R, which works with their gmt files and uses their KS-based enrichment score metric.
>
> http://www.broad.mit.edu/gsea/software/software_index.html
>
> Though I have not yet tried using specific BioC packages with Broad .gmt files.
>
> Not sure if that's what you're looking for.
>
> Duane Hassane
>
>
> -----Original Message-----
> From: bioconductor-bounces at stat.math.ethz.ch
> [mailto:bioconductor-bounces at stat.math.ethz.ch]On Behalf Of
> Brian_Hare at vrtx.com
> Sent: Wednesday, December 05, 2007 2:34 PM
> To: bioconductor at stat.math.ethz.ch
> Subject: [BioC] GSEA and Broad gene sets
>
>
>
>  Are there any detailed instructions available (e.g. vignettes) for how to
>  do GSEA on the Broad collection of pathways in Bioconductor?  I see bits
>  and pieces - e.g. GSEABase, geneSetTest (limma) - but haven't seen it all
>  put together - thanks
>
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at stat.math.ethz.ch
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor

-- 
Dr. Martin Morgan, PhD
Computational Biology Shared Resource Director
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N.
PO Box 19024 Seattle, WA 98109

Location: Arnold Building M2 B169
Phone: (206) 667-2793
