[BioC] GSEA to discover co-regulated genes

Paul Geeleher paulgeeleher at gmail.com
Mon Feb 2 17:16:01 CET 2009


Hi Vincent,

Thank you for your response. Your advice was very useful: with a few
minor adjustments I was able to perform an analysis similar to the one
I had performed using the KEGG pathways.

-Paul.

On Wed, Jan 21, 2009 at 4:00 PM, Vincent Carey
<stvjc at channing.harvard.edu> wrote:
>
> On Wed, Jan 21, 2009 at 6:56 AM, Paul Geeleher <paulgeeleher at gmail.com>
> wrote:
>>
>> Hi All,
>>
>> I've been following the instructions here:
>>
>>
>> http://www.bioconductor.org/workshops/2007/seattle_bioc_intro_nov_07/folder.2007-11-30.5595085375/
>>
>> to find dysregulated KEGG pathways in a dataset. What I'm now
>> wondering is whether I can use the same methodology to find
>> co-regulated genes / genes with common transcription factors?
>>
>> I'd assume it's simply a matter of redefining the gene set
>>
>> gsc <- GeneSetCollection(eset, setType = KEGGCollection())
>> to
>> gsc <- GeneSetCollection(eset, setType =
>> CoRegulatedGenesOrSomeFunctionLikeThat())
>>
>>
>> I suppose what I'm asking is whether such a gene set exists in
>> Bioconductor? And if not, can this be done somewhere else?
>
> GSEABase has infrastructure to import the Broad MSIGDB from its XML
> serialization; see http://www.broad.mit.edu/gsea/downloads.jsp, where you
> will need to register.
>
> If you use getBroadSets() in GSEABase to import the entire MSIGDB you will
> have access to 5452 gene sets. Broad categorizes these in five groups;
> group c3 contains the motif gene sets, which include a subclass called
> transcription factor targets.
>
> Digging through a GSEABase GeneSetCollection can proceed in various ways.
> What I will show is probably not the most elegant approach.
>
> Assume you have imported the whole MSIGDB as msig2.5:
>
>> isC3 = which(sapply(msig2.5, function(x) bcCategory(collectionType(x))) == "c3")
>> C3coll = msig2.5[isC3]
>> C3coll
> GeneSetCollection
>   names: RGAGGAARY_V$PU1_Q6, KRCTCNNNNMANAGC_UNKNOWN, ..., GTTATAT,MIR-410 (837 total)
>   unique identifiers: PCDHGA5, CTXL, ..., pp9099 (15718 total)
>   types in collection:
>     geneIdType: SymbolIdentifier (1 total)
>     collectionType: BroadCollection (1 total)
>> C3coll[[1]]
> setName: RGAGGAARY_V$PU1_Q6
> geneIds: PCDHGA5, CTXL, ..., HCMOGT-1 (total: 522)
> geneIdType: Symbol
> collectionType: Broad
>   bcCategory: c3 (Motif)
>   bcSubCategory:  NA
> details: use 'details(object)'
>> details(C3coll[[1]])
> setName: RGAGGAARY_V$PU1_Q6
> geneIds: PCDHGA5, CTXL, ..., HCMOGT-1 (total: 522)
> geneIdType: Symbol
> collectionType: Broad
>   bcCategory: c3 (Motif)
>   bcSubCategory:  NA
> setIdentifier: c3:261
> description: Genes with promoter regions [-2kb,2kb] around transcription
>   start site containing the motif RGAGGAARY which matches annotation for
>   SPI1: spleen focus forming virus (SFFV) proviral integration oncogene spi1
>   (longDescription available)
> organism: Human,Mouse,Rat,Dog
> pubMedIds:
> urls: msigdb_v2.5.xml
> contributor: Xiaohui Xie
> setVersion: 0.0.1
> creationDate: Thu Jul 10 16:59:23 2008
>
> Invoking the longDescription method on C3coll[[1]] returns an
> interesting structure that will need to be parsed; it seems to be
> in a marked-up MEDLINE format.
>
> Once you have found the gene sets you are interested in, GSEABase
> contains additional infrastructure to convert the gene identifiers
> used in MSIGDB to array probe set identifiers, Entrez identifiers,
> etc.
>
>
>>
>> Thanks.
>>
>> --
>> Paul Geeleher
>> Department of Mathematics
>> National University of Ireland
>> Galway
>> Ireland
>>
>> _______________________________________________
>> Bioconductor mailing list
>> Bioconductor at stat.math.ethz.ch
>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>> Search the archives:
>> http://news.gmane.org/gmane.science.biology.informatics.conductor
>
>
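
A minimal, self-contained sketch of the category filter shown in the
quoted reply. It assumes GSEABase is installed and uses a toy collection
in place of the real MSIGDB import (which would come from getBroadSets()
on the downloaded XML file). The helper make_set, the set names and the
gene symbols are all made up for illustration.

```r
library(GSEABase)

# Hypothetical helper: build one Broad-style gene set with a given category.
make_set <- function(name, ids, category) {
  GeneSet(ids,
          setName = name,
          geneIdType = SymbolIdentifier(),
          collectionType = BroadCollection(category = category))
}

# Toy stand-in for the imported MSIGDB (invented names and symbols).
toy <- GeneSetCollection(list(
  make_set("TOY_MOTIF_A",   c("GENE1", "GENE2", "GENE3"), "c3"),
  make_set("TOY_PATHWAY_B", c("GENE2", "GENE4"),          "c2"),
  make_set("TOY_MOTIF_C",   c("GENE5", "GENE6"),          "c3")
))

# Same idea as in the reply: read each set's Broad category,
# then keep only the c3 (motif) sets.
categories <- sapply(toy, function(x) bcCategory(collectionType(x)))
c3_sets <- toy[which(categories == "c3")]

names(c3_sets)
```

With the real data, toy would be replaced by the object returned from
getBroadSets(); the filtering lines stay the same.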



-- 
Paul Geeleher
School of Mathematics, Statistics and Applied Mathematics
National University of Ireland
Galway
Ireland


