[BioC] species in MsigDB of GSEA

Martin Morgan mtmorgan at fhcrc.org
Wed Jul 16 07:47:30 CEST 2008


"Di Wu" <di.wu at med.monash.edu.au> writes:

> Thank you, Martin.
> That's what I need. I have a follow-up basic question.
> How can I transform "collectionType"  to  character, such as "C2", in case I
> only want to play with the sets from C2.

I'm not quite sure what you're asking, but something like this

> is_c2 <- sapply(gss, function(gs) bcCategory(collectionType(gs))=="c2")

gives you a logical vector which is TRUE when the bcCategory of the
collectionType of each gene set in gss is "c2". You can then

> c2sets <- gss[is_c2]

to get just those gene sets belonging to c2 (I'm using hints from the
display of the gene set to guess at how to get parts of it out, e.g.,

> gss[[1]]
[...]
collectionType: Broad
  bcCategory: c1 (Positional)
  bcSubCategory:  NA
details: use 'details(object)'

suggests that I can use collectionType on gss[[1]], and bcCategory on
the result of collectionType; I could also look in the help page,
e.g., for GeneSet-class and BroadCollection-class).
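
To try the filtering step without the full MSigDB file, here is a minimal sketch that builds a tiny collection by hand and then tabulates the organism for the c2 sets only. The set names, gene symbols and the make_set helper are made up for illustration (they are not from MSigDB), and the sketch assumes the GSEABase package is installed.

```r
library(GSEABase)

# Hypothetical helper: build one toy gene set with a Broad category
# and an organism, mimicking what getBroadSets would return.
make_set <- function(name, ids, category, org) {
    GeneSet(ids, setName = name, geneIdType = SymbolIdentifier(),
            collectionType = BroadCollection(category = category),
            organism = org)
}

# Toy collection (made-up names and symbols, not real MSigDB content)
gss <- GeneSetCollection(list(
    make_set("TOY_C1_SET",   c("GENE1", "GENE2"), "c1", "Human"),
    make_set("TOY_C2_MOUSE", c("GENE3", "GENE4"), "c2", "Mouse"),
    make_set("TOY_C2_HUMAN", c("GENE5", "GENE6"), "c2", "Human")
))

# Logical vector: TRUE where the Broad category of the set is "c2"
is_c2 <- sapply(gss, function(gs) bcCategory(collectionType(gs)) == "c2")

# Keep only the c2 sets, then count organisms within them
c2sets <- gss[is_c2]
c2_organisms <- table(sapply(c2sets, organism))

print(names(c2sets))
print(c2_organisms)
```

With a real collection read by getBroadSets, only the last four statements would be needed.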

Also maybe worth pointing out that gene set collections can be subset
by their set names, e.g.,

> details(gss[["KENNY_WNT_UP"]])
setName: KENNY_WNT_UP 
geneIds: CUGBP2, ARFGEF2, ..., CASKIN2 (total: 51)
geneIdType: Symbol
collectionType: Broad
  bcCategory: c2 (Curated)
  bcSubCategory:  NA
setIdentifier: c2:803
description: Genes up-regulated by Wnt in HC11 (mammary epithelial cells)
  (longDescription available)
organism: Mouse
pubMedIds: 15642117
urls: file://home/mtmorgan/tmp/msigdb_v2.1.xml
contributor: Yujin Hoshida
setVersion: 0.0.1
creationDate: Tue Jul 15 20:31:53 2008

Hope that's on the right track for what you were looking for,

Martin

> Cheers,
> Di
>
>
> On Wed, Jul 16, 2008 at 12:49 PM, Martin Morgan <mtmorgan at fhcrc.org>
> wrote:
>
>      Hi Di --
>
>      "Di Wu" <di.wu at med.monash.edu.au> writes:
>
>      > Dear list,
>      >
>      > I am trying to use MsigDB, the gene set database from GSEA. I am interested
>      > to know whether the sets of genes are from human or mouse, particularly in
>      > C2.
>      > I know I can always click the web and go deep to see how a set was obtained.
>      > But is there any coding way to get the species sources for all the gene sets
>      > in C2 or MsigDB.
>
>      If you're using the GSEABase package, then each gene set read by
>      getBroadSets records the organism, so for example
>
>      > fl <- "/path/to/msigdb_v2.1.xml"
>      > gss <- getBroadSets(fl) # read entire msigdb
>      > organism(gss[[1]])
>      "Human"
>      > table(sapply(gss, organism))
>
>               Chimpanzee             Generic               Human
>                        1                 456                1769
>      Human,Mouse,Rat,Dog               Mouse                 Pig
>                      837                 248                  11
>                      Rat              Rhesus          Zebra Fish
>                        3                   4                   8
>
>      > # retrieve a few sets from the web
>      > gss <- getBroadSets(asBroadUri(c('chr16q', 'GNF2_ZAP70')))
>      > organism(gss[[1]])
>      "Human"
>
>      As a 'closer to the metal' alternative, you could use the XML
>      package
>
>      > xml <- xmlTreeParse(fl, useInternal=TRUE)
>      > query <- '//GENESET[@STANDARD_NAME="KENNY_WNT_UP"]/@ORGANISM'
>      > xpathApply(xml, query, xmlValue)
>      [[1]]
>      [1] "Mouse"
>
>      > table(unlist(xpathApply(xml, "//@ORGANISM", xmlValue)))
>
>               Chimpanzee             Generic               Human
>                        1                 456                1769
>      Human,Mouse,Rat,Dog               Mouse                 Pig
>                      837                 248                  11
>                      Rat              Rhesus          Zebra Fish
>                        3                   4                   8
>
>      Martin
>
>      > Appreciate your suggestions.
>      > Cheers,
>      > Di
>      >
>      >       [[alternative HTML version deleted]]
>      >
>      > _______________________________________________
>      > Bioconductor mailing list
>      > Bioconductor at stat.math.ethz.ch
>      > https://stat.ethz.ch/mailman/listinfo/bioconductor
>      > Search the archives:
>      > http://news.gmane.org/gmane.science.biology.informatics.conductor
>
>      --
>      Martin Morgan
>      Computational Biology / Fred Hutchinson Cancer Research Center
>      1100 Fairview Ave. N.
>      PO Box 19024 Seattle, WA 98109
>
>      Location: Arnold Building M2 B169
>      Phone: (206) 667-2793
>

-- 
Martin Morgan
Computational Biology / Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N.
PO Box 19024 Seattle, WA 98109

Location: Arnold Building M2 B169
Phone: (206) 667-2793


