[BioC] GSEABase error in parsing msigdb_v2.5.xml

Martin Morgan mtmorgan at fhcrc.org
Thu May 15 19:46:26 CEST 2008


Thanks Vladimir for the report, more below...

"Vladimir Morozov" <vmorozov at als.net> writes:

> Hi,
>  
> I get an error reading the latest version of the Broad MSigDB. Is it
> supposed to work?
>  
>> gss <- getBroadSets('/data/PathDB/msigdb_v2.5.xml')
> Error: 'getBroadSets' failed to create gene sets:
>   invalid BroadCollection category: 'c5'

The Broad added a category (c5) that GSEABase 1.2.0 does not recognize;
I've updated GSEABase in both the devel and release branches. The update
should be available with biocLite after 12 noon Friday; look for
GSEABase 1.2.1 in the release.
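
To see which category codes a particular MSigDB file contains before
passing it to getBroadSets, something like the following shell sketch
might help. It assumes every GENESET element carries a
CATEGORY_CODE="..." attribute, as the records quoted below do; the
function name count_categories is made up for illustration.

```shell
#!/bin/sh
# Tally the CATEGORY_CODE values found in an MSigDB XML file.
# Assumption: each GENESET element has a CATEGORY_CODE="..." attribute.
# count_categories is a hypothetical helper, not part of any package.
count_categories() {
    # $1 = path to the XML file
    grep -o 'CATEGORY_CODE="[^"]*"' "$1" |
        sed 's/^CATEGORY_CODE="//; s/"$//' |
        sort | uniq -c |
        sed 's/^ *//'    # strip the padding uniq -c puts before the count
}

# Example call (path taken from the report; adjust to your own copy):
# count_categories /data/PathDB/msigdb_v2.5.xml
```

Each output line is a count followed by a category code. Presumably the
2.5 file shows a c5 line that the 2.1 file lacks.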

One aspect that is a little unsatisfactory is that the subcategories
(CC/BP/MF for c5, for instance) are not encoded in the XML, and so
are not present in the gene sets.
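
That said, the single c5 record quoted below does have
TAGS="Molecular function", so the TAGS attribute may be worth
inspecting. A rough shell sketch for listing the TAGS values within one
category follows. It assumes each GENESET element sits on a single line
of the file and has both CATEGORY_CODE and TAGS attributes;
tags_for_category is a hypothetical helper name. Whether TAGS is filled
in consistently for all c5 sets is not something this thread establishes.

```shell
#!/bin/sh
# List the distinct TAGS values on gene sets of one category, with counts.
# Assumptions: one GENESET element per line of the file, each carrying
# CATEGORY_CODE="..." and TAGS="..." attributes.
# tags_for_category is a hypothetical helper, not part of any package.
tags_for_category() {
    # $1 = path to the XML file, $2 = category code such as c5
    grep "CATEGORY_CODE=\"$2\"" "$1" |
        grep -o ' TAGS="[^"]*"' |
        sed 's/^ TAGS="//; s/"$//' |
        sort | uniq -c |
        sed 's/^ *//'    # strip the padding uniq -c puts before the count
}

# Example call (path taken from the report; adjust to your own copy):
# tags_for_category /data/PathDB/msigdb_v2.5.xml c5
```

Each output line is a count followed by a TAGS value; an empty TAGS
attribute shows up as a bare count.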

Martin

>> traceback()
> 6: stop("'getBroadSets' failed to create gene sets:\n  ", conditionMessage(err),
>        call. = FALSE)
> 5: value[[3]](cond)
> 4: tryCatchOne(expr, names, parentenv, handlers[[1]])
> 3: tryCatchList(expr, classes, parentenv, handlers)
> 2: tryCatch({
>        geneSets <- unlist(mapply(.fromXML, uri, "//GENESET", factories, 
>            SIMPLIFY = FALSE, USE.NAMES = FALSE))
>    }, error = function(err) {
>        stop("'getBroadSets' failed to create gene sets:\n  ", conditionMessage(err),
>            call. = FALSE)
>    })
> 1: getBroadSets("/data/PathDB/msigdb_v2.5.xml")
>> packageDescription('GSEABase')
> Package: GSEABase
> Type: Package
> Title: Gene set enrichment data structures and methods
> Version: 1.2.0
> Author: Martin Morgan, Seth Falcon, Robert Gentleman
> Maintainer: Biocore Team c/o BioC user list
>         <bioconductor at stat.math.ethz.ch>
> Description: This package provides classes and methods to support Gene
>         Set Enrichment Analysis (GSEA).
> License: Artistic-2.0
> Depends: R (>= 2.6.0), methods, AnnotationDbi, Biobase, annotate
> Suggests: Ruuid, hgu95av2.db, GO.db, org.Hs.eg.db
> Imports: methods, XML, graph
> LazyLoad: yes
> biocViews: Infrastructure, Statistics
> Collate: utilities.R AAA.R AllClasses.R AllGenerics.R getObjects.R
>         methods-CollectionType.R methods-ExpressionSet.R
>         methods-GeneColorSet.R methods-GeneIdentifierType.R
>         methods-GeneSet.R methods-GeneSetCollection.R
>         methods-OBOCollection.R zzz.R
> Packaged: Wed Apr 30 02:43:40 2008; biocbuild
> Built: R 2.7.0; ; 2008-05-14 16:18:51; unix
>  
> -- File: /usr/local/lib64/R/library/GSEABase/Meta/package.rds 
>  
>  
> Although
> getBroadSets('/data/PathDB/msigdb_v2.1.xml')
> works. I don't see obvious signs of corruption in the 2.5.xml
> [rstats:GeneLogic070523] head -n 2 /data/PathDB/*.xml
> ==> /data/PathDB/msigdb_v2.1.xml <==
> <?xml version="1.0" encoding="UTF-8"?>
>  
>
> ==> /data/PathDB/msigdb_v2.5.xml <==
> <?xml version="1.0" encoding="UTF-8"?>
>
> tail -n 2 /data/PathDB/*.xml
> ==> /data/PathDB/msigdb_v2.1.xml <==
>   <GENESET STANDARD_NAME="GNF2_ZAP70" SYSTEMATIC_NAME="c4:526"
> ORGANISM="Human" CHIP="GENE_SYMBOL" CATEGORY_CODE="c4"
> CONTRIBUTOR="Broad Institute" CONTRIBUTOR_ORG="Broad Institute"
> DESCRIPTION_BRIEF="Neighborhood of ZAP70" DESCRIPTION_FULL="Neighborhood
> of ZAP70 zeta-chain (TCR) associated protein kinase 70kDa  in the GNF2
> expression compendium" TAGS=""
> MEMBERS="ZAP70,PTPN4,UNC84B,TUSC4,CTSW,RARRES3,BTN3A2,NKG7,PRKCH,KLRK1,B
> TN3A3,MYBL1,GZMA,ARL4C,SH2D1A,TXK,CD7,RORA,CD247,IL18RAP,CD96,RASGRP1,GZ
> MM,TRD@,MATK,ITGAL,KLRB1"
> MEMBERS_SYMBOLIZED="ZAP70,PTPN4,UNC84B,TUSC4,CTSW,RARRES3,BTN3A2,NKG7,PR
> KCH,KLRK1,BTN3A3,MYBL1,GZMA,ARL4C,SH2D1A,TXK,CD7,RORA,CD247,IL18RAP,CD96
> ,RASGRP1,GZMM,TRD@,MATK,ITGAL,KLRB1"/>
> </MSIGDB>
>  
> ==> /data/PathDB/msigdb_v2.5.xml <==
>   <GENESET
> STANDARD_NAME="INOSITOL_OR_PHOSPHATIDYLINOSITOL_KINASE_ACTIVITY"
> SYSTEMATIC_NAME="c5:1203" ORGANISM="Homo sapiens" AUTHORS="Ashburner M,
> Ball CA, Blake JA, Botstein D, Butler H, Cherry JM, Davis AP, Dolinski
> K, Dwight SS, Eppig JT, Harris MA, Hill DP, Issel-Tarver L, Kasarskis A,
> Lewis S, Matese JC,Richardson JE, Ringwald M, Rubin GM, Sherlock G."
> EXTERNAL_DETAILS_URL="http://amigo.geneontology.org/cgi-bin/amigo/go.cgi
> ?view=details&amp;search_constraint=terms&amp;depth=0&amp;query=GO:00044
> 28" CHIP="GENE_SYMBOL" CATEGORY_CODE="c5" CONTRIBUTOR="Gene Ontology"
> CONTRIBUTOR_ORG="Gene Ontology" DESCRIPTION_BRIEF="Genes annotated by
> the GO term GO:0004428. Catalysis of the phosphorylation of myo-inositol
> (1,2,3,5/4,6-cyclohexanehexol) or a phosphatidylinositol."
> DESCRIPTION_FULL="" TAGS="Molecular function"
> MEMBERS="FXN,SMG1,PIP4K2B,PIP5K3,ATM,PIK3C2A,PIK3C3,PIK3CA,PIK3CB,PIK3CG
> ,PIK3R2,PIK3R3,IPPK,PI4KA,PI4KB,PI4K2A,ITPKA,ITPKB"
> MEMBERS_SYMBOLIZED="FXN,SMG1,PIP4K2B,PIP5K3,ATM,PIK3C2A,PIK3C3,PIK3CA,PI
> K3CB,PIK3CG,PIK3R2,PIK3R3,IPPK,PI4KA,PI4KB,PI4K2A,ITPKA,ITPKB"/>
> </MSIGDB>
>  
>  
>  
> Best
> Vlad
>  
>  
>
> Vladimir Morozov 
>
> ALS Therapy Development Institute 
>
>

-- 
Martin Morgan
Computational Biology / Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N.
PO Box 19024 Seattle, WA 98109

Location: Arnold Building M2 B169
Phone: (206) 667-2793


