[BioC] GSEABase and Broad Inst Sets

Martin Morgan mtmorgan at fhcrc.org
Tue Jul 6 15:43:03 CEST 2010


On 07/06/2010 04:32 AM, Iain Gallagher wrote:
> Hi List
> 
> I'm trying to carry out a GSEA analysis on an ExpressionSet object using GSEABase and the Broad Institute genesets (well the C2 subset, specifically).
> 
> library(GSEABase)
> 
> broadSets <- getBroadSets("/home/iain/Desktop/prostateProjectJN_GS/CEL/msigdb_v2.5.xml")  # file downloaded from Broad site
> 
> isC2 <- sapply(broadSets, function(x) bcCategory(collectionType(x))) == "c2"
> 
> broadSetsC2<-broadSets[isC2]
> 
> relevantArrays <- grep('Hypo.No.None|Norm.No.None', TS)
> 
> relevantArrays <- rmaDataFiltered[ ,relevantArrays]
> 
> So this gets me to the point where I have my expression data and the genesets I want. This is where I'm having trouble. Following the GSEABase tutorials with KEGG annotation I have no problems, but I can't calculate an incidence matrix from my expression data using the Broad genesets I have downloaded.
> 
> i.e.
> 
> testGSC <- GeneSetCollection(relevantArrays, setType=BroadCollection())
> Error in get(mapName, envir = pkgEnv, inherits = FALSE) : 
>   object 'hgu133plus2BROAD' not found
> Error in revmap(getAnnMap(toupper(collectionType(setType)), annotation(idType))) : 
>   error in evaluating the argument 'x' in selecting a method for function 'revmap'
> 
> 

> This is a mapping issue I know but I'm having a conceptual block
> getting over it. If anyone could offer any help I'd be grateful.

For a reproducible example, start with

  library(GSEABase)
  example(getBroadSets)
  data(sample.ExpressionSet)
  eset = sample.ExpressionSet  # less typing!

If you're interested in creating a GeneSetCollection that contains just
those identifiers that are relevant to your ExpressionSet 'eset', then map
the sets to the identifiers used by its annotation package:

  gss1 = mapIdentifiers(gss, AnnotationIdentifier(annotation(eset)))

Subsetting eset to the features that appear in at least one set might
look like

  idx = featureNames(eset) %in% unlist(geneIds(gss1), use.names=FALSE)
  eset[idx,]
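
The original question asked for an incidence matrix. Here is a minimal
sketch of one way to get there. It assumes the objects from the
reproducible example above, that the hgu95av2.db annotation package is
installed (sample.ExpressionSet is annotated with it), and that your
version of GSEABase provides incidence() for a GeneSetCollection. The
names 'inc' and 'esetSub' are only illustrative.

```r
library(GSEABase)
example(getBroadSets)          # creates the GeneSetCollection 'gss'
data(sample.ExpressionSet)
eset <- sample.ExpressionSet

# Map the sets to the probe identifiers of the array annotation
gss1 <- mapIdentifiers(gss, AnnotationIdentifier(annotation(eset)))

# Rows are gene sets, columns are identifiers found in any set
inc <- incidence(gss1)

# Keep only identifiers that are actually features of eset
keep <- colnames(inc) %in% featureNames(eset)
inc <- inc[, keep, drop = FALSE]

# Expression data in the same feature order as the incidence matrix
esetSub <- eset[colnames(inc), ]
```

With the columns of 'inc' and the rows of 'esetSub' in the same order,
the two can be combined directly in downstream per-set summaries.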

In answering this question, I realized that getBroadSets does not
correctly interpret the identifiers as 'Symbols'; until this is fixed in
GSEABase, you should convert them to official symbols yourself, e.g.,

  library(limma)
  sids <- lapply(geneIds(gss), alias2Symbol, "Hs", TRUE)
  gss = GeneSetCollection(mapply("geneIds<-", gss, sids))
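
The mapply("geneIds<-", ...) call above invokes a replacement function
by name, once per set, pairing each set with its new identifiers. A toy
base R sketch of the same pattern, using "names<-" on made-up vectors
rather than GSEABase objects (all data here are hypothetical):

```r
# Hypothetical toy data standing in for the sets and their new ids
sets <- list(a = c(1, 2), b = c(3, 4, 5))
newNames <- list(c("x", "y"), c("p", "q", "r"))

# "names<-"(x, value) returns a modified copy of x; mapply pairs
# the i-th element of 'sets' with the i-th element of 'newNames'
renamed <- mapply("names<-", sets, newNames, SIMPLIFY = FALSE)
```

Each element of 'renamed' is a copy of the corresponding element of
'sets' with the new names applied; the originals are left untouched.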

Martin

> 
> iain
> 
>> sessionInfo()
> R version 2.10.1 (2009-12-14) 
> x86_64-pc-linux-gnu 
> 
> locale:
>  [1] LC_CTYPE=en_GB.utf8       LC_NUMERIC=C             
>  [3] LC_TIME=en_GB.utf8        LC_COLLATE=en_GB.utf8    
>  [5] LC_MONETARY=C             LC_MESSAGES=en_GB.utf8   
>  [7] LC_PAPER=en_GB.utf8       LC_NAME=C                
>  [9] LC_ADDRESS=C              LC_TELEPHONE=C           
> [11] LC_MEASUREMENT=en_GB.utf8 LC_IDENTIFICATION=C      
> 
> attached base packages:
> [1] stats     graphics  grDevices utils     datasets  methods   base     
> 
> other attached packages:
>  [1] affyQCReport_1.24.0  affyPLM_1.22.0       preprocessCore_1.8.0
>  [4] xtable_1.5-6         simpleaffy_2.22.0    gcrma_2.18.1        
>  [7] latticeExtra_0.6-11  lattice_0.18-3       RColorBrewer_1.0-2  
> [10] hgu133plus2.db_2.3.5 hgu133plus2cdf_2.5.0 affy_1.24.2         
> [13] limma_3.2.3          GSEABase_1.8.0       graph_1.26.0        
> [16] annotate_1.24.1      hgu95av2.db_2.3.5    org.Hs.eg.db_2.3.6  
> [19] RSQLite_0.9-0        DBI_0.2-5            AnnotationDbi_1.8.2 
> [22] genefilter_1.28.2    ALL_1.4.7            Biobase_2.6.1       
> 
> loaded via a namespace (and not attached):
> [1] affyio_1.14.0      Biostrings_2.14.12 grid_2.10.1        IRanges_1.4.16    
> [5] splines_2.10.1     survival_2.35-8    tools_2.10.1       XML_3.1-0         
>>


-- 
Martin Morgan
Computational Biology / Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N.
PO Box 19024 Seattle, WA 98109

Location: Arnold Building M1 B861
Phone: (206) 667-2793
