[BioC] GOstats minus IEA?

Loren Engrav engrav at u.washington.edu
Tue Apr 27 03:26:38 CEST 2010


Thank you, that looks clever.
I am working through it but am stuck:

> GOstatsentrezUniverse <- unlist(mget(featureNames(GOstats1842v2exprs),
+     hgu133plus2ENTREZID))
> GOstatsentrezSelected <- unlist(mget(featureNames(GOstats153v2exprs),
+     hgu133plus2ENTREZID))
> GOstats_params_BP.001over <- new("GOHyperGParams",
+     geneIds = GOstatsentrezSelected,
+     universeGeneIds = GOstatsentrezUniverse,
+     annotation = "hgu133plus2.db", ontology = "BP",
+     pvalueCutoff = .001, conditional = FALSE, testDirection = "over")
Warning messages:
1: In makeValidParams(.Object) : removing duplicate IDs in geneIds
2: In makeValidParams(.Object) : removing duplicate IDs in universeGeneIds
> ids <- GOstats_params_BP.001over@geneIds
> gids = mget(ids, org.Hs.egGO)
Error in .checkKeysAreWellFormed(keys) :
  keys must be supplied in a character vector with no NAs

How do I unstick gids?
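
One possible cause, offered as an assumption because the thread does not confirm it: probes with no Entrez Gene mapping come back as NA from the mget() call on hgu133plus2ENTREZID, and those NAs end up in geneIds, which the later mget() on org.Hs.egGO rejects. The sketch below uses only base R and an invented named vector standing in for the output of unlist(mget(...)); clean_ids is a hypothetical helper, not part of any package.

```r
# Toy stand-in for unlist(mget(featureNames(eset), hgu133plus2ENTREZID)):
# names are probe IDs, values are Entrez Gene IDs, NA where a probe is unmapped.
# All probe and gene IDs here are invented for illustration.
entrez_raw <- c(probe1 = "1001", probe2 = NA, probe3 = "1002",
                probe4 = "1001", probe5 = NA)

# Hypothetical helper: drop unmapped probes, then collapse probes that
# map to the same gene so each Entrez ID appears once.
clean_ids <- function(x) unique(unname(x[!is.na(x)]))

entrez_clean <- clean_ids(entrez_raw)
print(entrez_clean)
```

Applying the same cleaning to both the selected and the universe vectors before building the GOHyperGParams object should leave geneIds free of NAs, assuming the NAs are indeed the cause of the error.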

=========================================


> sessionInfo()
R version 2.11.0 (2010-04-22)
x86_64-apple-darwin9.8.0

locale:
[1] en_US.UTF-8/en_US.UTF-8/C/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] grid      tools     stats     graphics  grDevices utils     datasets
[8] methods   base

other attached packages:
 [1] codetools_0.2-2      genefilter_1.30.0    RColorBrewer_1.0-2
 [4] xtable_1.5-6         Rgraphviz_1.26.0     GO.db_2.4.1
 [7] hgu133plus2.db_2.4.1 org.Hs.eg.db_2.4.1   annotate_1.26.0
[10] GOstats_2.14.0       RSQLite_0.8-4        DBI_0.2-5
[13] graph_1.26.0         Category_2.14.0      AnnotationDbi_1.10.0
[16] Biobase_2.8.0

loaded via a namespace (and not attached):
[1] GSEABase_1.10.0 RBGL_1.24.0     splines_2.11.0  survival_2.35-8
[5] XML_2.8-1



From: Vincent Carey <stvjc at channing.harvard.edu>
Date: Mon, 26 Apr 2010 11:52:00 -0400
To: Loren Engrav <engrav at u.washington.edu>
Cc: "bioconductor at stat.math.ethz.ch" <bioconductor at stat.math.ethz.ch>
Subject: Re: [BioC] GOstats minus IEA?

There does not seem to be a direct way within the GOstats tools to perform
this kind of filtering.  However, a help.search("evidence") can find a
function called dropECode that addresses this concern if you have the
annotate package installed.

You would need to use it as you define your gene list and universe to
exclude genes that have undesirable evidence profiles.  For example, if you
run the vignette GOstatsHyperG.Rnw, an object called params will be
created.  This includes examples of geneIds and universe vectors that are in
fact entrez gene IDs.
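
As a rough sketch of that idea, the gene list and universe can both be restricted to genes that keep at least one annotation once IEA rows are set aside. The example is base R on an invented table; the column names gene_id, go_id and Evidence are an assumption about what a flattened view of org.Hs.egGO might look like, and all IDs are made up.

```r
# Toy gene-to-GO table. Column names are assumed, values are invented.
go_tab <- data.frame(
  gene_id  = c("g1", "g1", "g2", "g3", "g3"),
  go_id    = c("GO:A", "GO:B", "GO:A", "GO:C", "GO:D"),
  Evidence = c("IEA", "TAS", "IEA", "IDA", "IEA"),
  stringsAsFactors = FALSE
)
selected <- c("g1", "g2")              # genes of interest
universe <- c("g1", "g2", "g3", "g4")  # all genes considered

# Genes that still have at least one annotation after IEA rows are dropped.
supported <- unique(go_tab$gene_id[go_tab$Evidence != "IEA"])

# Restrict both vectors; intersect() keeps the order of its first argument.
selected_noIEA <- intersect(selected, supported)
universe_noIEA <- intersect(universe, supported)
print(selected_noIEA)
print(universe_noIEA)
```

Note that this only prunes the ID vectors. Whether the hypergeometric test itself still consults IEA annotations for the genes that remain is not addressed here.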

Briefly, to see how dropECode can be used, consider

> Sweave("GOstatsHyperG.Rnw")
> ids = params@geneIds
> gids = mget(ids, org.Hs.egGO)
> dgids = lapply(gids, dropECode)
> table(sapply(gids,length))

 1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 
12 18 28 17 33 51 59 63 56 44 39 34 24 22 25 19 13  7  7 12  7  6  9  6  5  2 
27 28 29 30 31 32 33 34 35 36 37 39 40 41 42 43 44 45 50 54 64 65 72 79 
 2  1  3  5  3  1  5  2  1  2  1  1  1  4  1  1  1  2  1  1  1  1  1  1 

> table(sapply(dgids,length))
 0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 25 26 27 
91 58 77 85 89 53 41 29 20 25 12 15  7 10  1  4  8  4  2  4  3  1  1  1  6  2 
30 32 35 36 40 47 
 2  3  1  3  2  1 

This shows that prior to dropECode (which by default drops terms annotated
via IEA) there were 12 genes with a single association; subsequent to
dropECode, 91 genes had none and 58 had only one.  Further exploration
indicates that gene 10265 is one that has 11 associations, all of them coded
IEA.
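
The before/after comparison above can be mimicked on toy data to see the mechanics. This is a minimal base-R sketch: the list shape (one element per gene, each holding annotations with GOID and Evidence fields) is an assumption about what mget(ids, org.Hs.egGO) returns, drop_code is a hypothetical stand-in for dropECode rather than its actual source, and every ID is invented.

```r
# Toy list shaped like the assumed result of mget(ids, org.Hs.egGO).
ann <- function(goid, ev) list(GOID = goid, Evidence = ev)
gids <- list(
  g1 = list(ann("GO:A", "IEA"), ann("GO:B", "TAS")),
  g2 = list(ann("GO:A", "IEA"), ann("GO:C", "IEA")),
  g3 = list(ann("GO:D", "IDA"))
)

# Hypothetical stand-in for dropECode: remove annotations whose
# evidence code matches `code` (IEA by default).
drop_code <- function(annots, code = "IEA") {
  keep <- vapply(annots, function(a) !(a$Evidence %in% code), logical(1))
  annots[keep]
}

dgids <- lapply(gids, drop_code)
n_before <- sapply(gids, length)   # annotations per gene before filtering
n_after  <- sapply(dgids, length)  # annotations per gene after filtering

# Genes annotated only through IEA, the situation described for gene 10265.
only_iea <- names(n_after)[n_before > 0 & n_after == 0]
print(table(n_after))
print(only_iea)
```

Genes listed in only_iea are the ones that would be excluded from the gene list and universe under this approach.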

> sessionInfo()
R version 2.12.0 Under development (unstable) (2010-04-16 r51754)
x86_64-apple-darwin10.3.0

locale:
[1] C

attached base packages:
[1] grid      stats     graphics  grDevices datasets  tools     utils   
[8] methods   base    

other attached packages:
 [1] Rgraphviz_1.27.0    xtable_1.5-5        RColorBrewer_1.0-2
 [4] GOstats_2.13.0      graph_1.25.1        Category_2.13.3   
 [7] genefilter_1.29.2   annotate_1.25.0     GO.db_2.4.0       
[10] hgu95av2.db_2.4.0   org.Hs.eg.db_2.4.0  RSQLite_0.8-4     
[13] DBI_0.2-5           AnnotationDbi_1.9.8 ALL_1.4.7         
[16] Biobase_2.7.6       weaver_1.13.0       codetools_0.2-2   
[19] digest_0.4.1      

loaded via a namespace (and not attached):
[1] GSEABase_1.9.0  RBGL_1.23.0     XML_2.6-0       splines_2.12.0
[5] survival_2.35-8


On Mon, Apr 26, 2010 at 10:54 AM, Loren Engrav <engrav at u.washington.edu>
wrote:
> GO.db and org.Hs.egGO2EG and manipulating content thereof was discussed a
> month or so ago and is kind of complicated.
> 
> Is it possible to run GOstats and exclude IEA evidence without serious
> custom work?
> 
> I searched gmane.science.biology.informatics.conductor and the 4 GOstats
> pdfs and did not hit upon anything.
> 
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at stat.math.ethz.ch
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives:
> http://news.gmane.org/gmane.science.biology.informatics.conductor


