[BioC] GOstats minus IEA?

Loren Engrav engrav at u.washington.edu
Tue Apr 27 18:15:28 CEST 2010


Thank you.

I gave GOstats the Entrez IDs directly and that solved the NA problem.
Somehow, extracting them from hgu133plus2ENTREZID was producing odd
NAs.

Then I used your code to fish out the non-IEA annotations.

So it works, thank you.




From: Vincent Carey <stvjc at channing.harvard.edu>
Date: Mon, 26 Apr 2010 21:55:12 -0400
To: Loren Engrav <engrav at u.washington.edu>
Cc: "bioconductor at stat.math.ethz.ch" <bioconductor at stat.math.ethz.ch>
Subject: Re: [BioC] GOstats minus IEA?



On Mon, Apr 26, 2010 at 9:26 PM, Loren Engrav <engrav at u.washington.edu> wrote:
> Thank you, it looks clever.
> I am working through it but am stuck.
> 
>> GOstatsentrezUniverse <- unlist(mget(featureNames(GOstats1842v2exprs),
>+     hgu133plus2ENTREZID))
>> GOstatsentrezSelected <- unlist(mget(featureNames(GOstats153v2exprs),
>+     hgu133plus2ENTREZID))
>> GOstats_params_BP.001over <- new("GOHyperGParams",
>+     geneIds = GOstatsentrezSelected, universeGeneIds = GOstatsentrezUniverse,
>+     annotation = "hgu133plus2.db", ontology = "BP", pvalueCutoff = .001,
>+     conditional = FALSE, testDirection = "over")
> Warning messages:
> 1: In makeValidParams(.Object) : removing duplicate IDs in geneIds
> 2: In makeValidParams(.Object) : removing duplicate IDs in universeGeneIds
>> ids <- GOstats_params_BP.001over@geneIds
>> gids = mget(ids, org.Hs.egGO)
> Error in .checkKeysAreWellFormed(keys) :
>   keys must be supplied in a character vector with no NAs
> 

Try any(is.na(ids)); if this is TRUE you will need to do something like
mget(na.omit(ids), ...)

If it is not TRUE, then you will have to send some exemplars from ids for
diagnosis.
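A minimal base-R sketch of that check, assuming ids is a named character vector of Entrez IDs like the one produced by unlist(mget(featureNames(...), hgu133plus2ENTREZID)) above. Probes without an Entrez mapping come back as NA, and mget() on the annotation maps rejects NA keys with the error shown. The probe names and IDs below are made up for illustration; removing repeated genes as well mirrors the duplicate-ID warnings from makeValidParams.

```r
# Made-up probe-to-Entrez result standing in for
# unlist(mget(probeIDs, hgu133plus2ENTREZID)); probe2 has no mapping,
# and probe1/probe4 map to the same gene.
ids <- c(probe1 = "1001", probe2 = NA, probe3 = "1002", probe4 = "1001")

# The diagnostic suggested above: are there any NA keys?
has_na <- any(is.na(ids))

# Drop NA keys and repeated genes; as.character() strips the names and
# the na.action attribute, leaving a plain character vector for mget().
clean_ids <- unique(as.character(na.omit(ids)))

print(has_na)
print(clean_ids)
```

In a real session, clean_ids would then be passed to mget(clean_ids, org.Hs.egGO).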
 
> How do I unstick gids?
> 
> =========================================
> 
> 
>> sessionInfo()
> R version 2.11.0 (2010-04-22)
> x86_64-apple-darwin9.8.0
> 
> locale:
> [1] en_US.UTF-8/en_US.UTF-8/C/C/en_US.UTF-8/en_US.UTF-8
> 
> attached base packages:
> [1] grid      tools     stats     graphics  grDevices utils     datasets
> [8] methods   base
> 
> other attached packages:
>  [1] codetools_0.2-2      genefilter_1.30.0    RColorBrewer_1.0-2   xtable_1.5-6         Rgraphviz_1.26.0
>  [6] GO.db_2.4.1          hgu133plus2.db_2.4.1 org.Hs.eg.db_2.4.1   annotate_1.26.0      GOstats_2.14.0
> [11] RSQLite_0.8-4        DBI_0.2-5            graph_1.26.0         Category_2.14.0      AnnotationDbi_1.10.0
> [16] Biobase_2.8.0
> 
> loaded via a namespace (and not attached):
> [1] GSEABase_1.10.0 RBGL_1.24.0     splines_2.11.0  survival_2.35-8 XML_2.8-1
> 
> 
> 
> From: Vincent Carey <stvjc at channing.harvard.edu>
> Date: Mon, 26 Apr 2010 11:52:00 -0400
> To: Loren Engrav <engrav at u.washington.edu>
> Cc: "bioconductor at stat.math.ethz.ch" <bioconductor at stat.math.ethz.ch>
> Subject: Re: [BioC] GOstats minus IEA?
> 
> There does not seem to be a direct way within the GOstats tools to perform
> this kind of filtering.  However, a help.search("evidence") can find a
> function called dropECode that addresses this concern if you have the
> annotate package installed.
> 
> You would need to use it as you define your gene list and universe to
> exclude genes that have undesirable evidence profiles.  For example, if you
> run the vignette GOstatsHyperG.Rnw, an object called params will be
> created.  This includes examples of geneIds and universe vectors that are in
> fact entrez gene IDs.
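A minimal base-R sketch of applying the filter while defining the gene list and universe, assuming dgids is a per-gene list of GO records that has already had IEA terms removed (for example by lapply(gids, dropECode)). The gene names, GO IDs and evidence codes below are hypothetical placeholders. Note that this only changes which genes enter the test; it does not by itself remove IEA-coded terms from genes that are kept.

```r
# Hypothetical annotations after IEA filtering: gene1 had only IEA
# terms and is now empty; gene2 and gene3 kept at least one record.
dgids <- list(
  gene1 = list(),
  gene2 = list(list(GOID = "GO:0000001", Evidence = "IDA", Ontology = "BP")),
  gene3 = list(list(GOID = "GO:0000002", Evidence = "TAS", Ontology = "MF"),
               list(GOID = "GO:0000003", Evidence = "IMP", Ontology = "BP"))
)

# Genes that still carry at least one non-IEA annotation.
counts <- sapply(dgids, length)
keep_ids <- names(counts)[counts > 0]

# Hypothetical gene list and universe, restricted to those genes.
selected <- c("gene1", "gene2")
universe <- c("gene1", "gene2", "gene3")
selected_noIEA <- intersect(selected, keep_ids)
universe_noIEA <- intersect(universe, keep_ids)

print(selected_noIEA)
print(universe_noIEA)
# These vectors would then be supplied as geneIds and universeGeneIds
# to new("GOHyperGParams", ...).
```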
> 
> Briefly, to see how dropECode can be used, consider
> 
>> Sweave("GOstatsHyperG.Rnw")
>> ids = params@geneIds
>> gids = mget(ids, org.Hs.egGO)
>> dgids = lapply(gids, dropECode)
>> table(sapply(gids,length))
> 
>  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26
> 12 18 28 17 33 51 59 63 56 44 39 34 24 22 25 19 13  7  7 12  7  6  9  6  5  2
> 27 28 29 30 31 32 33 34 35 36 37 39 40 41 42 43 44 45 50 54 64 65 72 79
>  2  1  3  5  3  1  5  2  1  2  1  1  1  4  1  1  1  2  1  1  1  1  1  1
> 
>> table(sapply(dgids,length))
>  0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 25 26 27
> 91 58 77 85 89 53 41 29 20 25 12 15  7 10  1  4  8  4  2  4  3  1  1  1  6  2
> 30 32 35 36 40 47
>  2  3  1  3  2  1
> 
> This shows that prior to dropECode (which by default drops terms annotated
> via IEA) there were 12 genes with a single association; subsequent to
> dropECode, 91 genes had none and 58 had only one.  Further exploration
> indicates that gene 10265 is one that has 11 associations, all of them coded
> IEA.
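The effect of dropping IEA-coded terms can be sketched in base R on a toy annotation list. This assumes, as with mget(ids, org.Hs.egGO), that each gene maps to a list of GO records carrying an Evidence field. drop_evidence() is a hypothetical helper written for illustration; it is not the annotate implementation of dropECode, though it uses the same default of dropping IEA. Gene names and GO IDs are placeholders.

```r
# Hypothetical stand-in for mget(ids, org.Hs.egGO): one list per gene,
# each element a GO record with GOID, Evidence and Ontology fields.
gids <- list(
  gene1 = list(
    list(GOID = "GO:0000001", Evidence = "IEA", Ontology = "BP"),
    list(GOID = "GO:0000002", Evidence = "IEA", Ontology = "MF")
  ),
  gene2 = list(
    list(GOID = "GO:0000001", Evidence = "IDA", Ontology = "BP"),
    list(GOID = "GO:0000003", Evidence = "IEA", Ontology = "CC")
  )
)

# Hypothetical helper: drop every record whose evidence code is in 'codes'.
# Genes with no GO mapping (NA from mget) or no records yield an empty list.
drop_evidence <- function(records, codes = "IEA") {
  if (!is.list(records) || length(records) == 0) return(list())
  keep <- vapply(records, function(r) !(r$Evidence %in% codes), logical(1))
  records[keep]
}

dgids <- lapply(gids, drop_evidence)

# Annotation counts per gene before and after filtering, tabulated the
# same way as in the thread; gene1 (all IEA) drops to zero.
print(table(sapply(gids, length)))
print(table(sapply(dgids, length)))
```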
> 
>> sessionInfo()
> R version 2.12.0 Under development (unstable) (2010-04-16 r51754)
> x86_64-apple-darwin10.3.0
> 
> locale:
> [1] C
> 
> attached base packages:
> [1] grid      stats     graphics  grDevices datasets  tools     utils   
> [8] methods   base    
> 
> other attached packages:
>  [1] Rgraphviz_1.27.0    xtable_1.5-5        RColorBrewer_1.0-2
>  [4] GOstats_2.13.0      graph_1.25.1        Category_2.13.3   
>  [7] genefilter_1.29.2   annotate_1.25.0     GO.db_2.4.0       
> [10] hgu95av2.db_2.4.0   org.Hs.eg.db_2.4.0  RSQLite_0.8-4     
> [13] DBI_0.2-5           AnnotationDbi_1.9.8 ALL_1.4.7         
> [16] Biobase_2.7.6       weaver_1.13.0       codetools_0.2-2   
> [19] digest_0.4.1      
> 
> loaded via a namespace (and not attached):
> [1] GSEABase_1.9.0  RBGL_1.23.0     XML_2.6-0       splines_2.12.0
> [5] survival_2.35-8
> 
> 
> On Mon, Apr 26, 2010 at 10:54 AM, Loren Engrav <engrav at u.washington.edu> wrote:
>> Manipulating the contents of GO.db and org.Hs.egGO2EG was discussed a
>> month or so ago and is rather complicated.
>> 
>> Is it possible to run GOstats and exclude IEA evidence without serious
>> custom work?
>> 
>> I searched gmane.science.biology.informatics.conductor and the four GOstats
>> PDFs and did not find anything.
>> 
>> _______________________________________________
>> Bioconductor mailing list
>> Bioconductor at stat.math.ethz.ch
>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>> Search the archives:
>> http://news.gmane.org/gmane.science.biology.informatics.conductor
> 


