[BioC] GOstats problem with output

Robert M. Flight rflight79 at gmail.com
Fri Apr 8 14:27:02 CEST 2011


Hi Assa,

The reason you are getting no genes is that no genes are annotated
"directly" to this term. I got the same error when I looked up your GO
term in org.Mm.egGO and org.Mm.egGO2EG. (org.Mm.egGO is keyed by Entrez
Gene ID rather than GO ID, so that lookup fails for any GO term.) You
need to use "org.Mm.egGO2ALLEGS" in this case to find the genes that
are annotated to this term indirectly, via its child terms. Also keep
in mind that AmiGO is updated regularly, whereas the Bioconductor
annotation packages are updated only every 6 months, so results from
AmiGO and Bioconductor can differ.
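
In case it helps, here is a minimal sketch of a lookup that does not stop
with the "not found" error. genes_for_go is a hypothetical helper of my
own, not something from AnnotationDbi, and the two toy environments with
made-up IDs only stand in for a direct map and an "all" map so that the
snippet runs without any annotation package installed.

```r
# Hypothetical helper (not part of AnnotationDbi): return the gene IDs
# mapped to one GO ID, or character(0) if the key is absent from the map.
genes_for_go <- function(go_id, map) {
  # ifnotfound = NA makes mget() return NA for a missing key instead of
  # raising the 'value for "..." not found' error.
  hit <- mget(go_id, map, ifnotfound = NA)[[1]]
  if (length(hit) == 1 && is.na(hit)) {
    return(character(0))
  }
  # A gene can appear several times with different evidence codes.
  unique(unname(hit))
}

# Toy stand-ins for a direct map and an "all" map (made-up IDs).
toy_direct <- new.env()
toy_all <- new.env()
assign("GO:0000002", c(IDA = "101"), envir = toy_direct)
assign("GO:0000002", c(IDA = "101"), envir = toy_all)
# The parent term has no direct genes; it only inherits from its child.
assign("GO:0000001", c(IDA = "101"), envir = toy_all)

direct_genes <- genes_for_go("GO:0000001", toy_direct)
all_genes <- genes_for_go("GO:0000001", toy_all)
```

With org.Mm.eg.db attached, passing org.Mm.egGO2EG or org.Mm.egGO2ALLEGS
as the map should work the same way, since mget on these annotation maps
accepts ifnotfound = NA.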

-Robert


On Fri, Apr 8, 2011 at 01:43, Assa Yeroslaviz <frymor at gmail.com> wrote:
> Well well,
> I am ashamed to say that it is now working.
>
> Apparently all I needed to do was to update the packages.
>
> I installed the new version of GO.db and GOstats
> and it is working now.
>
> However, I am still getting this error when I try to find which genes
> are attached to this term.
>> mget('GO:2000021',org.Mm.egGO)
> Error in .checkKeys(value, Lkeys(x), x@ifnotfound) :
>   value for "GO:2000021" not found
>> mget('GO:2000021',org.Mm.egGO2EG)
> Error in .checkKeys(value, Rkeys(x), x@ifnotfound) :
>   value for "GO:2000021" not found
>
> So I guess the earlier error message has nothing to do with the fact that
> there are no genes from the mouse genome mapped to this GO category.
>
> When I checked in AmiGO to see whether there really are no mouse genes
> under this category, I found 83 genes.
> Can anyone tell me then what the meaning of this error is?
>
> Is there a way of manually updating the GO data set, so that I can map
> these genes?
>
> Thanks
> Assa
>
>> sessionInfo()
> R version 2.12.2 (2011-02-25)
> Platform: x86_64-pc-linux-gnu (64-bit)
>
> locale:
>  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
>  [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
>  [5] LC_MONETARY=C              LC_MESSAGES=en_US.UTF-8
>  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C
>  [9] LC_ADDRESS=C               LC_TELEPHONE=C
> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
>
> attached base packages:
> [1] splines   grid      stats     graphics  grDevices utils     datasets
> [8] methods   base
>
> other attached packages:
>  [1] GSEABase_1.12.1      org.Mm.eg.db_2.4.6   biomaRt_2.6.0
>  [4] Heatplus_1.20.0      ggplot2_0.8.9        proto_0.3-9.1
>  [7] reshape_0.8.4        plyr_1.4             gplots_2.8.0
> [10] caTools_1.11         bitops_1.0-4.1       gdata_2.8.1
> [13] gtools_2.6.2         siggenes_1.24.0      multtest_2.7.1
> [16] Rgraphviz_1.29.0     xtable_1.5-6         annotate_1.28.1
> [19] GO.db_2.4.5          GOstats_2.16.0       RSQLite_0.9-4
> [22] DBI_0.2-5            graph_1.28.0         Category_2.16.0
> [25] AnnotationDbi_1.12.0 Biobase_2.10.0
>
> loaded via a namespace (and not attached):
> [1] genefilter_1.32.0 MASS_7.3-11       RBGL_1.26.0       RCurl_1.5-0
> [5] survival_2.36-5   tools_2.12.2      XML_3.2-0
>
> On Thu, Apr 7, 2011 at 18:49, Robert M. Flight <rflight79 at gmail.com> wrote:
>>
>> Hi Assa,
>>
>> As far as I am aware, if the GO term comes up in your list, then there
>> should be genes annotated to it. I did a simple test to verify that
>> the GO term does exist:
>>
>> > crud <- as.list(GOTERM)
>> > crud$'GO:2000021'
>> GOID: GO:2000021
>> Term: regulation of ion homeostasis
>> Ontology: BP
>> Definition: Any process that modulates the frequency, rate or extent
>> of ion homeostasis.
>> Synonym: regulation of electrolyte homeostasis
>> Synonym: regulation of negative regulation of crystal biosynthesis
>> Synonym: regulation of negative regulation of crystal formation
>>
>> So far so good. Now let's look to see what genes are annotated to it:
>>
>> > library(org.Mm.eg.db)
>> > mget('GO:2000021',org.Mm.egGO)
>> Error in .checkKeys(value, Lkeys(x), x@ifnotfound) :
>>  value for "GO:2000021" not found
>>
>> > mget('GO:2000021',org.Mm.egGO2EG)
>> Error in .checkKeys(value, Rkeys(x), x@ifnotfound) :
>>  value for "GO:2000021" not found
>> > mget('GO:2000021',org.Mm.egGO2ALLEGS)
>> $`GO:2000021`
>>      ISO      ISO      ISO      ISO      IGI      IGI      IMP
>>  "11517"  "11684"  "11998"  "12000"  "12018"  "12028"  "12028"
>>      IGI      ISO      ISO      IMP      ISO      ISO      IDA
>>  "12043"  "12061"  "12257"  "12291"  "12349"  "12372"  "12389"
>>      ISO      ISO      ISO      ISO      ISO      IMP      ISO
>>  "12424"  "12558"  "13167"  "13489"  "13617"  "13666"  "14062"
>>      ISO      IDA      IMP      IMP      IGI      IGI      ISO
>>  "14126"  "14225"  "14225"  "14226"  "14629"  "14630"  "14652"
>>      ISO      IDA      IDA      ISO      IDA      ISO       IC
>>  "15171"  "15978"  "16818"  "16867"  "16963"  "17096"  "17131"
>>      ISO      IMP      IMP      IDA      IMP      ISO      ISO
>>  "18429"  "18439"  "18764"  "19264"  "20190"  "21333"  "21336"
>>      ISO      ISO      IMP      ISO      ISO      TAS      IDA
>>  "21803"  "21808"  "21819"  "21838"  "22041"  "22784"  "23832"
>>      ISO      ISO      ISO      ISO      ISO      ISO      ISO
>>  "24111"  "26361"  "50849"  "54140"  "76055"  "76757" "108837"
>>      ISO      IMP      ISO      ISO      IMP      ISO
>> "217369" "225908" "233081" "238276" "259277" "317757"
>>
>> BTW, this was all using GO.db_2.4.5
>>
>> From this information, there are no genes that are directly annotated
>> to your GO term, only indirect annotations. I know this doesn't help
>> your current situation, but it points towards the problem at least. I
>> thought, however, when the summary was being prepared that it used the
>> GO2ALLEGS mapping, and not the direct one. Perhaps someone more
>> knowledgeable can figure out where in the code the error is likely to
>> be?
>>
>> -Robert
>>
>> Robert M. Flight, Ph.D.
>> University of Louisville Bioinformatics Laboratory
>> University of Louisville
>> Louisville, KY
>>
>> PH 502-852-1809 (HSC)
>> PH 502-852-0467 (Belknap)
>> EM robert.flight at louisville.edu
>> EM rflight79 at gmail.com
>>
>> Williams and Holland's Law:
>>        If enough data is collected, anything may be proven by
>> statistical methods.
>>
>>
>>
>> On Thu, Apr 7, 2011 at 11:22, Assa Yeroslaviz <frymor at gmail.com> wrote:
>> > Hi,
>> >
>> > I am trying to run hyperGTest from GOstats on mouse Entrez Gene IDs.
>> >
>> > I imported the IDs from biomart:
>> > entrez_data_1 <- getBM(attributes = c("mgi_id", "entrezgene"),
>> >                        filters = "mgi_id",
>> >                        values = as.character(data_1$MGI), mart = mart)
>> > head(entrez_data_1)
>> > entrezID_Universe <- getBM(mart = mart,
>> >                            attributes = c("mgi_id", "entrezgene"),
>> >                            filters = "mgi_id",
>> >                            values = as.character(MaxQuant18$MGI))
>> > entrezID_Universe
>> > params <- new("GOHyperGParams",
>> >               geneIds = as.character(entrez_data_1[, 2]),
>> >               universeGeneIds = as.character(entrezID_Universe[, 2]),
>> >               annotation = "org.Mm.eg.db", ontology = "BP",
>> >               pvalueCutoff = 0.05, conditional = FALSE,
>> >               testDirection = "over")
>> > I then tried to run the hyperGTest command, with success:
>> > MmOverBP <- hyperGTest(params)
>> > MmOverBP
>> > Gene to GO BP  test for over-representation
>> > 3146 GO BP ids tested (118 have p < 0.05)
>> > Selected gene set size: 1006
>> >    Gene universe size: 2935
>> >    Annotation package: org.Mm.eg
>> > but then:
>> >> summary(MmOverBP)
>> > Error in .checkKeys(value, Lkeys(x), x@ifnotfound) :
>> >  value for "GO:2000021" not found
>> >
>> > As far as I know, I have the latest version of both packages. I looked
>> > in AmiGO to check whether this GO ID exists: it does.
>> >
>> > Accession: GO:2000021
>> > Ontology: Biological Process
>> > Synonyms:
>> >   related: regulation of electrolyte homeostasis
>> >   related: regulation of negative regulation of crystal biosynthesis
>> >   related: regulation of negative regulation of crystal formation
>> >
>> > Is there a way of adding/annotating this specific item manually, so
>> > that I can see it?
>> > If not, is there a way of extracting this GO ID from the list of GO
>> > categories, so that I can see the results?
>> >
>> > Thanks a lot
>> > Assa
>> >
>> >
>> >> sessionInfo()
>> > R version 2.12.2 (2011-02-25)
>> > Platform: x86_64-pc-linux-gnu (64-bit)
>> >
>> > locale:
>> >  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
>> >  [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
>> >  [5] LC_MONETARY=C              LC_MESSAGES=en_US.UTF-8
>> >  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C
>> >  [9] LC_ADDRESS=C               LC_TELEPHONE=C
>> > [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
>> >
>> > attached base packages:
>> > [1] splines   grid      stats     graphics  grDevices utils     datasets
>> > [8] methods   base
>> >
>> > other attached packages:
>> >  [1] GO.db_2.4.1          org.Mm.eg.db_2.4.6   biomaRt_2.6.0
>> >  [4] Heatplus_1.20.0      gplots_2.8.0         caTools_1.11
>> >  [7] bitops_1.0-4.1       gdata_2.8.1          gtools_2.6.2
>> > [10] siggenes_1.24.0      multtest_2.7.1       Rgraphviz_1.29.0
>> > [13] xtable_1.5-6         annotate_1.28.1      GOstats_2.16.0
>> > [16] RSQLite_0.9-4        DBI_0.2-5            graph_1.28.0
>> > [19] Category_2.16.0      AnnotationDbi_1.12.0 Biobase_2.10.0
>> >
>> > loaded via a namespace (and not attached):
>> > [1] genefilter_1.32.0 GSEABase_1.12.1   MASS_7.3-11       RBGL_1.26.0
>> > [5] RCurl_1.5-0       survival_2.36-5   tcltk_2.12.2      tools_2.12.2
>> > [9] XML_3.2-0