[BioC] KEGG overrepresentation loses genes

Marc Carlson mcarlson at fhcrc.org
Thu Apr 15 00:00:03 CEST 2010


Hi Anne,

Unfortunately, you have uncovered a small bug that can affect anyone
using Category for KEGG analysis. I have just fixed it, and a patched
version should be available via biocLite() within a day or so.

Good job on finding that, and thanks for sharing.


  Marc



On 04/14/2010 08:04 AM, Anne Kupczok wrote:
> Hello,
> I observed the following problem when using the KEGG annotation with
> hyperGTest: it does not seem to consider all genes. In the example
> below, all three genes are in the category "05020" (according to
> mget(genes,envir=org.Hs.egPATH)). In the summary of hyperGTest,
> however, the category contains only two genes.
> Is there an explanation for this behavior?
> Thanks in advance!
> Anne
>
> > library("Category")
> Loading required package: AnnotationDbi
> Loading required package: Biobase
>
> Welcome to Bioconductor
>
>  Vignettes contain introductory material. To view, type
>  'openVignette()'. To cite Bioconductor, see
>  'citation("Biobase")' and for packages 'citation(pkgname)'.
>
> > library("org.Hs.eg.db")
> Loading required package: DBI
> > genes=c("1958","3553","3303")
> > GoHyp=new("KEGGHyperGParams",geneIds=genes,annotation="org.Hs.eg",pvalueCutoff=1,testDirection="over")
> > htest=hyperGTest(GoHyp)
> > s=summary(htest)
> > s[1,]
>  KEGGID       Pvalue OddsRatio    ExpCount Count Size           Term
> 1  05020 3.810228e-06       Inf 0.003960844     2   35 Prion diseases
> >
> > p=mget(genes,envir=org.Hs.egPATH,ifnotfound=NA)
> > p
> $`1958`
> [1] "05020"
>
> $`3553`
> [1] "04010" "04060" "04210" "04620" "04640" "04940" "05010" "05020" "05332"
>
> $`3303`
> [1] "04010" "04144" "04612" "05020"
>
> > geneIdsByCategory(htest,"05020")
> $`05020`
> [1] "1958" "3553"
>
> > sessionInfo()
> R version 2.10.0 (2009-10-26)
> x86_64-unknown-linux-gnu
>
> locale:
> [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
> [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
> [5] LC_MONETARY=C              LC_MESSAGES=en_US.UTF-8
> [7] LC_PAPER=en_US.UTF-8       LC_NAME=C
> [9] LC_ADDRESS=C               LC_TELEPHONE=C
> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
>
> attached base packages:
> [1] stats     graphics  grDevices utils     datasets  methods   base
>
> other attached packages:
> [1] KEGG.db_2.3.5       org.Hs.eg.db_2.3.6  RSQLite_0.7-3
> [4] DBI_0.2-4           Category_2.12.0     AnnotationDbi_1.8.1
> [7] Biobase_2.6.0
>
> loaded via a namespace (and not attached):
> [1] annotate_1.24.0   genefilter_1.28.0 graph_1.24.1      GSEABase_1.8.0
> [5] RBGL_1.22.0       splines_2.10.0    survival_2.35-7   tools_2.10.0
> [9] XML_2.6-0         xtable_1.5-6
> >
>
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at stat.math.ethz.ch
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives:
> http://news.gmane.org/gmane.science.biology.informatics.conductor
>
