[BioC] Why are there different numbers of pathways in pathway2gene and in pathway2name (KEGG.db)?

Peng Yu pengyu.ut at gmail.com
Wed Oct 12 23:39:02 CEST 2011


Hi,

There are 292 distinct pathways in pathway2gene (after stripping the
three-letter species prefix from pathway_id), but there are 390
distinct pathways in pathway2name. I'm wondering why these two
numbers are not the same.

> library(KEGG.db)
> pathway2gene=dbGetQuery(KEGG_dbconn(), "SELECT * FROM pathway2gene")
>
> species=unique(substr(unique(pathway2gene$pathway_id),1,3))
> species
 [1] "hsa" "ath" "dme" "mmu" "rno" "sce" "pfa" "dre" "eco" "ecs" "cfa" "bta"
[13] "cel" "ssc" "gga" "mcc" "xla" "aga" "ptr"
>
> tmp=lapply(
+   species
+   , function(x) {
+     unique(pathway2gene$pathway_id[grep(paste('^', x,sep=''),
+       pathway2gene$pathway_id)])
+   }
+   )
>
> sapply(tmp, length)
 [1] 229 123 127 225 225  99  80 155 105 107 224 225 125 225 152 225 150 126 225
>
> tmp1=unique(
+   unlist(
+     lapply(
+       tmp
+       , function(x) {
+         substr(x, 4, 8)
+       }
+       )
+     )
+   )
>
> length(tmp1)
[1] 292


> pathway2name=dbGetQuery(KEGG_dbconn(), 'SELECT * FROM pathway2name')
> length(unique(pathway2name$path_id))
[1] 390
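
To see which IDs account for the gap, the two ID sets could be compared
directly with setdiff(). The following is only a sketch on toy data frames:
the values are made up for illustration (not real query output), and the
gene_or_orf_id and path_name columns are assumed names. With the real
tables, the two data.frame() calls would be replaced by the dbGetQuery()
results above.

```r
# Toy stand-ins for the two tables (hypothetical values, not real KEGG.db output)
pathway2gene <- data.frame(
  pathway_id     = c("hsa00010", "mmu00010", "hsa00020"),
  gene_or_orf_id = c("10327", "11522", "1431"),   # assumed column name
  stringsAsFactors = FALSE
)
pathway2name <- data.frame(
  path_id   = c("00010", "00020", "01100"),
  path_name = c("Glycolysis / Gluconeogenesis",   # assumed column name
                "Citrate cycle (TCA cycle)",
                "Metabolic pathways"),
  stringsAsFactors = FALSE
)

# Strip the three-letter species prefix so both sides use the bare 5-digit ID
gene_ids <- unique(substr(pathway2gene$pathway_id, 4, 8))
name_ids <- unique(pathway2name$path_id)

# IDs that have a name but no gene mapping in any species, and the reverse
only_in_name <- setdiff(name_ids, gene_ids)
only_in_gene <- setdiff(gene_ids, name_ids)

# Show the named pathways that never appear in pathway2gene
pathway2name[pathway2name$path_id %in% only_in_name, ]
```

If only_in_gene is empty on the real data, the 98 extra entries would all be
pathways that have a name but no gene mapping for any of the 19 species.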

> sessionInfo()
R version 2.13.0 (2011-04-13)
Platform: x86_64-apple-darwin9.8.0/x86_64 (64-bit)

locale:
[1] en_US.UTF-8/en_US.UTF-8/C/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats     graphics  grDevices datasets  utils     methods   base

other attached packages:
[1] smart.source_1.0     KEGG.db_2.5.0        RSQLite_0.9-4
[4] DBI_0.2-5            AnnotationDbi_1.14.1 Biobase_2.12.2

-- 
Regards,
Peng


