[BioC] Why are there different numbers of pathways in pathway2gene and in pathway2name (KEGG.db)?

Marc Carlson mcarlson at fhcrc.org
Tue Nov 1 01:13:07 CET 2011


Hi Peng,

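The difference comes from the way the database was built.  The data in 
the pathway2gene table is limited to those organisms that we produce 
annotation packages for here, while pathway2name is not restricted in 
that way.  So some pathways have a name but no gene mappings for any of 
the organisms we cover.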
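
To see which pathways fall into that gap, something along these lines 
may help.  It is only a sketch: pathways_without_genes is a made-up 
helper name (not part of KEGG.db), and it assumes, as in the session 
quoted below, that every pathway_id is a 3-letter organism code followed 
by a 5-character pathway number matching path_id in pathway2name.

```r
# Hypothetical helper (not part of KEGG.db): return the pathway2name IDs
# that never occur in pathway2gene once the organism prefix is removed.
pathways_without_genes <- function(pathway_ids, path_ids) {
  # Assumption: pathway_id looks like "hsa00010", i.e. a 3-letter
  # organism code followed by a 5-character pathway number.
  map_ids <- unique(substr(pathway_ids, 4, 8))
  setdiff(unique(path_ids), map_ids)
}

# Query the real tables only if KEGG.db is installed.
if (requireNamespace("KEGG.db", quietly = TRUE)) {
  library(KEGG.db)
  con <- KEGG_dbconn()
  p2g <- DBI::dbGetQuery(con, "SELECT DISTINCT pathway_id FROM pathway2gene")
  p2n <- DBI::dbGetQuery(con, "SELECT path_id, path_name FROM pathway2name")

  no_genes <- pathways_without_genes(p2g$pathway_id, p2n$path_id)
  print(length(no_genes))                          # how many named pathways lack genes
  print(head(p2n[p2n$path_id %in% no_genes, ]))    # a few of their names
}
```

The helper works on plain character vectors, so it can also be tried on 
a few hand-typed IDs without the package.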

   Marc


On 10/12/2011 02:39 PM, Peng Yu wrote:
> Hi,
>
> There are 292 pathways according to pathway2gene, but there are 390
> pathways according to pathway2name. I'm wondering why these two
> numbers are not the same.
>
>> library(KEGG.db)
>> pathway2gene=dbGetQuery(KEGG_dbconn(), "SELECT * FROM pathway2gene")
>>
>> species=unique(substr(unique(pathway2gene$pathway_id),1,3))
>> species
>   [1] "hsa" "ath" "dme" "mmu" "rno" "sce" "pfa" "dre" "eco" "ecs" "cfa" "bta"
> [13] "cel" "ssc" "gga" "mcc" "xla" "aga" "ptr"
>> tmp=lapply(
> +   species
> +   , function(x) {
> +     unique(pathway2gene$pathway_id[grep(paste('^', x,sep=''), pathway2gene$pathway_id)])
> +   }
> +   )
>> sapply(tmp, length)
>   [1] 229 123 127 225 225  99  80 155 105 107 224 225 125 225 152 225 150 126 225
>> tmp1=unique(
> +   unlist(
> +     lapply(
> +       tmp
> +       , function(x) {
> +         substr(x, 4, 8)
> +       }
> +       )
> +     )
> +   )
>> length(tmp1)
> [1] 292
>
>
>> pathway2name=dbGetQuery(KEGG_dbconn(), 'SELECT * FROM pathway2name')
>> length(unique(pathway2name$path_id))
> [1] 390
>
>> sessionInfo()
> R version 2.13.0 (2011-04-13)
> Platform: x86_64-apple-darwin9.8.0/x86_64 (64-bit)
>
> locale:
> [1] en_US.UTF-8/en_US.UTF-8/C/C/en_US.UTF-8/en_US.UTF-8
>
> attached base packages:
> [1] stats     graphics  grDevices datasets  utils     methods   base
>
> other attached packages:
> [1] smart.source_1.0     KEGG.db_2.5.0        RSQLite_0.9-4
> [4] DBI_0.2-5            AnnotationDbi_1.14.1 Biobase_2.12.2
>


