[BioC] Retrieving all entrez identifiers that are annotated in KEGG pathways

James W. MacDonald jmacdon at uw.edu
Sat Mar 2 19:41:19 CET 2013


Hi Anirban,

 > library(hgu133plus2.db)
 > x <- select(hgu133plus2.db, Lkeys(hgu133plus2PATH), c("ENTREZID","PATH"))
Warning message:
In .generateExtraRows(tab, keys, jointype) :
   'select' resulted in 1:many mapping between keys and return rows
 > head(x)
     PROBEID ENTREZID  PATH
1 1007_s_at      780 <NA>
2   1053_at     5982 03030
3   1053_at     5982 03420

 > egids <- unique(x$ENTREZID[!is.na(x$PATH)])
 > length(egids)
[1] 5498
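
To see why the is.na() filter and unique() are both needed, here is a minimal sketch on a toy data frame built from the three rows printed by head(x) above. It runs in base R without the annotation package; the names `toy` and `toy_egids` are just for illustration and are not part of any package.

```r
# Toy stand-in for the select() result: the three rows shown by head(x).
toy <- data.frame(
  PROBEID  = c("1007_s_at", "1053_at", "1053_at"),
  ENTREZID = c("780", "5982", "5982"),
  PATH     = c(NA, "03030", "03420"),
  stringsAsFactors = FALSE
)

# Drop rows with no KEGG pathway (PATH is NA), then collapse the
# duplicate Entrez IDs that arise when one gene sits in several pathways.
toy_egids <- unique(toy$ENTREZID[!is.na(toy$PATH)])
toy_egids
```

Gene 780 drops out because its probe has no pathway, and 5982 is kept once even though it appears in two pathways. Since the keys are probes from this chip, the 5498 above counts only genes that are represented on the array.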


Best,

Jim




On 3/2/2013 8:20 AM, Anirban [guest] wrote:
> Dear all,
>
> Is there any way to get all Entrez identifiers that are annotated with KEGG pathways? I am using the GOstats package in R to perform KEGG pathway enrichment analysis. In general, each KEGG pathway term has a list of annotated HGNC symbols or Entrez identifiers, so across all KEGG pathway terms there must be one combined list of Entrez identifiers. I want to have that list.
>
> What I am doing right now is as follows:
> library(biomaRt)
> library("GO.db")
> library("KEGG.db")
> library("GOstats")
> library("hgu133plus2.db")
> library("EMA")
> library("fdrtool")
> library("org.Hs.eg.db")
> ensembl = useMart("ensembl", dataset = "hsapiens_gene_ensembl")
> x<- hgu133plus2PATH
> mapped_probes<- mappedkeys(x)
>
> b<-getBM(attributes=c("hgnc_symbol"),filters="affy_hg_u133_plus_2",values=mapped_probes,mart=ensembl)
>
> Is it the correct way to do that?
>
> Thanks in advance.. :)
>
>   -- output of sessionInfo():
>
> R version 2.15.1 (2012-06-22)
> Platform: x86_64-pc-linux-gnu (64-bit)
>
> locale:
>   [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
>   [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
>   [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
>   [7] LC_PAPER=C                 LC_NAME=C
>   [9] LC_ADDRESS=C               LC_TELEPHONE=C
> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
>
> attached base packages:
> [1] stats     graphics  grDevices utils     datasets  methods   base
>
> --
> Sent via the guest posting facility at bioconductor.org.
>
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at r-project.org
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor

-- 
James W. MacDonald, M.S.
Biostatistician
University of Washington
Environmental and Occupational Health Sciences
4225 Roosevelt Way NE, # 100
Seattle WA 98105-6099


