[BioC] "graphite" Biocarta 'native' graphs different from Biocarta web site?

Hamid Bolouri hbolouri at fhcrc.org
Wed Jun 6 02:55:39 CEST 2012


Graphite's native Biocarta pathways seem to have a different node list than that given by the Biocarta "PROTEIN LIST" link on Biocarta pathway pages (presumably what the pathway authors consider the 'true' pathway membership).

There seem to be 2 categories of difference: 
(1) Some genes listed by Biocarta are absent from graphite's version (see ??? marks in the example below).
(2) Because native-format nodes carry mixed identifier types (EntrezGene IDs, EC numbers, plain names), a node conversion is necessary. In particular, Biocarta's "PROTEIN LIST" gives _specific_ members of enzyme families, whereas graphite seems to replace each EC number with all members of the family. However, I cannot explain why some enzymes end up on the list and others do not (see --- marks in the example below).

Am I misinterpreting things? If not, is there any way to get pathway graphs with node lists more closely matching what Biocarta lists online?

Thanks,

Hamid Bolouri
-- 
http://labs.fhcrc.org/bolouri 

Example:

> biocarta[["epo signaling pathway"]]
"epo signaling pathway" pathway from BioCarta
Number of nodes     = 10
Number of edges     = 24
Type of identifiers = native
Retrieved on        = 2011-05-12
> nodes(biocarta[["epo signaling pathway"]])
 [1] "EntrezGene:2056"            "EntrezGene:2057"           
 [3] "EntrezGene:2885"            "EntrezGene:3265"           
 [5] "EntrezGene:6464"            "EntrezGene:6654"           
 [7] "EnzymeConsortium:2.7.1.112" "EnzymeConsortium:3.1.3.48" 
 [9] "EnzymeConsortium:3.1.4.11"  "STAT5"                     
> PE <- convertIdentifiers(biocarta[["epo signaling pathway"]],type="entrez")
> nodes(PE)
 [1] "2056"   "2057"   "2885"   "3265"   "6464"   "6654"   "52"     "993"   
 [9] "994"    "995"    "1843"   "1844"   "1845"   "1846"   "1847"   "1848"  
[17] "1849"   "1850"   "1852"   "5770"   "5777"   "5778"   "5781"   "5787"  
[25] "5788"   "5792"   "5795"   "5797"   "5798"   "5799"   "5801"   "5803"  
[33] "8555"   "8556"   "11072"  "11221"  "56940"  "80824"  "84867"  "5330"  
[41] "5331"   "5332"   "5333"   "5335"   "5336"   "23236"  "84812"  "113026"
> PS <- convertIdentifiers(biocarta[["epo signaling pathway"]],type="symbol")
> nodes(PS)
 [1] "EPO"    "EPOR"   "GRB2"   "HRAS"   "SHC1"   "SOS1"   "ACP1"   "CDC25A"
 [9] "CDC25B" "CDC25C" "DUSP1"  "DUSP2"  "DUSP3"  "DUSP4"  "DUSP5"  "DUSP6" 
[17] "DUSP7"  "DUSP8"  "DUSP9"  "PTPN1"  "PTPN6"  "PTPN7"  "PTPN11" "PTPRB" 
[25] "PTPRC"  "PTPRF"  "PTPRJ"  "PTPRM"  "PTPRN"  "PTPRN2" "PTPRR"  "PTPRZ1"
[33] "CDC14B" "CDC14A" "DUSP14" "DUSP10" "DUSP22" "DUSP16" "PTPN5"  "PLCB2" 
[41] "PLCB3"  "PLCB4"  "PLCD1"  "PLCG1"  "PLCG2"  "PLCB1"  "PLCD4"  "PLCD3" 
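
The mix of identifier types in the native node labels can be tallied with base R. This is only a sketch: the vector `nativeNodes` is copied by hand from the nodes() output above, and the names `nativeNodes`, `hasPrefix` and `idType` are illustrative, not part of graphite.

```r
# Native node labels, copied by hand from the nodes() output above.
nativeNodes <- c("EntrezGene:2056", "EntrezGene:2057", "EntrezGene:2885",
                 "EntrezGene:3265", "EntrezGene:6464", "EntrezGene:6654",
                 "EnzymeConsortium:2.7.1.112", "EnzymeConsortium:3.1.3.48",
                 "EnzymeConsortium:3.1.4.11", "STAT5")

# A label of the form "Type:ID" has a prefix; anything else is a bare name.
hasPrefix <- grepl(":", nativeNodes, fixed = TRUE)

# Keep the text before the first colon as the identifier type.
idType <- ifelse(hasPrefix, sub(":.*$", "", nativeNodes), "unprefixed")

# Count the nodes of each identifier type.
table(idType)
```

Nodes that carry an EC number or a bare name are the ones whose conversion to Entrez IDs is not one-to-one.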

Compare the above with what I get from:
http://www.biocarta.com/pathfiles/PathwayProteinList.asp?showPFID=69
<NB The header is mine & I reordered the table to group similar cases>

<geneDescription	EntrezID	***==HBcomment>
erythropoietin 	2056 	***
erythropoietin receptor 	2057 	***
growth factor receptor-bound protein 2 	2885 	***
son of sevenless homolog 1 (Drosophila) 	6654 	***
v-Ha-ras Harvey rat sarcoma viral oncogene homolog 	3265 	***
signal transducer and activator of transcription 5A 	6776 	***
signal transducer and activator of transcription 5B 	6777 	***
SHC (Src homology 2 domain containing) transforming protein 1 	6464 	***
v-fos FBJ murine osteosarcoma viral oncogene homolog 	2353 	???
v-raf-1 murine leukemia viral oncogene homolog 1 	5894 	???
ELK1, member of ETS oncogene family 	2002 	???
jun oncogene 	3725 	???
casein kinase 2, alpha 1 polypeptide 	1457 	???
Janus kinase 2 (a protein tyrosine kinase) 	3717 	???
mitogen-activated protein kinase 3 	5595 	---
mitogen-activated protein kinase 8 	5599 	---
mitogen-activated protein kinase kinase 1 	5604 	---
phospholipase C, gamma 1 	5335 	ok
protein tyrosine phosphatase, non-receptor type 6 	5777 	ok

HBcomment: ***== in graphite, ???==missing from graphite, ok==present in graphite after EC-number expansion,
---==specific enzymes in Biocarta are mapped to large (& unrelated?) families in graphite
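
The same comparison can be made mechanically with base R set operations. This is a minimal sketch, assuming both ID lists are typed in by hand from the output and table above; the names `graphiteIds`, `biocartaIds`, `shared`, `onlyBiocarta` and `onlyGraphite` are illustrative only. Note that it compares Entrez IDs strictly, so genes that graphite holds only under a non-Entrez label (such as the native "STAT5" node) will be reported as Biocarta-only.

```r
# Entrez IDs from nodes(PE) above (copied by hand).
graphiteIds <- c("2056", "2057", "2885", "3265", "6464", "6654", "52", "993",
                 "994", "995", "1843", "1844", "1845", "1846", "1847", "1848",
                 "1849", "1850", "1852", "5770", "5777", "5778", "5781", "5787",
                 "5788", "5792", "5795", "5797", "5798", "5799", "5801", "5803",
                 "8555", "8556", "11072", "11221", "56940", "80824", "84867",
                 "5330", "5331", "5332", "5333", "5335", "5336", "23236",
                 "84812", "113026")

# Entrez IDs from the Biocarta "PROTEIN LIST" table above (copied by hand).
biocartaIds <- c("2056", "2057", "2885", "6654", "3265", "6776", "6777",
                 "6464", "2353", "5894", "2002", "3725", "1457", "3717",
                 "5595", "5599", "5604", "5335", "5777")

shared       <- intersect(biocartaIds, graphiteIds)  # listed by both
onlyBiocarta <- setdiff(biocartaIds, graphiteIds)    # absent from graphite
onlyGraphite <- setdiff(graphiteIds, biocartaIds)    # added by EC expansion

print(shared)
print(onlyBiocarta)
print(length(onlyGraphite))
```

With a live session, `graphiteIds` could be taken directly from nodes(PE) instead of being typed in.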

###
> sessionInfo()
R version 2.15.0 (2012-03-30)
Platform: i386-pc-mingw32/i386 (32-bit)

locale:
[1] LC_COLLATE=English_United States.1252  LC_CTYPE=English_United States.1252   
[3] LC_MONETARY=English_United States.1252 LC_NUMERIC=C                          
[5] LC_TIME=English_United States.1252    

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] graphite_1.2.0       AnnotationDbi_1.18.1 Biobase_2.16.0      
[4] BiocGenerics_0.2.0   RSQLite_0.11.1       DBI_0.2-5           
[7] graph_1.34.0        

loaded via a namespace (and not attached):
[1] IRanges_1.14.3     org.Hs.eg.db_2.7.1 stats4_2.15.0      tools_2.15.0     
###


