[BioC] pathview puzzle

Luo Weijun luo_weijun at yahoo.com
Thu Aug 22 23:01:49 CEST 2013


Hi Oleg,
You are right, the problem is due to ID type inconsistency.
You have to specify gene.idtype when calling the pathview function if your gene ID type is not Entrez Gene. I am not certain, but I don’t think b-numbers are recognized. As for your gene name example, if by “gene name” you mean official gene symbols, you should specify gene.idtype="SYMBOL" (lower case is fine):
eco2.out <- pathview(gene.data = T2.CEBF095.crt115.ASCH.DROP3.rel.gn, pathway.id = "02010", gene.idtype="SYMBOL", out.suffix = "T2ACSH", species = "eco", kegg.native=TRUE)
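
After rerunning with gene.idtype set, it may be worth checking that the data actually reached the pathway nodes. The following is a minimal base R sketch, not something pathview provides: unmapped_fraction is a hypothetical helper, and the toy data frame only copies a few columns from the head(eco2.out[[1]]) output quoted below.

```r
# Toy copy of the first rows of eco2.out[[1]] as quoted in this thread
# (only the columns needed for the check).
plot.data <- data.frame(
  kegg.names   = c("b1513", "b1515", "b1514", "b1516", "b4087", "b4086"),
  ACSH_vs_synH = rep(NA_real_, 6),
  mol.col      = rep("#FFFFFF", 6),
  stringsAsFactors = FALSE
)

# Hypothetical helper (not part of pathview):
# share of nodes that received no data value.
unmapped_fraction <- function(pd, data.col) {
  mean(is.na(pd[[data.col]]))
}

frac <- unmapped_fraction(plot.data, "ACSH_vs_synH")
print(frac)  # 1 for these rows: no node received a value
```

On a real run you would pass eco2.out[[1]] instead of the toy data frame. A value close to 1 suggests the ID type still does not match.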

You may want to check the help info on the pathview function for details:
?pathview

Pathview supports 10 different common ID types for model organisms (plus KEGG orthology IDs). To see the supported common ID types, type:
gene.idtype.list

For external IDs not in the supported common ID type list, you can use the mol.sum function to do the ID and data mapping explicitly. Check the example on page 14 of the vignette or the help info on the function:
?mol.sum
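
To illustrate what explicit ID and data mapping involves, here is a minimal base R sketch. It is not the mol.sum implementation. The b-number values are taken from the quoted message below; the target IDs in id.map are made-up placeholders (not real Entrez Gene IDs), and map_ids is a hypothetical helper. Mapping two genes to the same target and summing their values is an artificial choice, made only to show the summarization step.

```r
# Log ratios keyed by b-number (values from the quoted message)
expr <- c(b0019 = 0.9265879, b0032 = -4.2007218, b0033 = -3.6678436)

# Two-column map: source ID -> target ID.
# The target IDs are placeholders, not real Entrez Gene IDs.
id.map <- cbind(
  source = c("b0019", "b0032", "b0033"),
  target = c("T1", "T2", "T2")
)

# Hypothetical helper: keep the IDs that can be mapped,
# then sum the values that share a target ID.
map_ids <- function(x, id.map) {
  hit <- names(x) %in% id.map[, "source"]
  target <- id.map[match(names(x)[hit], id.map[, "source"]), "target"]
  tapply(x[hit], target, sum)
}

mapped <- map_ids(expr, id.map)
print(mapped)
```

For real data, ?mol.sum describes the actual arguments and the available summarization methods.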

HTH.
Weijun


--------------------------------------------
On Wed, 8/21/13, Oleg Moskvin <moskvin at wisc.edu> wrote:

 Subject: pathview: problem with coloring

 Date: Wednesday, August 21, 2013, 6:12 PM

 Hi Weijun,

 Your pathview is a very attractive package. While I can
 reproduce the results with the human data provided in the
 example, I am getting coloring problems with E. coli data.

 This seems to be a gene ID mismatch that comes from
 inconsistency in the ID handling by the package.

 The KEGG pathways for E. coli contain "b-numbers" as gene
 IDs.

 If I supply an expression set based on b-numbers, it is not
 recognized. If I supply an expression set based on gene names,
 it is (!) recognized, but the resulting coloring is all-white
 (#FFFFFF).

 Details:

 ###### 1. Using b-numbers:
 head(T2.CEBF095.crt115.ASCH.DROP3.rel)
  ACSH_vs_synH
 EKO11_2926 -1.3362079
 b0019 0.9265879
 b0032 -4.2007218
 b0033 -3.6678436
 b0058 1.1996750
 b0060 0.8624787

 eco.out <- pathview(gene.data =
 T2.CEBF095.crt115.ASCH.DROP3.rel, pathway.id = "02010",
 out.suffix = "T2ACSH", species = "eco", kegg.native=TRUE)
 [1] "Downloading xml files for eco02010, 1/1 pathways.."
 [1] "Downloading png files for eco02010, 1/1 pathways.."

 Error in mol.data[as.character(items[hit]), ] : subscript
 out of bounds
 In addition: Warning messages:
 1: In node.map(gene.data, node.data, node.types =
 gene.node.type, node.sum = node.sum) :
  NAs introduced by coercion
 2: In FUN(1:153[[1L]], ...) : NAs introduced by coercion


 ###### 2. Using gene names:
 head(T2.CEBF095.crt115.ASCH.DROP3.rel.gn)
  ACSH_vs_synH
 nhaA 0.9265879
 carA -4.2007218
 carB -3.6678436
 caiF -1.4380677
 folA -0.8914105
 rluA 1.1996750

 eco2.out <- pathview(gene.data =
 T2.CEBF095.crt115.ASCH.DROP3.rel.gn, pathway.id = "02010",
 out.suffix = "T2ACSH", species = "eco", kegg.native=TRUE)

 Loading required package: org.EcK12.eg.db

 Working in directory
 /mnt/omdir/omoskvin/Projects/Ecoli/cMonkey
 Writing image file eco02010.T2ACSH.png
 There were 50 or more warnings (use warnings() to see the
 first 50)

 > head(eco2.out[[1]])
  kegg.names labels type x y width height ACSH_vs_synH mol.col
 4 b1513 gene 339 1882 46 17 NA #FFFFFF
 5 b1515 gene 293 1890 46 17 NA #FFFFFF
 6 b1514 gene 293 1873 46 17 NA #FFFFFF
 7 b1516 gene 247 1882 46 17 NA #FFFFFF
 18 b4087 gene 339 1823 46 17 NA #FFFFFF
 19 b4086 gene 293 1823 46 17 NA #FFFFFF


 So, b-numbers cause an early "out of bounds" error while
 gene names result in proceeding further but no coloring in
 the result!

 Please help.

 Thank you, 

 Oleg


