[BioC] pathview puzzle

Luo Weijun luo_weijun at yahoo.com
Wed Aug 28 20:44:11 CEST 2013


Hi Oleg,
I just updated the pathview package so that it can process and analyze data labeled with KEGG gene IDs other than Entrez Gene IDs. It turns out that this issue affects many other species too. So with this update, you can now work with data for literally all ~2300 (and more forthcoming) KEGG species in pathview. I’ve also added new content with working examples on KEGG species and gene ID usage on pages 14-16 of the vignette. Note that you need to specify gene.idtype="KEGG" when calling pathview.
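A minimal sketch of such a call, reusing the argument names from the pathview() call quoted further down in this thread. The expression values, the out.suffix string, and the object name eco.kegg.out are made up for illustration; b1513 is the E. coli node ID from the quoted KEGG XML, and it is only an assumption that it belongs to pathway 02010. The call itself needs the updated pathview package and network access to KEGG, so it is guarded here.

```r
# Hypothetical expression values keyed by E. coli b-numbers (KEGG gene IDs).
# b1513 comes from the XML quoted in this thread; the numbers are invented.
gene.data <- c(b1513 = 1.8, b1514 = -0.6)

# Run only when pathview is installed; the call downloads pathway data from KEGG.
if (requireNamespace("pathview", quietly = TRUE)) {
  library(pathview)
  eco.kegg.out <- pathview(gene.data = gene.data, pathway.id = "02010",
                           species = "eco", gene.idtype = "KEGG",
                           out.suffix = "kegg.ids", kegg.native = TRUE)
}
```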
I’ve posted the new package to R-Forge. You should be able to access it in the next few hours at http://r-forge.r-project.org/R/?group_id=1619. Just install it following the instructions there. The Bioc version will also be updated in the next 1-2 days: http://bioconductor.org/packages/devel/bioc/html/pathview.html.
Let me know how that works or if you have questions. HTH.
Weijun

--------------------------------------------


 Subject: Re: [BioC] pathview puzzle
 To: Bioconductor at r-project.org, "Oleg Moskvin" <moskvin at wisc.edu>
 Date: Friday, August 23, 2013, 9:53 PM

 Hi Oleg,
 Thanks for the note. This is indeed a problem I wasn’t
 aware of! KEGG uses Entrez Gene IDs for all the other
 model organisms I’ve checked.
 I am working on a generic fix (not only for E. coli but also
 for other species in a similar situation) and will incorporate
 it into the development version of pathview soon. I will keep
 you posted.
 Thanks for pointing this out.
 Weijun


 --------------------------------------------
 On Fri, 8/23/13, Oleg Moskvin <moskvin at wisc.edu> wrote:

  Subject: Re: [BioC] pathview puzzle
  To: Bioconductor at r-project.org,

  Date: Friday, August 23, 2013, 12:19 PM
  
  Hi Weijun,
  
  Thank you for the response. 
  
  The problem seems to be deeper than that and is connected to
  special handling of a particular species - E. coli - by KEGG.
  
  
  I looked into the pathview() code and here is what I see: 
  
  1) gene.data is remapped internally via mol.sum() to have
  ENTREZ IDs;
  2) remapped gene.data is used by node.map() to map onto KEGG
  nodes using node.data;
  3) the node.data used in (2) was originally extracted from
  the KEGG XML by node.info().
  
  The above route implies that the "name" entries in the KEGG
  XML of type="gene" have "speciesID:ENTREZ" format...
  
  And in the case of E. coli this doesn't hold true! See the
  examples of XML entries for H. sapiens and E. coli from my
  message of yesterday (below).
  
  In fact, in the KEGG XML for E. coli, "gene" records use
  b-numbers as IDs!
  
  So, for cases like this, where KEGG is not consistent in the
  supplied XML structure, one could introduce an "id.bypass"
  option to pathview() that takes gene.data as is (with
  user-supplied IDs that match the KEGG XML IDs, for example
  b-numbers) and passes it directly to the node matching in
  step 2.
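
The mismatch described above can be sketched in base R. This is only an illustration under stated assumptions, not pathview's internal code: the two node names are taken from the XML entries quoted at the end of this thread, while the expression values, the extra gene b0001, and the b-number to Entrez lookup table (with placeholder Entrez IDs) are hypothetical.

```r
# Node names as they appear in the KEGG XML entries quoted in this thread.
node.names <- c("hsa:51343", "eco:b1513")

# Strip the "species:" prefix to get the bare gene ID carried by each node.
node.ids <- sub("^[a-z]+:", "", node.names)

# Hypothetical user data keyed by E. coli b-numbers (values are invented).
gene.data <- c(b1513 = 1.8, b0001 = -0.4)

# Hypothetical b-number -> Entrez Gene lookup; the Entrez IDs are placeholders.
entrez.map <- c(b1513 = "900001", b0001 = "900002")

# Default route: remap the IDs to Entrez first, then match against node IDs.
remapped <- gene.data
names(remapped) <- unname(entrez.map[names(gene.data)])
hits.remapped <- intersect(names(remapped), node.ids)

# Bypass route: match the user-supplied IDs against node IDs as they are.
hits.bypass <- intersect(names(gene.data), node.ids)
```

With these inputs the remapped IDs match no node, because the E. coli node carries a b-number rather than an Entrez ID, while the bypass route matches b1513 directly.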
  
  Thanks!
  
  Oleg
  
  
  
  On 08/22/13, Luo Weijun wrote:
  > Hi Oleg,
  > You are right, the problem is due to ID type inconsistency.
  > You have to specify gene.idtype when calling the pathview
  > function if your gene ID type is not Entrez Gene. I don’t
  > think b-numbers are recognized. For your gene name example,
  > if by “gene name” you mean official gene symbols, you should
  > specify gene.idtype="SYMBOL" (lower case is fine):
  > eco2.out <- pathview(gene.data = T2.CEBF095.crt115.ASCH.DROP3.rel.gn,
  >     pathway.id = "02010", gene.idtype = "SYMBOL", out.suffix = "T2ACSH",
  >     species = "eco", kegg.native = TRUE)
  
  
  On 08/22/13, Oleg Moskvin  wrote:
  
  > 
  > <entry id="2" name="hsa:51343" type="gene"
  >     link="http://www.kegg.jp/dbget-bin/www_bget?hsa:51343">
  >   <graphics name="FZR1, CDC20C, CDH1, FZR, FZR2, HCDH, HCDH1"
  >       fgcolor="#000000" bgcolor="#BFFFBF"
  >       type="rectangle" x="919" y="536" width="46" height="17"/>
  > </entry>
  > 
  > 
  > <entry id="4" name="eco:b1513" type="gene"
  >     link="http://www.kegg.jp/dbget-bin/www_bget?eco:b1513">
  >   <graphics name="lsrA" fgcolor="#000000" bgcolor="#BFFFBF"
  >       type="rectangle" x="339" y="1882" width="46" height="17"/>
  > </entry>


