[BioC] pathview puzzle

Luo Weijun luo_weijun at yahoo.com
Fri Aug 30 16:26:59 CEST 2013


The updated pathview (version 1.1.5) is now available through the BioC devel version:
http://bioconductor.org/packages/2.13/bioc/html/pathview.html
The R-forge version failed to build because they haven’t installed some dependency packages for their new R 3.0.1. I’ve contacted their admin, but I’m not sure when this will be resolved.
Weijun

--------------------------------------------


 Subject: Re: [BioC] pathview puzzle
 To: "Oleg Moskvin" <moskvin at wisc.edu>
 Cc: Bioconductor at r-project.org
 Date: Wednesday, August 28, 2013, 2:44 PM

 Hi Oleg,
 I just updated the pathview package so it can process and analyze
 data labeled with KEGG gene IDs other than Entrez Gene. It
 turns out that this issue affects many other species too. So
 with this update, you can now work with data for literally all ~2300
 (and more forthcoming) KEGG species in pathview.
 I’ve also added new content with working examples on KEGG
 species and gene ID usage on pages 14-16 of the vignette.
 Note that you need to specify gene.idtype="KEGG" when
 calling pathview.
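
 A minimal sketch of such a call, assuming expression data keyed by
 E. coli b-numbers. The vector eco.data, its values, and the output
 suffix are made up for illustration (only b1513 appears elsewhere in
 this thread); pathway "02010" and species "eco" are taken from the
 earlier messages. The call is guarded because it needs the pathview
 package and downloads files from KEGG.

 ```r
 # Made-up log ratios named by KEGG gene IDs (E. coli b-numbers).
 # b1514 and b1516 are placeholder IDs for illustration only.
 eco.data <- c(b1513 = 1.8, b1514 = -0.7, b1516 = 0.4)

 # Only attempt the call in an interactive session with pathview
 # installed, since it fetches the pathway XML and image from KEGG.
 if (interactive() && requireNamespace("pathview", quietly = TRUE)) {
   eco.out <- pathview::pathview(
     gene.data   = eco.data,
     pathway.id  = "02010",
     species     = "eco",
     gene.idtype = "KEGG",      # IDs are already KEGG gene IDs
     out.suffix  = "kegg.ids",  # hypothetical suffix
     kegg.native = TRUE
   )
 }
 ```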
 I’ve posted the new package to R-forge. You should be able
 to access it in the next few hours at http://r-forge.r-project.org/R/?group_id=1619. Just
 install it following the instructions there. The BioC version
 will also be updated in the next 1-2 days: http://bioconductor.org/packages/devel/bioc/html/pathview.html.
 Let me know how that works or if you have questions. HTH.
 Weijun

 --------------------------------------------

 wrote:

  Subject: Re: [BioC] pathview puzzle
  To: Bioconductor at r-project.org, "Oleg Moskvin" <moskvin at wisc.edu>
  Date: Friday, August 23, 2013, 9:53 PM
  
  Hi Oleg,
  Thanks for the note. This is indeed a problem I hadn’t
  realized previously! KEGG uses Entrez Gene IDs for all the other
  model organisms I’ve checked.
  I am working on a generic fix (not only for E. coli but for other
  species in a similar situation) and will incorporate it
  into the development version of pathview soon. Will keep you
  posted.
  Thanks for pointing this out.
  Weijun
  
  
  --------------------------------------------
  On Fri, 8/23/13, Oleg Moskvin <moskvin at wisc.edu>
  wrote:
  
   Subject: Re: [BioC] pathview puzzle
   To: Bioconductor at r-project.org,

   Date: Friday, August 23, 2013, 12:19 PM
   
   Hi Weijun,
   
   Thank you for the response. 
   
   The problem seems to be deeper than that and is connected to the
   special handling of a particular species, E. coli, by KEGG.
   
   
   I looked into the pathview() code and here is what I see:
   
   1) gene.data is remapped internally via mol.sum() to have
      ENTREZ IDs;
   2) the remapped gene.data is used by node.map() to map onto KEGG
      nodes using node.data;
   3) the node.data used in (2) was originally extracted from
      the KEGG XML by node.info().
   
   The above route implies that the "name" entries of type="gene"
   in the KEGG XML have the "speciesID:ENTREZ" format...
   
   And in the case of E. coli this doesn't hold true! See the
   examples of XML entries for H. sapiens and E. coli from my
   message yesterday (below).
   
   In fact, in the KEGG XML for E. coli, b-numbers are used as the
   IDs of "gene" records!
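
   The difference can be seen with a few lines of base R, using only
   the two "name" attributes from the XML entries quoted below. This
   is a rough illustration, not the parsing code pathview itself uses;
   the variable names are made up.

   ```r
   # "name" attributes from the two KGML entries quoted in this thread.
   entry.names <- c("hsa:51343", "eco:b1513")

   # Drop the "species:" prefix to get the gene ID KEGG uses for the node.
   node.ids <- sub("^[a-z]+:", "", entry.names)

   # Entrez Gene IDs are purely numeric; b-numbers are not.
   looks.like.entrez <- grepl("^[0-9]+$", node.ids)

   print(data.frame(entry.names, node.ids, looks.like.entrez))
   ```

   Any gene.data remapped to Entrez IDs therefore has nothing to match
   against in the E. coli node names.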
   
   So, for cases like this, where KEGG is not consistent
   in the XML structure it supplies, one may suggest
   introducing an "id.bypass" option to pathview() which would
   take gene.data as is (with user-supplied IDs that
   match the KEGG XML IDs; for example, b-numbers) and pass it
   directly to the step 3 (node matching).
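
   A sketch of what such a bypass could look like, in base R. The
   helper bypass.node.map() is hypothetical and is not a pathview
   function; the data values and the ID b1514 are made up. It assumes
   one gene ID per node "name", whereas real KEGG XML entries may list
   several IDs in one attribute.

   ```r
   # Hypothetical helper, not part of pathview: look up user data by
   # the IDs found in the KGML "name" attributes, with no ID remapping.
   bypass.node.map <- function(gene.data, node.names) {
     node.ids <- sub("^[a-z]+:", "", node.names)  # strip "eco:" etc.
     values <- unname(gene.data[match(node.ids, names(gene.data))])
     names(values) <- node.names
     values                                        # NA where no data
   }

   gene.data  <- c(b1513 = 1.8, b0001 = -0.4)      # made-up values
   node.names <- c("eco:b1513", "eco:b1514")       # b1514: placeholder
   mapped <- bypass.node.map(gene.data, node.names)
   print(mapped)
   ```

   Nodes with no matching user ID come back as NA, so they can be left
   uncolored.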
   
   Thanks!
   
   Oleg
   
   
   
   On 08/22/13, Luo Weijun wrote:
   > Hi Oleg,
   > You are right, the problem is due to ID type
   > inconsistency.
   > You have to specify gene.idtype when calling the pathview
   > function if your gene ID type is not Entrez Gene. I don’t
   > think b-numbers are recognized. For your gene name
   > example, if you mean official gene symbols by “gene
   > name”, you should specify gene.idtype="SYMBOL" (lower case
   > is fine):
   > eco2.out <- pathview(gene.data = T2.CEBF095.crt115.ASCH.DROP3.rel.gn,
   >     pathway.id = "02010", gene.idtype = "SYMBOL", out.suffix = "T2ACSH",
   >     species = "eco", kegg.native = TRUE)
   
   
   On 08/22/13, Oleg Moskvin  wrote:
   
   > 
   > <entry id="2" name="hsa:51343" type="gene"
   > link="http://www.kegg.jp/dbget-bin/www_bget?hsa:51343">
   > <graphics name="FZR1, CDC20C, CDH1, FZR,
   > FZR2, HCDH, HCDH1" fgcolor="#000000" bgcolor="#BFFFBF"
   > type="rectangle" x="919" y="536" width="46" height="17"/>
   > </entry>
   > 
   > 
   > <entry id="4" name="eco:b1513" type="gene"
   > link="http://www.kegg.jp/dbget-bin/www_bget?eco:b1513">
   > <graphics name="lsrA" fgcolor="#000000"
   > bgcolor="#BFFFBF"
   > type="rectangle" x="339" y="1882" width="46" height="17"/>
   > </entry>
  


