[BioC] rBiopaxParser, Reactome and namespaces

Wed May 22 05:08:12 CEST 2013

Hi Frank,

I am most happily using the rBiopaxParser package, and your vignette, in order to extract detailed (but topologically simple) interaction data from the latest Reactome "Homosapiens.owl".  Your package offers great power and convenience.

However, I run into difficulty with namespaces.  

For a simple example, consider this one line from the method listInstances, found in the file R/selectBiopax.R:

   sel = sel & (tolower(biopax$df$class) %in% tolower(stripns(class)))

As parsed from Homosapiens.owl, the class column of biopax$df has values like these, always containing a namespace prefix:

   head(unique(biopax$df$class))
     "bp:BiochemicalReaction"        "bp:Protein"                    
     "bp:CellularLocationVocabulary" "bp:UnificationXref"           
     "bp:ProteinReference"           "bp:BioSource"                 

The right-hand side of the %in% clause has its namespace stripped ("bp:Protein" becomes "Protein"), while the biopax$df$class values keep the "bp:" prefix they were parsed with from the owl file.  The two sides can therefore never match.
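To make the mismatch concrete, here is a minimal sketch in base R on a toy data frame.  It does not call rBiopaxParser at all: strip_ns_sketch is a hypothetical stand-in for stripns, assumed only to drop a leading "prefix:", and the three class values are copied from the output above.

```r
# Toy stand-in for biopax$df; the real values come from parsing the owl file.
df <- data.frame(
  class = c("bp:BiochemicalReaction", "bp:Protein", "bp:ProteinReference"),
  stringsAsFactors = FALSE
)

# Hypothetical stand-in for the package's stripns(): assumed to remove
# everything up to and including the first colon.
strip_ns_sketch <- function(x) sub("^[^:]*:", "", x)

query <- "bp:Protein"

# Same shape as the line quoted above: only the query side is stripped,
# so "bp:protein" is compared against "protein".
one_sided <- tolower(df$class) %in% tolower(strip_ns_sketch(query))

# Stripping both sides makes the comparison symmetric.
both_sides <- tolower(strip_ns_sketch(df$class)) %in%
  tolower(strip_ns_sketch(query))

print(one_sided)
print(both_sides)
```

In this toy case the one-sided comparison selects nothing, while the symmetric one selects only the "bp:Protein" row.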

I believe I see similar logic in other places, with these methods specifically encountered so far:

  selectInstances
  listPathwayComponents

Namespaces  are used with the "property" column as well: 

   head(table(biopax$df$property), n=3)
      bp:author bp:cellularLocation          bp:comment 
          55654               23838              123750 

Speaking from the nickel seats, and not claiming to understand all of the implications: perhaps these mismatches could be neatly avoided if your readBiopax method could optionally eliminate namespaces when reading in an owl file?
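Roughly what I have in mind, as a sketch only: the snippet below uses a toy list that mimics the shape of the parsed object (a list holding a data frame in $df) rather than a real readBiopax result.  The helper drop_ns_prefix is hypothetical and not part of rBiopaxParser, and I am assuming that only the class and property columns need this treatment.

```r
# Toy object with the same shape as the parsed result: a list with a $df
# data frame holding "class" and "property" columns.
biopax <- list(df = data.frame(
  class    = c("bp:Protein", "bp:Protein", "bp:BioSource"),
  property = c("bp:comment", "bp:cellularLocation", "bp:author"),
  stringsAsFactors = FALSE
))

# Hypothetical helper (not part of rBiopaxParser): removes a single leading
# "prefix:" and leaves values without such a prefix unchanged.
drop_ns_prefix <- function(x) {
  sub("^[A-Za-z][A-Za-z0-9_.-]*:", "", as.character(x))
}

# Strip the prefix once, right after reading, so that later comparisons
# against un-prefixed class names can match.
biopax$df$class    <- drop_ns_prefix(biopax$df$class)
biopax$df$property <- drop_ns_prefix(biopax$df$property)

print(unique(biopax$df$class))
print(unique(biopax$df$property))
```

Whether other columns also carry prefixes that would need the same handling, I cannot say.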

Thanks,

 - Paul