[BioC] BioPAX parsing

Oliver Ruebenacker curoli at gmail.com
Mon Jul 23 16:16:38 CEST 2012


     Hello,

  I created a prototype that can read RDF (using rJava and OpenRDF
Sesame) and turn it into a data frame.

  Then I discovered that Egon Willighagen did almost the same and
created RRDF (except that he used Jena instead of Sesame).

 RRDF does not provide a simple method to get a data frame, but that
should not be hard to add. It would probably be less work than turning my
prototype into a deployable package.
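
  As a rough sketch of the data-frame idea (in Python rather than R,
purely for illustration, and not the rJava/Sesame prototype itself):
the toy triples quoted further down in this thread can be split into a
three-column table with plain string handling. The function names
parse_triples, to_columns and participants are made up for this
example, and the line-based parser only understands that simplified
notation, not real RDF/XML.

```python
# Example triples copied from the quoted message further down this thread.
EXAMPLE = """
ex:mapPhosphorylization   rdf:type   bp:BiochemicalReaction.
ex:atp   rdf:type   bp:SmallMolecule.
ex:adp   rdf:type   bp:SmallMolecule.
ex:map   rdf:type   bp:Protein.
ex:mapPhosphorylized   rdf:type   bp:Protein.
ex:mapPhosphorylization   bp:left   ex:atp.
ex:mapPhosphorylization   bp:left   ex:map.
ex:mapPhosphorylization   bp:right   ex:adp.
ex:mapPhosphorylization   bp:right   ex:mapPhosphorylized.
"""


def parse_triples(text):
    """Split each non-empty line into a (subject, predicate, object) tuple."""
    rows = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        # Drop the terminating period, then split on runs of whitespace.
        subject, predicate, obj = line.rstrip(".").split()
        rows.append((subject, predicate, obj))
    return rows


def to_columns(rows):
    """Arrange rows column-wise, like a three-column data frame."""
    subjects, predicates, objects = zip(*rows)
    return {
        "subject": list(subjects),
        "predicate": list(predicates),
        "object": list(objects),
    }


def participants(rows):
    """One row per reaction participant: (reaction, side, substance)."""
    return [
        (s, p.split(":")[1], o)
        for s, p, o in rows
        if p in ("bp:left", "bp:right")
    ]


triples = parse_triples(EXAMPLE)
table = to_columns(triples)
for row in participants(triples):
    print(row)
```

  The participants helper corresponds to the "one row for each
reaction-participant" option mentioned earlier in the thread. It is
derived from the same triple table, which is why the triple-level data
frame seems like a reasonable first target.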

     Take care
     Oliver

On Fri, Jul 20, 2012 at 3:02 PM, Paul Shannon <pshannon at fhcrc.org> wrote:
> Hi Oliver,
>
> Just checking in prior to the Bioconductor 2012 conference.   Have you had any luck with parsing BioPAX format as RDF triples?
>
> Thanks!
>
>  - Paul
>
> On Jun 16, 2012, at 3:10 AM, Oliver Ruebenacker wrote:
>
>>     Hello,
>>
>>  Thanks a lot for the endorsement!
>>
>>  I will try to create a prototype in the next few days, and then you can
>> probably advise me on how to turn that into a package of the desired
>> quality.
>>
>>     Take care
>>     Oliver
>>
>> On Fri, Jun 15, 2012 at 6:08 PM, Paul Shannon <pshannon at fhcrc.org> wrote:
>>> Oliver and Martin,
>>>
>>> It would be very helpful to have easy access to BioPAX data in Bioconductor.
>>>
>>> Just now, at the weekly Bioconductor dev-team meeting, we discussed your ideas, and want to endorse them.  Oliver's proposal to parse the RDF triples into a data.frame has lots to recommend it.  It would be immediately useful, and yet also allow for more sophisticated uses later.  With these relationships in R, annotated as BioPAX data often are, we can imagine interested parties writing S4 classes which use the data, which might provide flexible querying capabilities, and be able to transform those triples into graphs and networks, for further computation and display.
>>>
>>> Please let us know if we can help.
>>>
>>> - Paul
>>>
>>>
>>> On Jun 15, 2012, at 12:23 PM, Oliver Ruebenacker wrote:
>>>
>>>>     Hello Martin,
>>>>
>>>>  I don't have code in R to test yet, but I do have extensive
>>>> experience handling BioPAX in Java, so I'm assuming reading BioPAX
>>>> using rJava should not be too difficult.
>>>>
>>>>  The best target format depends on what people would like to do with
>>>> the data. For visualization, a bi-partite graph in a popular
>>>> graph-layout package should be best. Is there any particular graph
>>>> package in Bioconductor or R in general you would recommend?
>>>>
>>>>  For actual analysis, people probably have more specific requirements.
>>>>
>>>>  BioPAX is a format based on RDF/OWL, which in turn is based on
>>>> organizing data in triples, which could be stored in a three-column
>>>> data frame (or perhaps a fourth column for data type). For example
>>>> (incomplete, for illustration only):
>>>>
>>>>  ex:mapPhosphorylization   rdf:type   bp:BiochemicalReaction.
>>>>  ex:atp   rdf:type   bp:SmallMolecule.
>>>>  ex:adp   rdf:type   bp:SmallMolecule.
>>>>  ex:map   rdf:type   bp:Protein.
>>>>  ex:mapPhosphorylized   rdf:type   bp:Protein.
>>>>  ex:mapPhosphorylization   bp:left   ex:atp.
>>>>  ex:mapPhosphorylization   bp:left   ex:map.
>>>>  ex:mapPhosphorylization   bp:right   ex:adp.
>>>>  ex:mapPhosphorylization   bp:right   ex:mapPhosphorylized.
>>>>
>>>>     Take care
>>>>     Oliver
>>>>
>>>> On Fri, Jun 15, 2012 at 3:03 PM, Martin Preusse
>>>> <martin.preusse at googlemail.com> wrote:
>>>>> Hi Oliver,
>>>>>
>>>>> I think there is a lot of interest in a Bioconductor package!
>>>>>
>>>>> Personally, I would like to read pathways stored in the BioPAX format into any kind of graph. It's a philosophical question whether reactions should be nodes or should sit on the edges :) So far I have not used any R graph package, but I assume there are some very generic packages that are flexible enough to support both direct and bi-partite pathway structures. I have, for example, used the JUNG graph API for Java extensively.
>>>>>
>>>>> I'm not sure what you mean by RDF/OWL triples. For me BioPAX is only a format to store a pathway. And I would like to bring it back into its natural form: a network!
>>>>>
>>>>> Do you have any code to test? I have used rJava before. All this RDF and XML file format stuff kind of puzzles me though … :)
>>>>>
>>>>> Cheers
>>>>> Martin
>>>>>
>>>>>
>>>>>
>>>>> On Friday, June 15, 2012 at 18:32, Oliver Ruebenacker wrote:
>>>>>
>>>>>> Hello Martin,
>>>>>>
>>>>>> I'm currently looking into reading BioPAX into R using rJava and
>>>>>> OpenRDF Sesame. If there is interest, I may look into submitting
>>>>>> a package to Bioconductor.
>>>>>>
>>>>>> It would be very helpful if you could tell me what you need the
>>>>>> BioPAX data for, and in what form it would be best for you. Possible
>>>>>> options are:
>>>>>>
>>>>>> - A data frame of the RDF/OWL triples
>>>>>> - A graph of the RDF/OWL triples
>>>>>> - A data frame with one row for each reaction-participant
>>>>>> - A bi-partite graph with nodes for reactions and nodes for substances
>>>>>> - A graph with nodes for substances only, with edges for interactions
>>>>>> - A genetic interaction graph
>>>>>>
>>>>>> This list is roughly sorted from the easiest to the most
>>>>>> difficult to provide.
>>>>>>
>>>>>> Take care
>>>>>> Oliver
>>>>>>
>>>>>> On Thu, Jun 14, 2012 at 10:01 AM, Martin Preusse
>>>>>> <martin.preusse at googlemail.com> wrote:
>>>>>>> Many biological pathway resources provide their data in the BioPAX format (http://www.biopax.org/index.php), a special XML format for biological interaction networks. Examples are Pathway Commons (http://www.pathwaycommons.org/pc/) and Reactome (http://www.reactome.org).
>>>>>>>
>>>>>>> A Java library for parsing BioPAX files exists: http://www.biopax.org/paxtools.php
>>>>>>>
>>>>>>> Has anybody used BioPAX files with R? Is it possible to read BioPAX files into any R-based graph structure? A solution similar to the KEGGgraph package for KEGG pathways would be great, since more and more databases are starting to use BioPAX.
>>>>>>>
>>>>>>>
>>>>>>> Any ideas are appreciated!
>>>>>>>
>>>>>>> Cheers
>>>>>>> Martin
>>>>>>>
>>>>>>> _______________________________________________
>>>>>>> Bioconductor mailing list
>>>>>>> Bioconductor at r-project.org
>>>>>>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>>>>>>> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor
>>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> --
>>>>>> Oliver Ruebenacker
>>>>>> Bioinformatics Consultant (http://www.knowomics.com/wiki/Oliver_Ruebenacker)
>>>>>> Knowomics, The Bioinformatics Network (http://www.knowomics.com)
>>>>>> SBPAX: Turning Bio Knowledge into Math Models (http://www.sbpax.org)
>>>>>>
>>>
>



-- 
Java Developer (Bioinformatics) at PanGenX (http://www.pangenx.com)
President and Founder of Knowomics
(http://www.knowomics.com/wiki/Oliver_Ruebenacker)
Consultant at Predictive Medicine
(http://predmed.com/people/oliverruebenacker.html)
SBPAX: Turning Bio Knowledge into Math Models (http://www.sbpax.org)


