[BioC] slow graph methods

Martin Morgan mtmorgan at fhcrc.org
Wed Sep 26 23:16:17 CEST 2007


Hi Paul --

I think Seth's speed-up came when he realized that the entire graph
gets copied when a single node or edge is added. So his solution was to
accumulate a bunch of changes (actually, the whole XML file) and then
apply them in a single call to, e.g., addEdge using the fact that the
arguments from, to can be vectors. Seth accumulated nodes / edges to
be added in an environment, which does not get copied (the way a list
would) each time it is changed. Maybe that perspective helps you to
revise your code to add nodes in batches, rather than individually?

Martin

Paul Shannon <pshannon at systemsbiology.org> writes:

> A couple of weeks ago, just before he left, Seth made a striking (800x)
> improvement to the speed of graph::fromGXL.  This has been a big help.
>
> However, I'd like to construct graphs incrementally, and  
> programmatically,
> without having to create new gxl files.  Is it possible that
> Seth's efficiencies have not yet been applied to other methods in the  
> graph class?
>
> In particular, it seems that the following operations take a longish  
> time
> on graphs of a few thousand nodes:
>
>    edgeData
>    nodeData
>    addEdge
>    addNode
>
> I could provide actual times and test cases if that would help.
>
> Thanks!
>
>   - Paul
>
>
>
> On Sep 12, 2007, at 8:09 AM, Seth Falcon wrote:
>
>> Hi again,
>>
>> Seth Falcon <sfalcon at FHCRC.ORG> writes:
>>> Paul Shannon <pshannon at systemsbiology.org> writes:
>>>> It took nearly 24 hours (!) to create a 16k node graph using two
>>>> different techniques:
>>>>
>>>>     g = fromGXL (file ('someFile.gxl'))
>>
>> Using a patched version of fromGXL (on my laptop) I get:
>>
>>> library(graph)
>>> con = file("kegg.yeast.gxl", open="r")
>>> system.time(z <- fromGXL(con))
>>        user  system elapsed
>>     104.366   0.570 105.070
>>> z
>>     A graphNEL graph with undirected edges
>>     Number of Nodes = 15158
>>     Number of Edges = 32668
>>> validObject(z)
>>     [1] TRUE
>>
>> That's over 800x faster :-)
>>
>> I've checked in the changes in devel as graph 1.15.17.  You can get it
>> from svn or wait till this time tomorrow (in either case, you will
>> need R 2.6.0 alpha).
>>
>> The code passes the existing unit tests, but some extra scrutiny that
>> the returned graphs are as desired is quite welcome.
>>
>> + seth
>>
>> -- 
>> Seth Falcon | Computational Biology | Fred Hutchinson Cancer  
>> Research Center
>> BioC: http://bioconductor.org/
>> Blog: http://userprimary.net/user/
>
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at stat.math.ethz.ch
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor

-- 
Martin Morgan
Bioconductor / Computational Biology
http://bioconductor.org



More information about the Bioconductor mailing list