[BioC] Biobase/reposTools Dependency

Colin A. Smith colin at colinsmith.org
Mon Jun 16 13:25:41 MEST 2003


I am looking into using Bioconductor as a slave batch processor for 
other programs. (BASE, other web tools, etc.) I've found that when 
using Bioconductor for the statistical analysis, a majority of the CPU 
time gets spent just loading it rather than on the actual computation, 
which seems like a waste. This added overhead makes it difficult to 
provide reasonably interactive output to the user. While it might be 
possible to set up some sort of persistent R session that gets reused, 
I'd rather reduce the complexity and just run R anew for each analysis.
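For anyone curious, a quick way to see the split is something like the 
following (a rough sketch; the analysis body is just a placeholder for 
the real computation):

system.time(library(Biobase))    # load overhead; this also pulls in reposTools
system.time({                    # stand-in for the actual analysis
    x <- matrix(rnorm(100000), ncol = 10)
    apply(x, 2, median)
})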

Breaking the dependency of Biobase on reposTools strikes me as a 
particularly effective way to optimize the load sequence. Both packages 
take a long time to load relative to the startup of R itself. (This 
seems to stem mostly from their use of the methods package: profiling 
shows that 60% of the CPU time gets spent in setMethod while loading 
reposTools.)
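In case anyone wants to reproduce those numbers, the profiling can be 
done roughly like this (Rprof() needs an R built with profiling 
support, and the summary step varies by R version):

Rprof("repostools-load.out")
library(reposTools)
Rprof(NULL)
## summarise with "R CMD Rprof repostools-load.out" from the shell, or,
## if your R has it, summaryRprof("repostools-load.out")$by.self
## setMethod should show up near the top of the self-time listing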

While it's nice to be able to automagically download and install R 
packages, the overwhelming majority of R sessions probably don't use 
this feature, especially if the R library directory isn't owned by the 
current user. (The warning that pops up in that case is another 
annoyance...) Is there some other showstopping reason for Biobase to 
depend on reposTools?


