[BioC] pooling for parallel hierarchical operations

Martin Morgan mtmorgan at fhcrc.org
Wed Nov 14 21:32:21 CET 2012


On 11/14/2012 6:40 AM, Michael Lawrence wrote:
> We often execute nested operations in parallel. For example, first by
> sample, then by chromosome. Fixed allocation of resources to each level
> will often result in waste. For example, if one sample finishes quickly,
> its CPUs are not available to help the other samples along. Perhaps the
> most expedient solution is to expand.grid() the hierarchy and create one
> job for every combination, i.e., flatten the hierarchy. A more ideal
> solution might be a pool of resources (cores) that are allocated more
> fluidly. Is there any sort of pooling system for R? I know that the
> parallel package supports the declaration of resources in cluster objects,
> but there is no central manager. This is a general R question, but it's
> worth discussing in the context of how we can make better use of
> parallelism in the low-level infrastructure, which would cause these
> hierarchies to arise. It's also relevant to the discussion of specifying
> parallelization modes or strategies. Pools themselves could be hierarchical
> and heterogeneous (hosts, cores). Declaring available resources is fairly
> straightforward. Deciding how to use them is context-dependent and
> requires user control.

Hi Michael -- I don't really have an answer for you, but it sounds like you're 
looking for a scheduler, with the idea that the 'workers' each have a deque of 
tasks that they are responsible for, but with some kind of collaboration between 
workers to balance tasks. I don't think the user should have (or have to have) 
influence on the scheduler; it mostly just does the right thing. I think it 
would be good to develop scheduler(s) orthogonal to the parallel algorithm 
(lapply, pvec, map/reduce, etc.).
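
As a rough illustration of the flattening idea from the quoted message, combined with a scheduler that hands work out as workers become free, here is a minimal sketch using only the parallel package. The sample and chromosome names and the process_one() function are made up for illustration. Note that dynamic scheduling in clusterMap() is a simple master-side queue, not the deque-based balancing between workers described above.

```r
library(parallel)

# Hypothetical inputs: these sample and chromosome names are invented.
samples <- c("sampleA", "sampleB", "sampleC")
chroms  <- c("chr1", "chr2", "chrX")

# Flatten the sample -> chromosome hierarchy: one row per combination.
jobs <- expand.grid(sample = samples, chrom = chroms,
                    stringsAsFactors = FALSE)

# Placeholder for the real per-chromosome work (an assumption).
process_one <- function(sample, chrom) {
    paste(sample, chrom, sep = ":")
}

# One pool of two local workers shared by every job. With
# .scheduling = "dynamic" the next job goes to whichever worker is idle,
# so a sample that finishes early does not leave its cores unused.
cl <- makeCluster(2L)
res <- clusterMap(cl, process_one, jobs$sample, jobs$chrom,
                  .scheduling = "dynamic")
stopCluster(cl)

# Regroup the flat results by sample to recover the original hierarchy.
by_sample <- split(unlist(res, use.names = FALSE), jobs$sample)
```

The scheduling choice is a single argument here, separate from the function being mapped, which is one small example of keeping the scheduler orthogonal to the algorithm.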

I've started a BiocParallel package in Bioconductor's svn and on github

   https://github.com/Bioconductor/BiocParallel

so that might provide a place to focus this development; I'd encourage use of 
github and its social coding as the primary means for development at this time.

Martin

>
> Michael


-- 
Dr. Martin Morgan, PhD
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N.
PO Box 19024 Seattle, WA 98109


