[BioC] 'parallel' vs 'multicore'

Martin Morgan mtmorgan at fhcrc.org
Thu Oct 6 21:37:38 CEST 2011


Hi Tim

On 10/06/2011 12:26 PM, Tim Triche, Jr. wrote:
> Out of curiosity, why would memory be less of an issue with SNOW than
> with mclapply?

I meant to leave the opposite impression -- that mclapply will generally be 
better with memory than snow.

> My intuition is that, as soon as the data in a child process' image
> diverges from the parent process', the memory usage will get pretty
> savage either way.  At least, that matches what I remember about how
> fork() works and what I see when I run diverging children.  They
> misbehave on occasion, as children are wont to do.

In principle, and as I understand it, fork should be copy-on-write: 
objects shared between processes are not duplicated in memory until 
modified, so any data that is effectively read-only is handled better by 
multicore. Also, snow will serialize / unserialize objects to send them 
to children, and this can be quite slow for large objects. Both snow and 
multicore rely on serialization for return values, which is a strong 
argument for returning something much smaller than the input -- a 
vector of counts of reads overlapping regions of interest, rather than 
the reads themselves.
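
To make that concrete, here is a minimal sketch of the pattern. The 
'reads' vector and the ten regions are made-up stand-ins, not real 
data or any package's API. The children read the large object from the 
forked parent and send back only one count each. The core count is 
forced to 1 on Windows, on the assumption that a current R release 
lets mclapply run serially there.

```r
library(parallel)

# Hypothetical stand-in for a large, effectively read-only object
# (e.g. read start positions); not real data.
set.seed(1)
reads <- sample.int(1e6, 1e5, replace = TRUE)

# Hypothetical regions of interest: ten equal-width bins over 1..1e6.
starts <- seq(1, 1e6, by = 1e5)
ends <- starts + 1e5 - 1

# Each task returns a single number, so very little is serialized back.
count_in_region <- function(i) sum(reads >= starts[i] & reads <= ends[i])

# Forking is only available on Unix-alikes; use one core elsewhere.
cores <- if (.Platform$OS.type == "windows") 1L else 2L
counts <- unlist(mclapply(seq_along(starts), count_in_region,
                          mc.cores = cores))
```

With snow, 'reads' would first have to be serialized and shipped to 
every worker before any counting could start.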

> Anyways -- would it be out of the question for 'parallel' to export a
> dummy function like
>
> mclapply <- lapply
>
> on Windows?  Maybe I'll go post that on r-dev so that Prof. Ripley can
> bite my head off :-)

yes that's your best bet!
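
In the meantime a package can carry its own fallback. A rough sketch, 
assuming a hypothetical helper called portable_lapply (it is not part 
of 'parallel'): it checks the platform and the namespace exports 
rather than the search path, so it also works when parallel is only 
imported and not attached.

```r
# Hypothetical helper, not part of 'parallel': use mclapply where forking
# is available, otherwise fall back to a plain serial lapply.
portable_lapply <- function(X, FUN, ..., cores = 2L) {
  can_fork <- .Platform$OS.type != "windows" &&
    "mclapply" %in% getNamespaceExports("parallel")
  if (can_fork && cores > 1L) {
    parallel::mclapply(X, FUN, ..., mc.cores = cores)
  } else {
    lapply(X, FUN, ...)
  }
}

# Same call on every platform; only the execution strategy differs.
squares <- portable_lapply(1:4, function(i) i^2)
```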

Martin

> For all the shortcomings of foreach() / doMC() and friends, their
> default (run serially) was/is sensible.
>
>
>
> On Thu, Oct 6, 2011 at 12:09 PM, Martin Morgan <mtmorgan at fhcrc.org>
> wrote:
>
>     On 10/06/2011 10:21 AM, Tim Triche, Jr. wrote:
>
>         I have a lot of methods in methylumi (the revised version) that will
>         happily parallelize themselves for (e.g.) loading hundreds of IDAT
>         files, background correcting and normalizing anything in sight, etc.
>         Sometimes it's easier to parallelize things until I can find time to
>         make them properly efficient (boooo!).
>         When I compiled HEAD for R-2.14 the other day, after installing it,
>         I typed
>
>         library(parallel)
>
>         And all the handy bits of snow and multicore were in there!  If I
>         switch to the 'parallel' package, by default, will I now be OK and
>         not screw Windows users? Everything works great on Linux/Unix, and
>         has done so for months, with 'multicore'.  It seems like there
>         aren't any substantial differences other than things "just work" for
>         a base installation -- do other package authors anticipate moving
>         over now that this is slated to be in the stable release?
>
>
>     Yes you and other developers should switch to parallel; it seems to
>     be the wave of the future.
>
>     Likely your DESCRIPTION file should have
>
>       Imports: parallel
>
>     and your NAMESPACE
>
>       import(parallel)
>
>     Importing all of parallel seems to be the best solution, because the
>     available symbols depend on platform, e.g., mclapply on Linux / Mac
>     but not Windows.
>
>     It's still the case that mclapply, for instance, is not supported on
>     Windows so your code needs to have some conditional evaluation --
>     exists("mclapply", "package:parallel").
>
>     If memory weren't an issue, then the 'sockets' interface from SNOW
>     would be the most portable.
>
>     Martin
>     --
>     Computational Biology
>     Fred Hutchinson Cancer Research Center
>     1100 Fairview Ave. N. PO Box 19024 Seattle, WA 98109
>
>     Location: M1-B861
>     Telephone: 206 667-2793
>
>
>
>
> --
> If people do not believe that mathematics is simple,
> it is only because they do not realize how complicated life is.
>
>
>         John von Neumann
>         <http://www-groups.dcs.st-and.ac.uk/~history/Biographies/Von_Neumann.html>
>
>


-- 
Computational Biology
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N. PO Box 19024 Seattle, WA 98109

Location: M1-B861
Telephone: 206 667-2793


