[BioC] multicore and GRangesList [Resurrected]

Cook, Malcolm MEC at stowers.org
Thu Sep 20 19:06:50 CEST 2012


Hi Martin,

The benefits of the functional stuff are purely stylistic.

And NOT (I have just learned) performance!

Indeed, after running some timing tests, I have rewritten pvec_along without using Compose & Curry, as:

pvec_along <- function(x, FUN, ...) {
### PURPOSE: extension to parallel::pvec for non-vectors which is
### vectorized over the indices of x.
###
### Example: pvec_along(myGRangesList, width)
###          this is functionally equivalent to:
###          pvec(seq_along(myGRangesList), function(i) width(myGRangesList[i]))
###
### Requires: `library(parallel)`
  indices <- seq_along(x)
  FUN <- match.fun(FUN)
  ## FYI: repeated system.times using 11 cores showed 13% worse
  ## performance using `library(functional)` approach written as:
  ## pvec(indices, Compose(Curry(`[`, x), FUN), ...)
  pvec(indices, function(i) FUN(x[i]), ...)
}
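
A minimal sketch of the wrapper in use, for readers without a GRangesList at hand. It assumes a plain named list as a stand-in for the GRangesList and `lengths` as a stand-in for `width`. The `cores` variable is an addition for this sketch, since forking is not available on Windows.

```r
library(parallel)

## Same wrapper as above, repeated so the sketch is self-contained.
pvec_along <- function(x, FUN, ...) {
  FUN <- match.fun(FUN)
  pvec(seq_along(x), function(i) FUN(x[i]), ...)
}

## Assumption: a plain list stands in for a GRangesList.
x <- list(a = 1:3, b = 1:5, c = integer(0))

## Forking is unavailable on Windows, so fall back to one core there.
cores <- if (.Platform$OS.type == "windows") 1L else 2L

## FUN receives a sub-list x[i] and must return one value per element,
## so that pvec can concatenate the per-core results with c().
res <- pvec_along(x, lengths, mc.cores = cores)
print(res)
```

Note that FUN has to be vectorized over the subset it is handed: it is called once per chunk of indices, not once per element.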

Better?

So my stylistic preferences stand admonished.  I have increasingly been using Compose and Curry idiomatically.  Perhaps I must stop, or learn, if possible, to avoid the overhead they impose.
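
To make the comparison concrete, here is a small sketch of the two styles side by side. The `curry` and `compose` functions below are hypothetical stand-ins written for this sketch, not the implementations from `library(functional)`, and the plain list again stands in for a GRangesList. One plausible source of overhead, though not something measured here, is the extra layer of function calls that the point-free version adds on every invocation.

```r
## Hypothetical stand-ins written for this sketch; they are NOT the
## implementations shipped in library(functional).
curry <- function(FUN, ...) {
  fixed <- list(...)
  function(...) do.call(FUN, c(fixed, list(...)))
}
compose <- function(f, g) function(...) g(f(...))

## Assumption: a plain list stands in for a GRangesList.
x <- list(a = 1:3, b = 1:5, c = integer(0))
FUN <- lengths

composed <- compose(curry(`[`, x), FUN)  # point-free style
closure  <- function(i) FUN(x[i])        # plain anonymous function

## Both forms compute the same thing.
same <- identical(composed(1:2), closure(1:2))
print(same)

## Rough timing of the per-call overhead; numbers vary by machine.
n <- 10000
print(system.time(for (k in seq_len(n)) composed(1:2)))
print(system.time(for (k in seq_len(n)) closure(1:2)))
```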

Regardless, pvec_along is just a simple convenience wrapper around something that could be written directly.  But I find it a very useful abstraction.

Do you see better ways of expressing this idiom?

It is arguable that mclapply (and pvec) should 'just work' over GRangesList.  After all, lapply does.

But, to remind us:

> parallel::mclapply(myGRangesList,width)
Error in as.list.default(X) : 
  no method for coercing this S4 class to a vector

and, of course, pvec only works with vectors:

> pvec(myGRangesList,width)
Error in pvec(myGRangesList, width) : 'v' must be a vector

Do you think mclapply/pvec should work with Lists?  

FWIW: one aspect of pvec that I think could be improved is how the results from each core are combined.  This is hard-wired to `c`, where it could instead be an optional parameter (e.g. `GRangesList`).
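
A sketch of what such an optional combiner might look like. `pvec_combine` and its `combine` argument are hypothetical and not part of the `parallel` package; the sketch builds on `mclapply` and `splitIndices` rather than modifying `pvec` itself, and uses a plain list with `rbind` as the combiner purely for illustration.

```r
library(parallel)

## Hypothetical helper, not part of the parallel package.
pvec_combine <- function(x, FUN, combine = c, mc.cores = 2L, ...) {
  FUN <- match.fun(FUN)
  combine <- match.fun(combine)
  ## One contiguous block of indices per core.
  chunks <- splitIndices(length(x), min(mc.cores, length(x)))
  ## Apply FUN to each subset x[i] in a forked worker.
  res <- mclapply(chunks, function(i) FUN(x[i], ...), mc.cores = mc.cores)
  ## Stitch the per-chunk results together with the chosen combiner.
  do.call(combine, unname(res))
}

## Assumption: a plain list stands in for a GRangesList.
x <- list(1:2, 3:4, 5:6, 7:8)

## Forking is unavailable on Windows, so fall back to one core there.
cores <- if (.Platform$OS.type == "windows") 1L else 2L

## Each chunk becomes a matrix; rbind joins the chunks row-wise.
m <- pvec_combine(x, function(chunk) do.call(rbind, chunk),
                  combine = rbind, mc.cores = cores)
print(dim(m))
```

With `combine = c` this behaves much like pvec_along; whether a constructor such as `GRangesList` works as a combiner would depend on what FUN returns for each chunk.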

In the meantime, FWIW, I have written a similar wrapper to mclapply named mclapply_alongRanges.
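
The wrapper itself is not shown in this message. As a rough sketch of the general idea only, an index-based wrapper around mclapply might look like the following. The name `mclapply_along` and its body are assumptions made for illustration and should not be read as the actual mclapply_alongRanges; it also assumes that `[[` is defined for the class of x.

```r
library(parallel)

## Hypothetical sketch; not the mclapply_alongRanges mentioned above.
mclapply_along <- function(x, FUN, ...) {
  FUN <- match.fun(FUN)
  ## Iterate over indices, so x itself never needs coercing to a list.
  ## Arguments in ... go to mclapply (e.g. mc.cores), not to FUN.
  res <- mclapply(seq_along(x), function(i) FUN(x[[i]]), ...)
  names(res) <- names(x)
  res
}

## Assumption: a plain list stands in for a GRangesList.
x <- list(a = 1:3, b = 1:5, c = integer(0))

## Forking is unavailable on Windows, so fall back to one core there.
cores <- if (.Platform$OS.type == "windows") 1L else 2L

res <- mclapply_along(x, length, mc.cores = cores)
str(res)
```

Unlike pvec_along, FUN here is called once per element and the result is a list, mirroring lapply.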

~Malcolm


> -----Original Message-----
> From: Martin Morgan [mailto:mtmorgan at fhcrc.org]
> Sent: Thursday, September 20, 2012 8:11 AM
> To: Cook, Malcolm
> Cc: 'Bioconductor Newsgroup (bioconductor at stat.math.ethz.ch)'; 'arne.mueller at novartis.com'; 'stefano.calza at med.unibs.it';
> 'barr.cory at gene.com'; 'Steve Lianoglou (mailinglist.honeypot at gmail.com)'; 'Michael Lawrence <lawrence.michael at gene.com>
> (lawrence.michael at gene.com)'; Blanchette, Marco
> Subject: Re: multicore and GRangesList [Resurrected]
> 
> On 09/19/2012 09:30 AM, Cook, Malcolm wrote:
> > The question of approaches to parallelizing operations on a GRangesList was raised in this thread:
> http://thread.gmane.org/gmane.science.biology.informatics.conductor/32799
> >
> > I find the issue still relevant when using the new `parallel` package.
> >
> > I have adopted the following practice, for which I seek your criticism or accolades.  Your choice.
> >
> > The approach is to use parallel::pvec over the indices of the GRangesList, with a little sugar in the form of...
> >
> > pvec_along <-function(x,FUN,...) {
> > ### PURPOSE: extension to parallel::pvec for non-vectors which is
> > ### vectorized over the indices of x.
> > ###
> > ### Example: pvec_along(myGRangesList,width)
> > ###
> > ### Requires: `library(functional)` `library(parallel)`
> >    indices<-seq_along(x)
> >    FUN<-match.fun(FUN)
> >    pvec(indices,Compose(Curry(`[`,x),FUN),...)
> > }
> >
> > Discuss?
> 
> pvec seems conceptually relevant; the benefits of the functional stuff
> not immediately clear. Explain.
> 
> >
> > Best,
> >
> > ~ Malcolm Cook
> >
> 
> 
> --
> Computational Biology / Fred Hutchinson Cancer Research Center
> 1100 Fairview Ave. N.
> PO Box 19024 Seattle, WA 98109
> 
> Location: Arnold Building M1 B861
> Phone: (206) 667-2793


