[BioC] multicore Vignette or HowTo??

Edwin Groot edwin.groot at biologie.uni-freiburg.de
Tue Oct 19 14:49:30 CEST 2010


On Mon, 18 Oct 2010 09:39:22 -0700,
Martin Morgan <mtmorgan at fhcrc.org> wrote:
> On 10/18/2010 09:05 AM, Edwin Groot wrote:
> > Hello all,
> > I have difficulty getting the multicore package to do what it promises.
> > Does anybody have a benchmark that demonstrates something intensive
> > with and without multicore assistance?
> > I have a dual dual-core Xeon, and $ top tells me all R can squeeze from
> > my Linux system is 25% us (user CPU time, i.e. one of four cores busy).
> > Here is my example:
> > 
<snip>
> >> pnorm <- mcparallel(normalize.Probes(array, method = "loess"))
> 
> Here's my favorite test of parallel functionality
> 
> > library(multicore)
> > system.time(lapply(1:4, function(i) Sys.sleep(1)))
>    user  system elapsed
>   0.001   0.000   4.004
> > system.time(mclapply(1:4, function(i) Sys.sleep(1)))
>    user  system elapsed
>   0.007   0.005   1.009
> 
> time goes 4x faster!

Hmm, a great parlour trick!
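Because Sys.sleep() uses no CPU, the test above shows concurrency but would not raise the "us" figure in top. A CPU-bound comparison is sketched below. It assumes R >= 2.14.0, where the multicore functions used in this thread live on in the base parallel package (mclapply() unchanged, collect() renamed mccollect()), and a Unix-alike system, since forking is not available on Windows. busy_task() and its loess workload are hypothetical stand-ins for real work.

```r
library(parallel)  # successor to multicore (R >= 2.14.0)

## Hypothetical CPU-bound task: fit a loess curve to simulated data.
busy_task <- function(i) {
  set.seed(i)                                  # same data in serial and forked runs
  x <- runif(20000)
  y <- sin(2 * pi * x) + rnorm(20000, sd = 0.3)
  fit <- loess(y ~ x, span = 0.3)
  sum(residuals(fit)^2)                        # one number per task
}

n_tasks <- 4
t_serial <- system.time(
  res_serial <- lapply(seq_len(n_tasks), busy_task))
t_parallel <- system.time(
  res_parallel <- mclapply(seq_len(n_tasks), busy_task, mc.cores = n_tasks))

## Wall-clock seconds for each run (child CPU time is not in the parent's user.self).
print(c(serial = t_serial[["elapsed"]], parallel = t_parallel[["elapsed"]]))
stopifnot(identical(res_serial, res_parallel))  # same answers either way
```

On a machine with four idle cores the elapsed time of the second run should drop and top should show several busy R processes; the actual speed-up depends on the hardware and on how long each task runs compared with the cost of forking.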

> 
> Code has to be multicore-aware, and saying something like
> 
>     pnorm <- mcparallel(normalize.Probes(array, method = "loess"))
>     array_norm <- collect(pnorm)
> 
> just says to fork a process to do the task, not to do the task in
> parallel (multicore doesn't do anything clever, like identify parts

Aha, so I have been using this multicore package naively. It shows how
little I know about what happens under the hood. I asked this clueless
question in the first place because I need some real data and code that
demonstrate the principle of parallel computation.
What I gave as an example was trivial, as it is a single process,
right?
If I understand correctly, I have to find a way to split my data into
parts (up to 4 in my case) and have mcparallel() distribute the load?
Hmm, but that would not work for normalization, because all the
information from the data set is needed. Now what?
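One way around the "all the information is needed" problem is to separate the step that needs the whole data set from the steps that do not. The sketch below is a hypothetical per-array loess normalisation against a shared median reference, not what Starr's normalize.Probes() actually does. The reference is computed once in the parent R process, and because mcparallel() forks, every child inherits it without any explicit copying. The simulated data, normalize_one() and the choice of reference are assumptions for illustration; a method whose core computation needs every array at once cannot be split this way. It assumes R >= 2.14.0 (package parallel) on a Unix-alike.

```r
library(parallel)  # mcparallel()/mccollect(): successors of multicore's mcparallel()/collect()

## Hypothetical data: 10,000 probes x 8 arrays of log2 intensities.
set.seed(1)
n_probes <- 10000
n_arrays <- 8
intensities <- matrix(rnorm(n_probes * n_arrays, mean = 8), ncol = n_arrays)

## Step 1 (serial): the part that needs the whole data set, done once.
## Here, a pseudo-reference array: the row-wise median across all arrays.
reference <- apply(intensities, 1, median)

## Step 2 (parallel): each array needs only itself plus the shared reference.
## Hypothetical normalisation: remove the loess trend of M on A.
normalize_one <- function(j) {
  m <- intensities[, j] - reference
  a <- (intensities[, j] + reference) / 2
  fit <- loess(m ~ a, span = 0.4, degree = 1)
  intensities[, j] - fitted(fit)
}

## Split the arrays into 4 contiguous chunks and fork one job per chunk.
chunks <- split(seq_len(n_arrays), cut(seq_len(n_arrays), 4, labels = FALSE))
jobs <- lapply(chunks, function(idx) mcparallel(sapply(idx, normalize_one)))

## Collect exactly once; results come back in the order of 'jobs'.
results <- mccollect(jobs)
normalized <- do.call(cbind, unname(results))
dim(normalized)  # 10000 rows, 8 columns
```

Contiguous chunks keep the columns in their original order after cbind(); with four cores, each child here normalises two arrays.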

> of the code that could be parallelized). The Starr author would have to
> implement normalize.Probes to take advantage of multiple cores, or your
> own task would have to be parallelizable at the 'user' level, like an
> lapply.
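Parallelism at the "user" level usually means finding an lapply() over independent pieces in your own script and swapping in mclapply(). The sketch below assumes, hypothetically, probe scores on four chromosomes that are smoothed independently with runmed(); the data, smooth_chrom() and the window size are made up for illustration, and the parallel package (R >= 2.14.0, Unix-alike) stands in for multicore.

```r
library(parallel)

## Hypothetical probe-level data: positions and scores on 4 chromosomes.
set.seed(2)
probes <- data.frame(
  chrom = rep(c("chr1", "chr2", "chr3", "chr4"), each = 50000),
  pos   = rep(seq_len(50000) * 35L, times = 4),
  score = rnorm(200000)
)

## One independent piece of work per chromosome.
by_chrom <- split(probes, probes$chrom)
smooth_chrom <- function(d) {
  d <- d[order(d$pos), ]
  runmed(d$score, k = 101)          # running median over 101 probes
}

## Serial version.
serial_smooth <- lapply(by_chrom, smooth_chrom)

## Parallel version: same function, same list, mclapply instead of lapply.
parallel_smooth <- mclapply(by_chrom, smooth_chrom, mc.cores = 4)
stopifnot(identical(serial_smooth, parallel_smooth))
```

The function and the list are unchanged; only lapply becomes mclapply. This only pays off when the pieces are truly independent and each one takes long enough to outweigh the cost of forking and returning its result.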
> 
> I'm really not sure why array_norm is NULL. After looking at the example
> on ?normalize.Probes I did
> 

I think I entered array_norm <- collect(pnorm) twice, so the second call
probably overwrote the result of the first collect() call with NULL.

<snip>

> Martin
> 
> >> Normalizing probes with method: loess
> > Done with 1 vs 2 in iteration 1 
> > #Function continues for some time and displays more messages. No
> > benefit from multicore. $ top reports 25% us during the run...
> >> array_norm <- collect(pnorm)
> > #Oh dear, where did my normalized data go?
> >> array_norm
> > $`4037`
> > NULL
> >> sessionInfo()
> > R version 2.11.1 (2010-05-31) 
> > x86_64-pc-linux-gnu 
> > 
> > locale:
> >  [1] LC_CTYPE=en_GB.UTF-8       LC_NUMERIC=C              
> >  [3] LC_TIME=en_GB.UTF-8        LC_COLLATE=en_GB.UTF-8    
> >  [5] LC_MONETARY=C              LC_MESSAGES=en_GB.UTF-8   
> >  [7] LC_PAPER=en_GB.UTF-8       LC_NAME=C                 
> >  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
> > [11] LC_MEASUREMENT=en_GB.UTF-8 LC_IDENTIFICATION=C       
> > 
> > attached base packages:
> > [1] tools     grid      stats     graphics  grDevices utils     datasets
> > [8] methods   base
> > 
> > other attached packages:
> >  [1] geneplotter_1.26.0   annotate_1.26.1      AnnotationDbi_1.10.2
> >  [4] Starr_1.4.4          affxparser_1.20.0    affy_1.26.1         
> >  [7] Ringo_1.12.0         Matrix_0.999375-39   lattice_0.18-8      
> > [10] limma_3.4.4          RColorBrewer_1.0-2   Biobase_2.8.0       
> > [13] multicore_0.1-3     
> > 
> > loaded via a namespace (and not attached):
> >  [1] affyio_1.16.0         DBI_0.2-5             genefilter_1.30.0
> >  [4] MASS_7.3-6            preprocessCore_1.10.0 pspline_1.0-14
> >  [7] RSQLite_0.9-2         splines_2.11.1        survival_2.35-8
> > [10] tcltk_2.11.1          xtable_1.5-6
> > 
> > RTFMing only gives me the syntax of some functions in the multicore
> > package. How do I successfully apply this to my code?
> > 
> > Regards,
> > Edwin
> 
> 
> -- 
> Computational Biology
> Fred Hutchinson Cancer Research Center
> 1100 Fairview Ave. N. PO Box 19024 Seattle, WA 98109
> 
> Location: M1-B861
> Telephone: 206 667-2793

Dr. Edwin Groot, postdoctoral associate
AG Laux
Institut fuer Biologie III
Schaenzlestr. 1
79104 Freiburg, Deutschland
+49 761-2032945



More information about the Bioconductor mailing list