[R] Multiple CPU HowTo in Linux?

Edwin Groot edwin.groot at biologie.uni-freiburg.de
Wed Sep 15 09:49:50 CEST 2010


Hello all,
Thanks for your input, and helping to clear things up on where to go.
I will try out the multicore package and see if there are further
bottlenecks. It looks like some loops might need special treatment with
parallelization.
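
As a rough sketch of what that special treatment might look like: a loop whose iterations do not depend on each other can be rewritten as a function of the loop index and handed to an lapply-style function. The names slow_step, n and n_cores are hypothetical placeholders, and the sketch assumes the parallel package (which ships with R from 2.14.0 and provides mclapply(); with R 2.11.1 the call would be library(multicore) instead).

```r
# Sketch only: slow_step() is a hypothetical stand-in for the real per-item work.
library(parallel)  # assumption: R >= 2.14.0; older setups would load multicore

slow_step <- function(i) {
  i^2
}

n <- 8

# Sequential version: each iteration fills one slot of the result vector.
res_loop <- numeric(n)
for (i in seq_len(n)) {
  res_loop[i] <- slow_step(i)
}

# Same work expressed as a function of the index, so it can be distributed.
# Forking with mc.cores > 1 is not supported on Windows, hence the fallback.
n_cores <- if (.Platform$OS.type == "windows") 1L else 2L
res_par <- unlist(mclapply(seq_len(n), slow_step, mc.cores = n_cores))

# Both approaches should give the same values.
print(identical(res_loop, res_par))
```

This only works when no iteration needs the result of an earlier one; a loop that carries state from step to step stays sequential.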
I have been pampered with the excellent walk-through vignettes of the
packages I have used so far. The HPC package guides lacked something in
the practical aspects of their usage.
lapply <- mclapply at the beginning of my script? Well, I never would
have thought of such a thing. Thanks!
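
For the record, the lapply <- mclapply idea can be sketched as below. This is a minimal illustration, assuming the parallel package (the home of mclapply() from R 2.14.0 onwards; with R 2.11.1 it would be library(multicore)). The core count of 2 and the toy squaring function are placeholders, not recommendations.

```r
# Sketch only: mask lapply with mclapply for the rest of the script.
library(parallel)  # assumption: R >= 2.14.0; older setups would load multicore

# mclapply() reads its default core count from the "mc.cores" option.
# Forking with more than one core is not supported on Windows.
options(mc.cores = if (.Platform$OS.type == "windows") 1L else 2L)

# From here on, calls to lapply() made in this script dispatch to mclapply().
lapply <- mclapply

squares <- lapply(1:4, function(x) x^2)
print(unlist(squares))

# Remove the mask to get base::lapply back.
rm(lapply)
```

One caveat: the mask lives in the global environment, so it only affects lapply() calls written in the script itself. Functions inside packages keep resolving lapply to the base version, which is why a bottleneck inside a package function may not speed up this way.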

I might be back on the list when I run out of physical RAM ;-)

Edwin
-- 
On Tue, 14 Sep 2010 12:11:17 -0700
 Martin Morgan <mtmorgan at fhcrc.org> wrote:
> On 09/14/2010 08:36 AM, Christian Raschke wrote:
> > Edwin,
> > 
> > I'm not sure what you mean by "adapting"; other than installing
> > multicore, there is nothing else to set up. How and whether you could
> > then parallelise your code strongly depends on the specific problem
> > you are facing.
> > 
> > What I have done in the past was to look at the source of the
> > functions from whatever package I was using that produced the
> > bottleneck. If what is taking the longest time is actually
> > embarrassingly parallel, mclapply() from package multicore can help.
> > In the simplest case you could simply replace lapply() in the code
> > with an appropriate mclapply(). Check out ?mclapply. But then again
> > you might have to get a little more creative, depending on exactly
> > what in the code is taking so long to run. If your problem is
> > inherently sequential then even multicore won't help.
> > 
> > Christian
> > 
> > On 09/14/2010 09:35 AM, Edwin Groot wrote:
> >> Hello Cedrick,
> >> Ah, yes, that looks like it would apply to my situation. I was
> >> previously reading on snow, which is tailored for clusters, rather
> >> than a single desktop computer.
> >> Anyone with experience adapting multicore to an R-script?
> >> I have to admit I know little about parallel processing,
> >> multiprocessing and cluster processing.
> >>
> >> Edwin
> >>
> >> On Tue, 14 Sep 2010 10:15:42 -0400
> >>   "Johnson, Cedrick W."<cedrick at cedrickjohnson.com>  wrote:
> >>   
> >>>    ?multicore perhaps
> >>>
> >>> On 09/14/2010 10:01 AM, Edwin Groot wrote:
> >>>> Hello all,
> >>>> I upgraded my R workstation, and to my dismay, only one core
> >>>> appears to be used during intensive computation of a Bioconductor
> >>>> function.
> 
> Hi Edwin -- Since you have a Bioconductor package, you might ask on the
> Bioconductor list, as the authors of some computationally intensive
> tasks have provided facilities for relatively transparent use of, e.g.,
> multicore or Rmpi. In ShortRead, for instance, loading multicore is
> enough to distribute some tasks across cores, and the srapply function
> can help (or not; things might be as easy as lapply <- mclapply at the
> top of your script) with your own lapply-like code.
> 
> http://bioconductor.org/help/mailing-list/
> 
> Martin
> >>>> What I have now is two dual-core Xeon 5160 CPUs and 10 GB RAM. When
> >>>> I fully load it, top reports about 25% user, 75% idle and 0.98
> >>>> short-term load.
> >>>> The archives gave nothing helpful besides mention of snow. I
> >>>> thought of posting to HPC, but this system is fairly modest WRT
> >>>> processing power.
> >>>> Any pointers on where to start?
> >>>> ---
> >>>> #Not running anything at the moment
> >>>> > sessionInfo()
> >>>> R version 2.11.1 (2010-05-31)
> >>>> x86_64-pc-linux-gnu
> >>>>
> >>>> locale:
> >>>>    [1] LC_CTYPE=en_GB.UTF-8       LC_NUMERIC=C
> >>>>    [3] LC_TIME=en_GB.UTF-8        LC_COLLATE=en_GB.UTF-8
> >>>>    [5] LC_MONETARY=C              LC_MESSAGES=en_GB.UTF-8
> >>>>    [7] LC_PAPER=en_GB.UTF-8       LC_NAME=C
> >>>>    [9] LC_ADDRESS=C               LC_TELEPHONE=C
> >>>> [11] LC_MEASUREMENT=en_GB.UTF-8 LC_IDENTIFICATION=C
> >>>>
> >>>> attached base packages:
> >>>> [1] stats     graphics  grDevices utils     datasets  methods   base
> >>>>
> >>>> loaded via a namespace (and not attached):
> >>>> [1] tools_2.11.1
> >>>> ---
> >>>> $ uname -a
> >>>> Linux laux29 2.6.26-2-amd64 #1 SMP Sun Jun 20 20:16:30 UTC 2010 x86_64 GNU/Linux
> >>>> ---
> >>>> Thanks for your help,
> >>>> Edwin
> >>>>        
> >>> ______________________________________________
> >>> R-help at r-project.org mailing list
> >>> https://stat.ethz.ch/mailman/listinfo/r-help
> >>> PLEASE do read the posting guide
> >>> http://www.R-project.org/posting-guide.html
> >>> and provide commented, minimal, self-contained, reproducible code.
> >> Dr. Edwin Groot, postdoctoral associate
> >> AG Laux
> >> Institut fuer Biologie III
> >> Schaenzlestr. 1
> >> 79104 Freiburg, Deutschland
> >> +49 761-2032945
> >>

Dr. Edwin Groot, postdoctoral associate
AG Laux
Institut fuer Biologie III
Schaenzlestr. 1
79104 Freiburg, Deutschland
+49 761-2032945


