[Rd] SUGGESTION: Add get/setCores() to 'parallel' (and command line option --max-cores)
simon.urbanek at r-project.org
Sun Dec 16 04:58:34 CET 2012
On Dec 15, 2012, at 7:38 PM, Norm Matloff wrote:
> Henrik Bengtsson <hb at biostat.ucsf.edu> wrote:
> > In the 'parallel' package there is detectCores(), which tries its best
> > to infer the number of cores on the current machine. This is useful
> > if you wish to utilize the *maximum* number of cores on the machine.
> > Several people are using this to set the number of cores when parallelizing,
> > sometimes also hardcoded within 3rd-party scripts/package code, but
> > there are several settings where you wish to use fewer, e.g. in a
> > compute cluster where your R session is given only a portion of the
> > cores available. Because of this, I'd like to propose adding
> > getCores(), which by default returns what detectCores() gives, but can
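A minimal sketch of what the proposed pair might look like. Note that getCores() and setCores() do NOT exist in the 'parallel' package; the names and the use of the (real) "mc.cores" option as backing storage are assumptions for illustration only.

```r
# Hypothetical getCores()/setCores() pair -- NOT part of 'parallel'.
# "mc.cores" is an existing option consulted by mclapply() and
# friends, so it is a plausible place to store the user override.
setCores <- function(n) {
  stopifnot(is.numeric(n), length(n) == 1L, n >= 1)
  options(mc.cores = as.integer(n))
  invisible(getCores())
}

getCores <- function() {
  n <- getOption("mc.cores")
  if (is.null(n)) n <- parallel::detectCores()  # fall back to the detected count
  as.integer(n)
}
```

With such a pair, package code could call getCores() instead of detectCores() directly, and a cluster scheduler or the user could cap the count once via setCores() (or a command-line option) without touching third-party code.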
> Even if one has the entire machine to oneself, there is often another
> very good reason not to use the maximum number of cores: doing so
> may actually reduce performance. This is true in general, and
> especially likely when the inferred core count includes
> hyperthreaded (logical) cores.
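The hyperthreading distinction is directly visible in detectCores(), which by default counts logical CPUs; its logical argument asks for physical cores instead:

```r
library(parallel)

# detectCores() counts logical CPUs by default, which includes
# hyperthreads; logical = FALSE requests physical cores (it may
# return NA on platforms where the distinction cannot be determined).
logical_cpus  <- detectCores(logical = TRUE)
physical_cpus <- detectCores(logical = FALSE)
cat("logical:", logical_cpus, " physical:", physical_cpus, "\n")
```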
Actually, the converse is often true (it depends on the machine architecture, though -- I'm assuming true SMP machines here): it is often beneficial to run more threads than cores, because the time one thread spends waiting for access outside the CPU can be used by other threads that can continue computing. This is particularly true for 'parallel' because of its setup overhead -- typically the real bottleneck is memory, though. That said, the balance is heavily machine- and task-dependent, so any default will be bad for some cases. Typically, for commodity machines with a couple dozen cores it is good to overload; for bigger machines it is bad.
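Since the sweet spot is machine- and task-dependent, the practical approach is to measure. A rough sketch of timing the same workload at several worker counts (results vary by machine; note that mc.cores > 1 is not supported on Windows, where mclapply() falls back to serial execution):

```r
library(parallel)

# A small CPU-bound task; the work per call and the number of calls
# here are arbitrary choices for illustration.
task <- function(i) sum(rnorm(2e5))

# Try undersubscribed, matched, and oversubscribed worker counts.
for (w in c(1L, detectCores(), 2L * detectCores())) {
  elapsed <- system.time(
    mclapply(seq_len(100), task, mc.cores = w)
  )["elapsed"]
  cat(w, "workers:", elapsed, "s\n")
}
```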