[R] Thread parallelism and memory management on shared-memory supercomputers
andrewcd at gmail.com
Wed Dec 30 18:36:44 CET 2015
I've got allocations on a couple of shared-memory supercomputers, which
I use to run computationally intensive scripts on multiple cores of the
same node. One machine has 24 cores per node, the other 48.
In both cases there is a hard memory limit that is shared among the
cores on the node. On the latter machine the limit is 255G; if my job
requests more than that, it is aborted.
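For concreteness, a job submission under that limit might look like the
sketch below. This assumes a SLURM scheduler and a hypothetical script
name `my_parallel_script.R`; the directive names differ under other
schedulers such as PBS/Torque.

```shell
#!/bin/bash
#SBATCH --nodes=1            # one shared-memory node
#SBATCH --ntasks=1           # a single R process ...
#SBATCH --cpus-per-task=48   # ... that will use all 48 cores
#SBATCH --mem=255G           # per-node memory cap; exceeding it aborts the job

Rscript my_parallel_script.R
```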
Now, I don't fully understand resource allocation on these sorts of
systems, but I do understand that the kind of "thread parallelism" done
by, e.g., the `parallel` package in R isn't identical to the
parallelism commonly used in lower-level languages. For example, when I
request a node, I ask for only one of its cores. My R script then
detects the number of cores on the node and farms tasks out to them via
the `foreach` package. My understanding is that in lower-level
languages the number of cores must be specified in the shell script,
and a particular job script is handed directly to each worker.
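The detect-and-farm-out setup described above looks roughly like the
following minimal sketch; `myTask` here is a stand-in for the real
computationally intensive function.

```r
library(parallel)    # detectCores(), makeCluster()
library(doParallel)  # registerDoParallel()
library(foreach)     # foreach() %dopar%

# Stand-in for the real computationally intensive task
myTask <- function(i) i^2

n_cores <- detectCores()      # all cores visible on the node
cl <- makeCluster(n_cores)    # one worker process per core
registerDoParallel(cl)

# Farm tasks out to the workers
results <- foreach(i = 1:8, .combine = c) %dopar% myTask(i)

stopCluster(cl)
```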
My problem is that my parallel-calling R script is being killed by the
cluster: the sum of the memory requested across the worker threads
exceeds my allocation, so the scheduler terminates the job. I don't see
this problem when running on my laptop's 4 cores, presumably because my
laptop has a higher memory-per-core ratio.
My question: how can I ensure that the total memory requested by
N workers stays below a given threshold? Is this even possible? If
not, can I benchmark a process locally, record the maximum per-worker
memory used, and use that to back out the number of workers I can run
within a given node's memory limit?
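The benchmark-then-back-out idea might be sketched as follows. The
caveats: `myTask` is again a hypothetical stand-in, and `gc()` tracks
only the R heap (column 6 of its matrix is the "max used" total in Mb),
so memory allocated outside R, e.g. by compiled libraries, is not
counted; a safety margin is advisable.

```r
library(parallel)  # detectCores()

# Stand-in task; replace with one representative unit of real work
myTask <- function(i) { x <- rnorm(1e6); sum(x) }

# Reset R's peak-memory counters, run one task, then read back the
# peak heap use ("max used" Mb for Ncells + Vcells).
invisible(gc(reset = TRUE))
res <- myTask(1)
peak_mb <- sum(gc()[, 6])

node_limit_mb <- 255 * 1024   # the 255G hard limit, in Mb
margin <- 0.8                 # leave 20% headroom for untracked memory

n_workers <- min(detectCores(),
                 floor(margin * node_limit_mb / peak_mb))
```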
Thanks in advance!