[Rd] Memory limitations for parallel::mclapply

Ista Zahn istazahn at gmail.com
Fri Jul 24 23:49:14 CEST 2015


Hi Josh,

I think we need some more details, including code, and information
about your operating system. My machine has only 12 GB of RAM, but I
can run the following quite comfortably (no swapping, even with other
processes using memory):

library(parallel)
library(data.table)
d <- data.table(a = rnorm(50000000),
                b = runif(50000000),
                c = sample(letters, 50000000, replace = TRUE),
                d = 1:50000000,
                g = rep(letters[1:10], each = 5000000))

system.time(means <- mclapply(unique(d$g),
                              function(x) sapply(d[g == x, list(a, b, d)], mean),
                              mc.cores = 5))
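
The reason this works: on Linux and OS X, mclapply fork()s its workers,
so the children share the parent's pages copy-on-write. Reading d in a
worker is essentially free; it is large per-worker allocations that eat
private memory. A minimal sketch of the distinction (same d as above;
the comments describe my understanding of the copy-on-write behaviour,
not a guarantee):

# Read-only access: children share the parent's copy of d; each worker
# only allocates its own ~5e6-row subset.
res_shared <- mclapply(unique(d$g), function(x) {
    mean(d[g == x, a])
}, mc.cores = 5)

# Building a big object per worker defeats the sharing: each child here
# allocates its own ~400 MB double vector in private memory.
res_private <- mclapply(unique(d$g), function(x) {
    tmp <- d$a + 1    # new 50e6-element vector, private to this child
    mean(tmp)
}, mc.cores = 5)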

In other words, I don't think there is anything inherent in the kind
of operation you describe that requires the large data object to be
copied. So, as usual, the devil is in the details, which you haven't
yet described.
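
One caveat about the 7 GB per process you may see in top: on Linux the
RSS/%MEM columns count shared pages once per process, so ten forked
workers can each appear to hold 7 GB while the machine stores only one
copy. The Pss ("proportional set size") field in /proc gives a more
honest picture. A rough sketch, assuming Linux and a kernel that
exposes /proc/<pid>/smaps:

# Sum the Pss lines of /proc/<pid>/smaps; result in GB. Pss divides
# each shared page evenly among the processes that map it.
pss_gb <- function(pid = Sys.getpid()) {
    smaps <- readLines(sprintf("/proc/%d/smaps", pid))
    kb <- as.numeric(sub("^Pss: *([0-9]+) kB$", "\\1",
                         grep("^Pss:", smaps, value = TRUE)))
    sum(kb) / 1024^2
}

If pss_gb() inside a worker is much smaller than the RSS top reports
for the same PID, the "7 GB each" is mostly shared pages counted many
times rather than real duplication.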

Best,
Ista


On Fri, Jul 24, 2015 at 4:21 PM, Joshua Bradley <jgbradley1 at gmail.com> wrote:
> Hello,
>
> I have been having issues using parallel::mclapply in a memory-efficient
> way and would like some guidance. I am using a 40-core machine with 96 GB
> of RAM. I've tried to run mclapply with 20, 30, and 40 mc.cores, and each
> time it has practically brought the machine to a standstill, to the point
> where I had to do a hard reset.
>
> When running mclapply with 10 mc.cores, I can see that each process takes
> 7.4% (~7 GB) of memory. My use case for mclapply is the following: run
> mclapply over a list of 150000 names; for each name, refer to a large
> pre-computed data.table to compute some stats, and return those stats.
> Ideally I want to use the large data.table as shared memory, but the
> number of mc.cores I can use is limited because each one requires 7 GB.
> Someone posted this exact same issue
> <http://stackoverflow.com/questions/13942202/r-and-shared-memory-for-parallelmclapply>
> on Stack Overflow a couple of years ago, but it never got answered.
>
> Do I have to tell mclapply explicitly to use shared memory (and if so,
> how)? Is this type of job better suited to the doParallel package and
> the foreach approach?
>
> Josh Bradley


