[R] Collecting children from mclapply

p_connolly p_connolly at slingshot.co.nz
Fri Aug 16 04:40:13 CEST 2013


I have an 8 core machine and I wish to use 6 of them to run a task
much more complicated than this example.

multitest <- function(n = 1000, lam = 500)
{
### Purpose:- Simple parallel task to check why mclapply doesn't work
### ----------------------------------------------------------------------
### Modified from:-
### ----------------------------------------------------------------------
### Arguments:- n; lam
### ----------------------------------------------------------------------
### Author:-   Patrick Connolly, Date:- 16 Aug 2013, 09:27
### ----------------------------------------------------------------------
### Revisions:-

   require(parallel)
   cat(paste(" ", Sys.time()," Begin using multicore method with phony data.\n"))
   lrs <- c("COB", "CNJ") #
   subsets <- c(outer(c("I", "M", "A"), lrs, paste, sep = ".")) # 6 useful names
   lr.test <- function(lr.id, ...){ # function to use with mclapply
     cat(lr.id, "uses", system("echo $PPID", intern = TRUE), "\n") # useful sometimes
     fileName <- paste(lr.id, ".pdf", sep = "")
     pdf(file = fileName)
     on.exit(dev.off())
     aa <- rnorm(n)
     bb <- rpois(n, lam)
     plot(aa, bb)
     title(lr.id)
     data.frame(aa, bb)
   }
   ## Use multicore apply for all 6 simultaneously
   out <- mclapply(subsets, FUN = lr.test, n = n, lam = lam,
                   mc.cores = 6, mc.silent = FALSE, mc.preschedule = TRUE)
   names(out) <- subsets
   cat(paste(" ", Sys.time()," \n....Completed using multicore method.\n"))
   out
}

That simplified example works fine, producing 6 PDFs and returning a
list of 6 data frames.  However, when I use my real function, which is
much more demanding, only the last element of subsets to complete is
successful; the other 5 return errors like this:

   <simpleError in socketConnection("localhost", port = port, server =
   TRUE, blocking = TRUE, open = "a+b", timeout = timeout): cannot open
   the connection>
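
For reference, a minimal sketch of how the failed elements can be picked
out of the returned list.  It assumes the failures come back as condition
(or try-error) objects, as the message above suggests; the list `out`
below is a made-up stand-in for the real result, not actual output.

```r
## Stand-in for the list returned by mclapply(): one good result and
## one element holding an error object (both invented for illustration).
out <- list(I.COB = data.frame(aa = 1:2, bb = 3:4),
            M.COB = simpleError("cannot open the connection"))

## TRUE for elements that are errors rather than results
failed <- vapply(out,
                 function(x) inherits(x, c("try-error", "condition")),
                 logical(1))

names(out)[failed]                # which subsets failed
conditionMessage(out[["M.COB"]])  # the message carried by the error object
```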

The top of a call to traceback looks like this:

   4: selectChildren(pids, 0.5)
   3: mccollect(jobs)
   2: mclapply(subsets, FUN =  .....

I gather this comes from the call to socketConnection() which seems to
be having a problem with blocking which has been set to TRUE, or maybe
there's an issue with encoding.  The very same code used to work
before R-3.0.0 so I'm led to think something in that area is working
differently from before.

My suspicion is that the problem arises before getting to the
mccollect stage, since each call to FUN starts 8 additional R processes
over which there is no control, so they don't return anything.  That
seems untidy and wasteful, and I can't get anywhere working out what
causes it.

When I've had problems with functions used with mclapply calls
previously, it was simple enough to restrict my subsets to length 1 so
I could use the ever useful browser() function and track down the
cause.  That's no use in this case because no problem arises if only
one core is used, so there's no problem in FUN to be identified.
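
Since browser() is no help here, one alternative is to have each child
report on itself.  The following is only a sketch: `with_diagnostics` and
`toy` are hypothetical names, and it assumes the error is raised inside
FUN in the child rather than in the parent while collecting results.
mc.cores = 1 is used only so the sketch runs anywhere; the real case
would need mc.cores = 6 to show the failure.

```r
library(parallel)

## Hypothetical wrapper: runs FUN and, instead of failing, returns the
## child's PID together with either the value or the error details.
with_diagnostics <- function(FUN) {
  function(x, ...) {
    tryCatch(
      list(ok = TRUE, pid = Sys.getpid(), value = FUN(x, ...)),
      error = function(e)
        list(ok = FALSE, pid = Sys.getpid(),
             message = conditionMessage(e),
             call = deparse(conditionCall(e))[1])
    )
  }
}

## Toy stand-in for the real FUN: fails for one id only
toy <- function(id, ...) {
  if (id == "M.COB") stop("simulated failure")
  nchar(id)
}

ids <- c("I.COB", "M.COB", "A.COB")
res <- mclapply(ids, with_diagnostics(toy), mc.cores = 1)
names(res) <- ids

## Which elements succeeded, and what the failing one reported
ok <- vapply(res, function(r) r$ok, logical(1))
res[!ok]
```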

Ideas on where I should be looking are welcome.

TIA
Patrick


