[Rd] error in parallel:::sendMaster

Andreas Kersting r-deve| @end|ng |rom @ker@t|ng@de
Wed Nov 27 12:10:27 CET 2019


Hi,

I am facing a very weird problem with parallel::mclapply. I have a script which does some data wrangling on an input dataset in parallel and then writes the results to disk. I have been using this script daily for more than one year always on an EC2 instance launched from the same AMI (no updates installed after launch) and processed thousands of different input data sets successfully. I now have an input dataset for which I face the following bug:

The basic outline of the problematic section of the script:

# parts is a data.table with 88 rows
mc_ret <- parallel::mclapply(sample.int(nrow(parts)), function(i) {
  # do some data wrangling and write the result to a file
  # ...

  print(paste0("part ", i, " written successfully."))
  return(TRUE)
}, mc.preschedule = FALSE, mc.cores = 2L)

str(mc_ret)


Expected output: "part i written successfully." is printed 88 times, once for each value of i. mc_ret is a list of length 88, each element being TRUE. Its structure is printed once. All outputs are created successfully.

Actual output (see end of the message): "part i written successfully." is printed 88 times, once for each value of i. mc_ret is a list of length 88, each element being TRUE. Its structure is printed. All outputs are created successfully. So far so good.

But then "part i written successfully." it is printed another X times, for values of i for which it was already printed. This output is intermingled with X-1 times the following error message:

Error in sendMaster(try(eval(expr, env), silent = TRUE)) :
  write error, closing pipe to the master
Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMaster

and Y times the message "Execution halted". mc_ret is printed again, now being a list of length 85, with the first element being TRUE and all other elements being NULL. X and Y vary from run to run.


Now to the main problem: I tried very hard to create a reproducible example, but I failed. What I observed:
- The output is (and has always been) written to path which is on an NFS share. If I instead write to a path on a local disk it will work.
- The script is invoked using Rscript. If I instead source it from an interactive R session it works. There are at least two more people who have observed this: https://stackoverflow.com/questions/51986674/mclapply-sendmaster-error-only-with-rscript
- Before the call to mclapply the code acquires an exclusive file lock on a dedicated lock file, not written to but also on the NFS share. If I remove the code acquiring the lock, the whole script will also work if called using Rscript.
- The problem also occurs for mc.preschedule = TRUE.
- There is no error if I set mc.cores to 1.
- And stressing again: the code works without any changes from Rscript for thousands of other data sets.


Rscript -e "sessionInfo()":
R version 3.5.2 (2018-12-20)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 18.04.2 LTS

Matrix products: default
BLAS: /usr/lib/x86_64-linux-gnu/openblas/libblas.so.3
LAPACK: /usr/lib/x86_64-linux-gnu/libopenblasp-r0.2.20.so

locale:
 [1] LC_CTYPE=C.UTF-8       LC_NUMERIC=C           LC_TIME=C.UTF-8
 [4] LC_COLLATE=C.UTF-8     LC_MONETARY=C.UTF-8    LC_MESSAGES=C.UTF-8
 [7] LC_PAPER=C.UTF-8       LC_NAME=C              LC_ADDRESS=C
[10] LC_TELEPHONE=C         LC_MEASUREMENT=C.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

loaded via a namespace (and not attached):
[1] compiler_3.5.2


I know this is a fairly old R version. I have not been able to reproduce the bug with a more recent version, but since it is so difficult to trigger, this does not mean much, I guess. I have looked through the changes made to the code of mclapply since that version and could not find something directly related. I am not even sure if it is a problem in the parallel package or some other (memory) bug. What strikes me is that others have observed a very similar error when using Rscript but not when using an interactive R session, just like I do.

I am not expecting a fix based on the information I provide, but maybe someone has some thoughts on this!?

Regards,
Andreas




Actual output:

[1] "part 51 written successfully."
[1] "part 30 written successfully."
[1] "part 32 written successfully."
[1] "part 48 written successfully."
[1] "part 63 written successfully."
[1] "part 5 written successfully."
[1][1] "part 14 written successfully." "part 18 written successfully."

[1] "part 38 written successfully."
[1] "part 11 written successfully."
[1] "part 68 written successfully."
[1] "part 45 written successfully."
[1] "part 88 written successfully."
[1] "part 36 written successfully."
[1] "part 44 written successfully."
[1] "part 55 written successfully."
[1] "part 26 written successfully."
[1] "part 37 written successfully."
[1] "part 22 written successfully."
[1] "part 13 written successfully."
[1] "part 67 written successfully."
[1] "part 10 written successfully."
[1] "part 24 written successfully."
[1] "part 20 written successfully."
[1] "part 74 written successfully."
[1] "part 50 written successfully."
[1] "part 2 written successfully."
[1] "part 17 written successfully."
[1] "part 66 written successfully."
[1] "part 61 written successfully."
[1][1] "part 7 written successfully." "part 31 written successfully."

[1] "part 70 written successfully."
[1] "part 71 written successfully."
[1][1] "part 76 written successfully."
 "part 42 written successfully."
[1][1] "part 72 written successfully." "part 12 written successfully."

[1] "part 39 written successfully."
[1] "part 58 written successfully."
[1] "part 87 written successfully."
[1] "part 81 written successfully."
[1] "part 3 written successfully."
[1] "part 29 written successfully."
[1] "part 82 written successfully."
[1] "part 54 written successfully."
[1] "part 23 written successfully."
[1] "part 56 written successfully."
[1] "part 73 written successfully."
[1] "part 27 written successfully."
[1] "part 64 written successfully."
[1] "part 46 written successfully."
[1] "part 80 written successfully."
[1] "part 53 written successfully."
[1] "part 43 written successfully."
[1] "part 47 written successfully."
[1] "part 57 written successfully."
[1] "part 75 written successfully."
[1] "part 84 written successfully."
[1] "part 85 written successfully."
[1][1] "part 86 written successfully."
 "part 19 written successfully."
[1][1] "part 33 written successfully." "part 34 written successfully."

[1] "part 21 written successfully."
[1] "part 40 written successfully."
[1][1] "part 4 written successfully."
 "part 28 written successfully."
[1] "part 62 written successfully."
[1] "part 16 written successfully."
[1] "part 79 written successfully."
[1] "part 6 written successfully."
[1] "part 77 written successfully."
[1] "part 65 written successfully."
[1] "part 60 written successfully."
[1] "part 78 written successfully."
[1] "part 1 written successfully."
[1] "part 83 written successfully."
[1] "part 9 written successfully."
[1] "part 15 written successfully."
[1][1] "part 41 written successfully."
 "part 35 written successfully."
[1][1] "part 8 written successfully." "part 69 written successfully."

[1] "part 59 written successfully."
[1] "part 25 written successfully."
[1] "part 49 written successfully."
[1] "part 52 written successfully."
List of 88
 $ : logi TRUE
 $ : logi TRUE
 $ : logi TRUE
 $ : logi TRUE
 $ : logi TRUE
 $ : logi TRUE
 $ : logi TRUE
 $ : logi TRUE
 $ : logi TRUE
 $ : logi TRUE
 $ : logi TRUE
 $ : logi TRUE
 $ : logi TRUE
 $ : logi TRUE
 $ : logi TRUE
 $ : logi TRUE
 $ : logi TRUE
 $ : logi TRUE
 $ : logi TRUE
 $ : logi TRUE
 $ : logi TRUE
 $ : logi TRUE
 $ : logi TRUE
 $ : logi TRUE
 $ : logi TRUE
 $ : logi TRUE
 $ : logi TRUE
 $ : logi TRUE
 $ : logi TRUE
 $ : logi TRUE
 $ : logi TRUE
 $ : logi TRUE
 $ : logi TRUE
 $ : logi TRUE
 $ : logi TRUE
 $ : logi TRUE
 $ : logi TRUE
 $ : logi TRUE
 $ : logi TRUE
 $ : logi TRUE
 $ : logi TRUE
 $ : logi TRUE
 $ : logi TRUE
 $ : logi TRUE
 $ : logi TRUE
 $ : logi TRUE
 $ : logi TRUE
 $ : logi TRUE
 $ : logi TRUE
 $ : logi TRUE
 $ : logi TRUE
 $ : logi TRUE
 $ : logi TRUE
 $ : logi TRUE
 $ : logi TRUE
 $ : logi TRUE
 $ : logi TRUE
 $ : logi TRUE
 $ : logi TRUE
 $ : logi TRUE
 $ : logi TRUE
 $ : logi TRUE
 $ : logi TRUE
 $ : logi TRUE
 $ : logi TRUE
 $ : logi TRUE
 $ : logi TRUE
 $ : logi TRUE
 $ : logi TRUE
 $ : logi TRUE
 $ : logi TRUE
 $ : logi TRUE
 $ : logi TRUE
 $ : logi TRUE
 $ : logi TRUE
 $ : logi TRUE
 $ : logi TRUE
 $ : logi TRUE
 $ : logi TRUE
 $ : logi TRUE
 $ : logi TRUE
 $ : logi TRUE
 $ : logi TRUE
 $ : logi TRUE
 $ : logi TRUE
 $ : logi TRUE
 $ : logi TRUE
 $ : logi TRUE
[1] "part 31 written successfully."
Error in sendMaster(try(eval(expr, env), silent = TRUE)) : 
  write error, closing pipe to the master
Calls: lapply ... <Anonymous> -> lapply -> FUN -> mcparallel -> sendMaster
[1] "part 71 written successfully."
[1] "part 35 written successfully."
Error in sendMaster(try(eval(expr, env), silent = TRUE)) : 
  write error, closing pipe to the master
Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMaster
[1] "part 42 written successfully."
Error in sendMaster(try(eval(expr, env), silent = TRUE)) : 
  write error, closing pipe to the master
Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMaster
Execution halted
[1] "part 4 written successfully."
Execution halted
Error in sendMaster(try(eval(expr, env), silent = TRUE)) : 
  write error, closing pipe to the master
Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMaster
[1][1] "part 30 written successfully." "part 40 written successfully."

Error in sendMaster(try(eval(expr, env), silent = TRUE)) : 
  write error, closing pipe to the master
Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMaster
Error in sendMaster(try(eval(expr, env), silent = TRUE)) : 
  write error, closing pipe to the master
Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMaster
Execution halted
[1] "part 41 written successfully."
[1] "part 80 written successfully."
Error in sendMaster(try(eval(expr, env), silent = TRUE)) : 
  write error, closing pipe to the master
Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMaster
Error in sendMaster(try(eval(expr, env), silent = TRUE)) : 
  write error, closing pipe to the master
Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMaster
[1] "part 45 written successfully."
Error in sendMaster(try(eval(expr, env), silent = TRUE)) : 
  write error, closing pipe to the master
Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMaster
Execution halted
[1] "part 82 written successfully."
[1] "part 54 written successfully."
Error in sendMaster(try(eval(expr, env), silent = TRUE)) : 
  write error, closing pipe to the master
Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMaster
Error in sendMaster(try(eval(expr, env), silent = TRUE)) : 
  write error, closing pipe to the master
Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMaster
[1] "part 59 written successfully."
Execution halted
Error in sendMaster(try(eval(expr, env), silent = TRUE)) : 
  write error, closing pipe to the master
Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMaster
[1] "part 27 written successfully."
Error in sendMaster(try(eval(expr, env), silent = TRUE)) : 
  write error, closing pipe to the master
Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMaster
Execution halted
[1] "part 36 written successfully."
Error in sendMaster(try(eval(expr, env), silent = TRUE)) : 
  write error, closing pipe to the master
Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMaster
Execution halted
[1] "part 44 written successfully."
Error in sendMaster(try(eval(expr, env), silent = TRUE)) : 
  write error, closing pipe to the master
Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMaster
Execution halted
Execution halted
[1] "part 83 written successfully."
Error in sendMaster(try(eval(expr, env), silent = TRUE)) : 
  write error, closing pipe to the master
Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMaster
[1] "part 61 written successfully."
Error in sendMaster(try(eval(expr, env), silent = TRUE)) : 
  write error, closing pipe to the master
Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMaster
[1] "part 70 written successfully."
Error in sendMaster(try(eval(expr, env), silent = TRUE)) : 
  write error, closing pipe to the master
Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMaster
[1] "part 66 written successfully."
Error in sendMaster(try(eval(expr, env), silent = TRUE)) : 
  write error, closing pipe to the master
Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMaster
[1] "part 10 written successfully."
Error in sendMaster(try(eval(expr, env), silent = TRUE)) : 
  write error, closing pipe to the master
Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMaster
Execution halted
Execution halted
[1] "part 53 written successfully."
Error in sendMaster(try(eval(expr, env), silent = TRUE)) : 
  write error, closing pipe to the master
Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMaster
[1] "part 55 written successfully."
Error in sendMaster(try(eval(expr, env), silent = TRUE)) : 
  write error, closing pipe to the master
Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMaster
[1] "part 15 written successfully."
Error in sendMaster(try(eval(expr, env), silent = TRUE)) : 
  write error, closing pipe to the master
Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMaster
[1] "part 50 written successfully."
Error in sendMaster(try(eval(expr, env), silent = TRUE)) : 
  write error, closing pipe to the master
Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMaster
[1] "part 5 written successfully."
Error in sendMaster(try(eval(expr, env), silent = TRUE)) : 
  write error, closing pipe to the master
Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMaster
Execution halted
Execution halted
[1] "part 48 written successfully."
Error in sendMaster(try(eval(expr, env), silent = TRUE)) : 
  write error, closing pipe to the master
Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMaster
[1] "part 32 written successfully."
Error in sendMaster(try(eval(expr, env), silent = TRUE)) : 
  write error, closing pipe to the master
Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMaster
[1] "part 69 written successfully."
Execution halted
[1] "part 52 written successfully."
Error in sendMaster(try(eval(expr, env), silent = TRUE)) : 
  write error, closing pipe to the master
Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMaster
Error in sendMaster(try(eval(expr, env), silent = TRUE)) : 
  write error, closing pipe to the master
Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMaster
[1] "part 77 written successfully."
Error in sendMaster(try(eval(expr, env), silent = TRUE)) : 
  write error, closing pipe to the master
Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMaster
Execution halted
Execution halted
[1] "part 24 written successfully."
Error in sendMaster(try(eval(expr, env), silent = TRUE)) : 
  write error, closing pipe to the master
Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMaster
[1] "part 56 written successfully."
Error in sendMaster(try(eval(expr, env), silent = TRUE)) : 
  write error, closing pipe to the master
Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMaster
[1] "part 6 written successfully."
Error in sendMaster(try(eval(expr, env), silent = TRUE)) : 
  write error, closing pipe to the master
Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMaster
[1] "part 62 written successfully."
Error in sendMaster(try(eval(expr, env), silent = TRUE)) : 
  write error, closing pipe to the master
Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMaster
[1] "part 33 written successfully."
Error in sendMaster(try(eval(expr, env), silent = TRUE)) : 
  write error, closing pipe to the master
Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMaster
Execution halted
[1] "part 16 written successfully."
Error in sendMaster(try(eval(expr, env), silent = TRUE)) : 
  write error, closing pipe to the master
Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMaster
[1] "part 34 written successfully."
Error in sendMaster(try(eval(expr, env), silent = TRUE)) : 
  write error, closing pipe to the master
Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMaster
[1] "part 74 written successfully."
Error in sendMaster(try(eval(expr, env), silent = TRUE)) : 
  write error, closing pipe to the master
Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMaster
Execution halted
[1] "part 2 written successfully."
Error in sendMaster(try(eval(expr, env), silent = TRUE)) : 
  write error, closing pipe to the master
Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMaster
[1] "part 19 written successfully."
Error in sendMaster(try(eval(expr, env), silent = TRUE)) : 
  write error, closing pipe to the master
Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMaster
Execution halted
[1] "part 9 written successfully."
Error in sendMaster(try(eval(expr, env), silent = TRUE)) : 
  write error, closing pipe to the master
Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMaster
[1] "part 11 written successfully."
[1] "part 28 written successfully."
[1] "part 29 written successfully."
Error in sendMaster(try(eval(expr, env), silent = TRUE)) : 
  write error, closing pipe to the master
Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMaster
Error in sendMaster(try(eval(expr, env), silent = TRUE)) : 
  write error, closing pipe to the master
Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMaster
Error in sendMaster(try(eval(expr, env), silent = TRUE)) : 
  write error, closing pipe to the master
Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMaster
[1] "part 17 written successfully."
Error in sendMaster(try(eval(expr, env), silent = TRUE)) : 
  write error, closing pipe to the master
Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMaster
[1] "part 65 written successfully."
Error in sendMaster(try(eval(expr, env), silent = TRUE)) : 
  write error, closing pipe to the master
Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMaster
Execution halted
Execution halted
[1] "part 68 written successfully."
Error in sendMaster(try(eval(expr, env), silent = TRUE)) : 
  write error, closing pipe to the master
Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMaster
[1] "part 43 written successfully."
Error in sendMaster(try(eval(expr, env), silent = TRUE)) : 
  write error, closing pipe to the master
Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMaster
[1] "part 38 written successfully."
Error in sendMaster(try(eval(expr, env), silent = TRUE)) : 
  write error, closing pipe to the master
Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMaster
[1] "part 1 written successfully."
Error in sendMaster(try(eval(expr, env), silent = TRUE)) : 
  write error, closing pipe to the master
Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMaster
[1] "part 57 written successfully."
Error in sendMaster(try(eval(expr, env), silent = TRUE)) : 
  write error, closing pipe to the master
Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMaster
[1] "part 8 written successfully."
Error in sendMaster(try(eval(expr, env), silent = TRUE)) : 
  write error, closing pipe to the master
Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMaster
[1] "part 25 written successfully."
Error in sendMaster(try(eval(expr, env), silent = TRUE)) : 
  write error, closing pipe to the master
Calls: lapply ... <Anonymous> -> <Anonymous> -> mcparallel -> sendMaster
[1]List of 85
 $ : logi TRUE
 $ : NULL
 $ : NULL
 $ : NULL
 $ : NULL
 $ : NULL
 $ : NULL
 $ : NULL
 $ : NULL
 $ : NULL
 $ : NULL
 $ : NULL
 $ : NULL
 $ : NULL
 $ : NULL
 $ : NULL
 $ : NULL
 $ : NULL
 $ : NULL
 $ : NULL
 $ : NULL
 $ : NULL
 $ : NULL
 $ : NULL
 $ : NULL
 $ : NULL
 $ : NULL
 $ : NULL
 $ : NULL
 $ : NULL
 $ : NULL
 $ : NULL
 $ : NULL
 $ : NULL
 $ : NULL
 $ : NULL
 $ : NULL
 $ : NULL
 $ : NULL
 $ : NULL
 $ : NULL
 $ : NULL
 $ : NULL
 $ : NULL
 $ : NULL
 $ : NULL
 $ : NULL
 $ : NULL
 $ : NULL
 $ : NULL
 $ : NULL
 $ : NULL
 $ : NULL
 $ : NULL
 $ : NULL
 $ : NULL
 $ : NULL
 $ : NULL
 $ : NULL
 $ : NULL
 $ : NULL
 $ : NULL
 $ : NULL
 $ : NULL
 $ : NULL
 $ : NULL
 $ : NULL
 $ : NULL
 $ : NULL
 $ : NULL
 $ : NULL
 $ : NULL
 $ : NULL
 $ : NULL
 $ : NULL
 $ : NULL
 $ : NULL
 $ : NULL
 $ : NULL
 $ : NULL
 $ : NULL
 $ : NULL
 $ : NULL
 $ : NULL
 $ : NULL


More information about the R-devel mailing list