[Rd] Random behavior of mclapply

Tomas Kalibera
Thu Oct 18 16:47:59 CEST 2018


Hi Thibault,

mclapply has been designed to signal errors in two ways. Errors in user 
code are returned as special objects (of class "try-error") in the 
respective elements of the result list. All other errors (including a 
killed process) are returned as NULL in the respective elements of the 
result list. To detect these errors reliably, one needs to implement FUN 
so that it never returns NULL normally (it also cannot return a raw 
vector). This is how mclapply was designed and implemented (and also 
mccollect, etc.). It may be surprising to see multiple NULL elements when 
a single process is killed, but this is expected with pre-scheduling 
when that process has been tasked with computing multiple elements.
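
As a rough illustration of the two error channels described above, here is
a minimal sketch. The helper classify_mclapply_result() and the function f()
are hypothetical names made up for this example and are not part of the
parallel package. The sketch assumes FUN always returns a non-NULL value on
success, and it uses mc.preschedule = FALSE so that each element runs as its
own job.

```r
library(parallel)

# Hypothetical helper (not part of the parallel package): label each element
# of an mclapply() result as "ok", "user_error" or "no_result".
classify_mclapply_result <- function(res) {
  vapply(res, function(x) {
    if (is.null(x)) {
      "no_result"    # the job delivered nothing, e.g. the worker was killed
    } else if (inherits(x, "try-error")) {
      "user_error"   # FUN signalled an error inside the worker
    } else {
      "ok"
    }
  }, character(1))
}

# Hypothetical FUN: it wraps its value in a list, so a successful call never
# returns NULL and a NULL element can only mean that the job failed.
f <- function(i) {
  if (i == 2) stop("user code failed")
  list(value = i^2)
}

# mclapply relies on forking, so only run the demo on Unix-alikes.
if (.Platform$OS.type == "unix") {
  # mc.preschedule = FALSE forks one job per element of X.
  res <- suppressWarnings(
    mclapply(1:4, f, mc.cores = 2, mc.preschedule = FALSE)
  )
  print(classify_mclapply_result(res))
}
```

With the default mc.preschedule = TRUE, one worker handles several elements,
so a single failure can mark every element assigned to that worker.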

To make this API more user friendly, I've added a warning that is now 
emitted when a job does not deliver a result (that is, when a vector 
element is NULL because of such an error). I've also made it more explicit 
in the documentation that NULL signals an error.
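
In an R version that includes this change, calling code could capture the
warning rather than let it scroll by. The following is only a sketch:
collect_warnings() is a hypothetical helper, and nothing here assumes the
exact wording of the warning. It still checks for NULL elements, since older
R versions emit no warning in that case.

```r
# Hypothetical helper: evaluate expr and return its value together with the
# messages of any warnings raised while evaluating it.
collect_warnings <- function(expr) {
  msgs <- character(0)
  value <- withCallingHandlers(
    expr,
    warning = function(w) {
      msgs <<- c(msgs, conditionMessage(w))  # remember the message
      invokeRestart("muffleWarning")         # and carry on evaluating
    }
  )
  list(value = value, warnings = msgs)
}

# mclapply relies on forking, so only run the demo on Unix-alikes.
if (.Platform$OS.type == "unix") {
  out <- collect_warnings(parallel::mclapply(1:4, sqrt, mc.cores = 2))
  missing_result <- vapply(out$value, is.null, logical(1))
  if (length(out$warnings) > 0 || any(missing_result)) {
    message("some parallel jobs failed or raised warnings")
  }
}
```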

Best,
Tomas


On 07/26/2018 08:37 PM, Thibault Vatter wrote:
> Hi,
>
> I wondered about the behavior described in the following stackoverflow
> question:
>
> https://stackoverflow.com/questions/20674538/mclapply-returns-null-randomly
>
> More specifically, I would like to know if you ever considered the
> suggestion made in the comments of the first answer, namely to somehow warn
> the user if one of the processes has been killed by the out-of-memory
> killer ?
>
> I am always surprised to see the random NULLs without message/warning/error
> of any kind, and I think that it could be a useful feature to know whether
> the function executed by mclapply returned a NULL or if the process was
> killed for some reason.
>
> In the following gist, I have an example of this (in this case non-random)
> behavior:
>
> https://gist.github.com/tvatter/2fcf3a9a99c256f9b9360f596b300715
>
> For the record, I generate the list of NULLs in the 4th mclapply in the
> gist above with a late 2013 MacBook Pro with macOS High Sierra, 16GB of
> memory, and my sessionInfo() is:
>
> R version 3.5.0 (2018-04-23)
> Platform: x86_64-apple-darwin16.7.0 (64-bit)
> Running under: macOS High Sierra 10.13.6
>
> Matrix products: default
> BLAS:
> /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib
> LAPACK:
> /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libLAPACK.dylib
>
> locale:
> [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
>
> attached base packages:
> [1] parallel  stats     graphics  grDevices utils     datasets  methods   base
>
> loaded via a namespace (and not attached):
> [1] compiler_3.5.0 tools_3.5.0    yaml_2.1.19
>
> ------------------------------------------------------------
> Thibault Vatter
> Department of Statistics
> Columbia University