[Rd] mclapply returns NULLs on MacOS when running GAM

Henrik Bengtsson henr|k@bengt@@on @end|ng |rom gm@||@com
Tue Apr 28 18:08:24 CEST 2020


Hi, a few comments below.

First, in my experience troubleshooting similar reports from others,
a NULL returned from parallel::mclapply() often means that the
corresponding child process crashed/died. However, when this happens
you should see a warning, e.g.

> y <- parallel::mclapply(1:2, FUN = function(x) if (x == 2) quit("no") else x)
Warning message:
In parallel::mclapply(1:2, FUN = function(x) if (x == 2) quit("no") else x) :
  scheduled core 2 did not deliver a result, all values of the job will be affected
> str(y)
List of 2
 $ : int 1
 $ : NULL

This warning is produced on R 4.0.0 and R 3.6.2 on Linux, and I would
assume that it is also produced on macOS.  It's not clear from your
message whether you also got that warning or not.
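
To check for this programmatically, something like the following might
help.  It is only a sketch: mclapply_checked() is a made-up helper
name, not part of the 'parallel' package.  It records any warnings
raised by parallel::mclapply() and reports which elements came back as
NULL.  Note that a NULL that FUN returned on purpose looks the same as
one left behind by a dead child process.

```r
## Hypothetical helper (not part of 'parallel'): run mclapply(), collect
## warnings instead of printing them, and list the NULL elements.
mclapply_checked <- function(X, FUN, ...) {
  warnings <- character(0)
  res <- withCallingHandlers(
    parallel::mclapply(X, FUN, ...),
    warning = function(w) {
      ## Keep the message and stop it from being signalled further
      warnings <<- c(warnings, conditionMessage(w))
      invokeRestart("muffleWarning")
    }
  )
  list(
    value    = res,
    null_idx = which(vapply(res, is.null, logical(1))),
    warnings = warnings
  )
}
```

Called with the quit("no") example above, I would expect null_idx to
point at the second element and warnings to contain the "did not
deliver a result" message, but I have only reasoned that through for
Linux.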

Second, forked processing, as used by parallel::mclapply(), is advised
against when using the RStudio Console [0].  Unfortunately, there's no
way to disable forked processing in R [1].  You could add the
following to your ~/.Rprofile startup file:

## Warn when forked processing is used in the RStudio Console
if (Sys.getenv("RSTUDIO") == "1" && !nzchar(Sys.getenv("RSTUDIO_TERM"))) {
  invisible(trace(parallel:::mcfork, tracer = quote(warning(
    "parallel::mcfork() was used. Note that forked processes, ",
    "e.g. parallel::mclapply(), may be unstable when used from ",
    "the RStudio Console ",
    "[https://github.com/rstudio/rstudio/issues/2597#issuecomment-482187011]",
    call. = FALSE))))
}

to detect when forked processing is used in the RStudio Console -
either by you or by some package code that you use directly or
indirectly.  You could even use stop() here if you want to be
conservative.
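
The stop() variant could look roughly like this.  in_rstudio_console()
is just a name I made up for the same environment-variable check as in
the snippet above, and the error message text is illustrative only.

```r
## Hypothetical helper: TRUE when running in the RStudio Console
## (RSTUDIO is "1" and RSTUDIO_TERM is unset or empty).
in_rstudio_console <- function() {
  Sys.getenv("RSTUDIO") == "1" && !nzchar(Sys.getenv("RSTUDIO_TERM"))
}

## Signal an error, instead of a warning, whenever a fork is attempted
if (in_rstudio_console()) {
  invisible(trace(parallel:::mcfork, tracer = quote(stop(
    "parallel::mcfork() was called from the RStudio Console",
    call. = FALSE))))
}
```

With this in ~/.Rprofile, any code path that ends up forking should
fail right away rather than return NULLs later on.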

[0] https://github.com/rstudio/rstudio/issues/2597#issuecomment-482187011
[1] https://stat.ethz.ch/pipermail/r-devel/2020-January/078896.html

/Henrik

On Tue, Apr 28, 2020 at 2:39 AM Shian Su <su.s using wehi.edu.au> wrote:
>
> Yes I am running on Rstudio 1.2.5033. I was also running this code without error on Ubuntu in Rstudio. Checking again on the terminal and it does indeed work fine even with large data.frames.
>
> Any idea as to what interaction between Rstudio and mclapply causes this?
>
> Thanks,
> Shian
>
> On 28 Apr 2020, at 7:29 pm, Simon Urbanek <simon.urbanek using R-project.org> wrote:
>
> Sorry, the code works perfectly fine for me in R even for 1e6 observations (but I was testing with R 4.0.0). Are you using some kind of GUI?
>
> Cheers,
> Simon
>
>
> On 28/04/2020, at 8:11 PM, Shian Su <su.s using wehi.edu.au> wrote:
>
> Dear R-devel,
>
> I am experiencing issues with running GAM models using mclapply, it fails to return any values if the data input becomes large. For example here the code runs fine with a df of 100 rows, but fails at 1000.
>
> library(mgcv)
> library(parallel)
>
> df <- data.frame(
> +     x = 1:100,
> +     y = 1:100
> + )
>
> mclapply(1:2, function(i, df) {
> +         fit <- gam(y ~ s(x, bs = "cs"), data = df)
> +     },
> +     df = df,
> +     mc.cores = 2L
> + )
> [[1]]
>
> Family: gaussian
> Link function: identity
>
> Formula:
> y ~ s(x, bs = "cs")
>
> Estimated degrees of freedom:
> 9  total = 10
>
> GCV score: 0
>
> [[2]]
>
> Family: gaussian
> Link function: identity
>
> Formula:
> y ~ s(x, bs = "cs")
>
> Estimated degrees of freedom:
> 9  total = 10
>
> GCV score: 0
>
>
>
> df <- data.frame(
> +     x = 1:1000,
> +     y = 1:1000
> + )
>
> mclapply(1:2, function(i, df) {
> +         fit <- gam(y ~ s(x, bs = "cs"), data = df)
> +     },
> +     df = df,
> +     mc.cores = 2L
> + )
> [[1]]
> NULL
>
> [[2]]
> NULL
>
> There is no error message returned, and the code runs perfectly fine in lapply.
>
> I am on a MacBook 15 (2016) running MacOS 10.14.6 (Mojave) and R version 3.6.2. This bug could not be reproduced on my Ubuntu 19.10 running R 3.6.1.
>
> Kind regards,
> Shian Su
> ----
> Shian Su
> PhD Student, Ritchie Lab 6W, Epigenetics and Development
> Walter & Eliza Hall Institute of Medical Research
> 1G Royal Parade, Parkville VIC 3052, Australia