[R] inconsistency in mclapply.....

akshay kulkarni <akshay_e4 using hotmail.com>
Sat Jun 10 10:42:37 CEST 2023


Dear Ivan,

Here is the comprehensive information you requested.

This is the output of top when I run a function, LOWn(), that calls mclapply. It executes successfully. (My machine has 2 cores.)

> LOWn(OHLCDataEP[[63]])

Tasks: 127 total,   3 running, 124 sleeping,   0 stopped,   0 zombie
%Cpu0  : 82.3 us, 16.7 sy,  0.0 ni,  0.0 id,  0.0 wa,  1.0 hi,  0.0 si,  0.0 st
%Cpu1  : 74.1 us, 24.9 sy,  0.0 ni,  0.0 id,  0.0 wa,  1.0 hi,  0.0 si,  0.0 st
MiB Mem :  15531.8 total,  11019.4 free,   3521.8 used,    990.6 buff/cache
MiB Swap:      0.0 total,      0.0 free,      0.0 used.  11723.8 avail Mem






This is the output of top when I run the function LOWp(), which also calls mclapply. It hangs:


top - 07:48:08 up 54 min,  2 users,  load average: 0.02, 0.36, 0.34
Tasks: 127 total,   1 running, 126 sleeping,   0 stopped,   0 zombie
%Cpu0  :  0.0 us,  0.3 sy,  0.0 ni, 99.7 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu1  :  0.0 us,  0.0 sy,  0.0 ni,100.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
MiB Mem :  15531.8 total,  10976.8 free,   3564.4 used,    990.7 buff/cache
MiB Swap:      0.0 total,      0.0 free,      0.0 used.  11681.2 avail Mem

The mclapply call only works the first time I call it after starting an R session.


Traceback after interrupting LOWp:


> LOWp(OHLCDataEP[[63]])
^C
There were 50 or more warnings (use warnings() to see the first 50)
> traceback()
3: selectChildren(jobs[!is.na(jobsp)], -1)
2: mclapply(LYGH, FUN = arfima, mc.cores = 2, mc.preschedule = FALSE) at <tmp>#26
1: LOWp(OHLCDataEP[[63]])


I think the child processes spawned by mclapply in FUN2 don't get killed. This is from the top command AFTER interrupting FUN2 (sometimes there is only one R process):

 38615 ec2-user  20   0 1016432 400020  13392 S   0.0   2.5   0:02.05 R
 38696 ec2-user  20   0 1016436 400416  13676 S   0.0   2.5   0:02.03 R

This is the output while FUN2 is running:
  1526 ec2-user  20   0 1525784 651628  23040 S   0.0   4.1   0:11.90 R
  2616 ec2-user  20   0 1525784 634688   6092 S   0.0   4.0   0:00.03 R
  2617 ec2-user  20   0 1525784 634884   6288 S   0.0   4.0   0:00.02 R



This is AFTER successful completion of FUN1:
 38615 ec2-user  20   0 1016432 400020  13392 S   0.0   2.5   0:02.05 R
 38696 ec2-user  20   0 1016436 400416  13676 S   0.0   2.5   0:02.03 R

Please note that the PIDs are the same between FUN1 and FUN2, and also that when I am not parallelising there is only one R process:
1526 ec2-user  20   0 1227788 491248  21368 S   0.0   3.1   0:02.95 R



> sessionInfo()
R version 4.2.1 (2022-06-23)
Platform: x86_64-redhat-linux-gnu (64-bit)
Running under: Red Hat Enterprise Linux 8.6 (Ootpa)

Matrix products: default
BLAS/LAPACK: /usr/lib64/libopenblaso-r0.3.15.so

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C
 [9] LC_ADDRESS=C               LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] parallel  stats     graphics  grDevices utils     datasets  methods
[8] base

other attached packages:
[1] imputeTS_3.3    pbmcapply_1.5.1 attempt_0.3.1   forecast_8.21

loaded via a namespace (and not attached):
 [1] Rcpp_1.0.10       urca_1.3-3        pillar_1.9.0      compiler_4.2.1
 [5] tseries_0.10-54   xts_0.13.1        lifecycle_1.0.3   tibble_3.2.1
 [9] gtable_0.3.3      nlme_3.1-157      lattice_0.20-45   pkgconfig_2.0.3
[13] rlang_1.1.1       cli_3.6.1         curl_5.0.0        xml2_1.3.4
[17] generics_0.1.3    vctrs_0.6.2       lmtest_0.9-40     grid_4.2.1
[21] nnet_7.3-17       ggtext_0.1.2      gridtext_0.1.5    glue_1.6.2
[25] R6_2.5.1          fansi_1.0.4       ggplot2_3.4.2     TTR_0.24.3
[29] magrittr_2.0.3    scales_1.2.1      quantmod_0.4.22   timeDate_4022.108
[33] colorspace_2.1-0  fracdiff_1.5-2    quadprog_1.5-8    utf8_1.2.3
[37] stinepack_1.4     munsell_0.5.0     zoo_1.8-12


This is the output of jobs -l (it prints nothing):

[ec2-user using ip-172-31-15-116 ~]$ jobs -l
[ec2-user using ip-172-31-15-116 ~]$

killall -SIGCONT R has no effect.


You had asked me to attach a debugger to the child processes. How do I find the child processes spawned by mclapply? For example, how do I identify the child processes among the R processes listed above?


Many thanks in advance....

Thanking you,
Yours sincerely,
AKSHAY M KULKARNI








________________________________
From: Ivan Krylov <krylov.r00t using gmail.com>
Sent: Saturday, June 10, 2023 12:54 PM
To: akshay kulkarni <akshay_e4 using hotmail.com>
Cc: R help Mailing list <r-help using r-project.org>
Subject: Re: [R] inconsistency in mclapply.....

On Fri, 9 Jun 2023 21:19:11 +0000
akshay kulkarni <akshay_e4 using hotmail.com> wrote:

> debug at <tmp>#26: LYG <- mclapply(LYGH, FUN = arfima, mc.cores = 2,
> mc.preschedule = FALSE)
> Browse[2]> LYG <- pbmclapply(LYGH,FUN = arfima,mc.cores =
> 2,mc.preschedule = FALSE)
> |                                                  | 0%, ETA NA

So if you interrupt the code _after_ it hangs at 0%, ETA NA, what's the
traceback? (We're doing this to confirm that the parent process hangs
in either selectChildren() or readChild().)

> You might be interested in this:
>
> [ec2-user using ip-172-31-15-116 ~]$ exit
> logout
> There are stopped jobs.
>
> This occurs when I close R and try to exit the shell prompt (I am on
> an AWS EC2 RHEL 8 instance). Can this lead you somewhere?

I guess this proves the existence of child processes, probably spawned
by mclapply, but I don't know why they would be _stopped_. What's the
output of jobs -l at this point?

(This suggests trying to send them a SIGCONT and seeing what happens.
Does mclapply() get unstuck if you run the command killall -SIGCONT R
from a separate ssh connection? Would be strange if it worked, but
worth a try.)
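
A minimal sketch of how the stopped state could be checked, and SIGCONT
sent more selectively, from that separate connection. It assumes the
procps version of ps that ships with RHEL; the function names
list_children and resume_stopped_children are made up for this example.
Children forked by mclapply() have the R session as their parent, so
filtering on the parent PID picks them out of the list of R processes.

```shell
#!/bin/sh
# Print "PID STAT COMMAND" for every direct child of the given parent PID.
# In the STAT column, T means stopped, S sleeping, R running, Z zombie.
list_children() {
    ps -e -o pid= -o ppid= -o stat= -o comm= |
        awk -v parent="$1" '$2 == parent { print $1, $3, $4 }'
}

# Send SIGCONT only to the stopped children of the given parent PID,
# a narrower alternative to "killall -SIGCONT R".
resume_stopped_children() {
    list_children "$1" | while read -r pid stat rest; do
        case "$stat" in
            T*) kill -CONT "$pid" ;;
        esac
    done
}
```

The parent PID is what Sys.getpid() prints in the R session that called
mclapply(); pass that number as the argument to either function.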

> As of now I have quit R on my machine, so I can't get the session
> info.

Knowing the output of sessionInfo() could still be useful for solving
the problem. It's best to show the output after loading all the
packages, ideally just before you reproduce the problem.

> By the by, how do you run top while R is running? I think, at least on my
> machine, you have to quit R to get to the shell prompt...

I can think of 3 options:

1) Type Ctrl+Z at the R prompt. R (and the rest of the process group, I
think) becomes suspended, you return to the command line prompt where
you can run other commands. At the system command line prompt, type
"fg" and press Enter in order to continue running R. (Press Enter a
second time so that R prints its command line prompt again.) This is
quick, doesn't require preparation, but messes up the state of the
processes you're interested in. (They become suspended instead of
running, which may complicate debugging.)

2) Open a second ssh connection to the same machine the same way you
had opened the first one. You won't be able to (easily) interact with
the R session running in the first connection, but you'll get a second
system command line where you'll be able to run top, gdb, and other
commands, which should let you inspect the state of the system.

3) Before starting R, install a "terminal multiplexer", that is, GNU
Screen or tmux. If you're still on RHEL, use sudo dnf install screen or
sudo dnf install tmux. One of these commands needs to be run once per
computer. Type the name of the program ("screen" or "tmux") to start it.
Inside screen/tmux, start R.

In Screen, use Ctrl+A then C in order to create a new virtual terminal;
Ctrl+A then type the number in order to switch between terminals. In
tmux, use Ctrl+B then C in order to create a new virtual terminal;
Ctrl+B then type the number in order to switch between them. Terminals
are numbered from 0 upwards, so the second one will be numbered 1, and
so on.
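
For option 3, the same layout can also be set up without the key
bindings. This is only a sketch, assuming tmux is installed; the
function name start_debug_session and the session name "rdebug" are
made up for this example.

```shell
#!/bin/sh
# Start a detached tmux session that runs a command (R by default) in its
# first window, then open a second window with a plain shell for top or gdb.
# "rdebug" is a hypothetical session name; any name works.
start_debug_session() {
    session=${1:-rdebug}
    cmd=${2:-R}
    if ! command -v tmux >/dev/null 2>&1; then
        echo "tmux is not installed" >&2
        return 1
    fi
    # -d keeps the session in the background until it is attached.
    tmux new-session -d -s "$session" "$cmd" || return 1
    # The trailing colon targets the session; -d leaves the first window current.
    tmux new-window -d -t "$session:"
}
```

After calling start_debug_session, tmux attach -t rdebug connects to the
session, and the Ctrl+B bindings described above switch between R and
the shell window.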

--
Best regards,
Ivan
