[Rd] [External] Re: .Internal(quit(...)): system call failed: Cannot allocate memory

Jan Gorecki j@goreck| @end|ng |rom w|t@edu@p|
Tue Dec 1 18:22:54 CET 2020


Yes, I do set outside of R, in shell:
R_MAX_VSIZE=100Gb SRC_DATANAME=G1_1e9_2e0_0_0 /usr/bin/time -v Rscript
datatable/groupby-datatable.R

I think it might be related to allocations made with malloc rather than R_alloc.
Probably malloc allocation is not capped by setting this env var.
If so, then I have to limit memory on OS/shell level. As you mentioned before.

Best

On Tue, Dec 1, 2020 at 6:54 PM <luke-tierney using uiowa.edu> wrote:
>
> The fact that your max resident size isn't affected looks odd.  Are
> you setting the environment variable outside R? When I run
>
>      env R_MAX_VSIZE=16Gb /usr/bin/time bin/Rscript jg.R 1e9 2e0 0 0
>
> (your code in jg.R). I get a quick failure with 11785524maxresident)k
>
> Best,
>
> luke
>
> On Tue, 1 Dec 2020, Jan Gorecki wrote:
>
> > Thank you Luke,
> >
> > I tried your suggestion about R_MAX_VSIZE but I am not able to get the
> > error you are getting.
> > I tried recent R devel as I have seen you made a change to GC there.
> > My machine is 128GB, free -h reports 125GB available. I tried to set
> > 128, 125 and 100. In all cases the result is "Command terminated by
> > signal 9". Each took around 6-6.5h.
> > Details below, if it tells you anything how could I optimize it (or
> > raise an exception early) please do let me know.
> >
> > R 4.0.3
> >
> > unset R_MAX_VSIZE
> >        User time (seconds): 40447.92
> >        System time (seconds): 4034.37
> >        Percent of CPU this job got: 201%
> >        Elapsed (wall clock) time (h:mm:ss or m:ss): 6:07:59
> >        Maximum resident set size (kbytes): 127261184
> >        Major (requiring I/O) page faults: 72441
> >        Minor (reclaiming a frame) page faults: 3315491751
> >        Voluntary context switches: 381446
> >        Involuntary context switches: 529554
> >        File system inputs: 108339200
> >        File system outputs: 120
> >
> > R-devel 2020-11-27 r79522
> >
> > unset R_MAX_VSIZE
> >        User time (seconds): 40713.52
> >        System time (seconds): 4039.52
> >        Percent of CPU this job got: 198%
> >        Elapsed (wall clock) time (h:mm:ss or m:ss): 6:15:52
> >        Maximum resident set size (kbytes): 127254796
> >        Major (requiring I/O) page faults: 72810
> >        Minor (reclaiming a frame) page faults: 3433589848
> >        Voluntary context switches: 384363
> >        Involuntary context switches: 609024
> >        File system inputs: 108467064
> >        File system outputs: 112
> >
> > R_MAX_VSIZE=128Gb
> >        User time (seconds): 40411.13
> >        System time (seconds): 4227.99
> >        Percent of CPU this job got: 198%
> >        Elapsed (wall clock) time (h:mm:ss or m:ss): 6:14:01
> >        Maximum resident set size (kbytes): 127249316
> >        Major (requiring I/O) page faults: 88500
> >        Minor (reclaiming a frame) page faults: 3544520527
> >        Voluntary context switches: 384117
> >        Involuntary context switches: 545397
> >        File system inputs: 111675896
> >        File system outputs: 120
> >
> > R_MAX_VSIZE=125Gb
> >        User time (seconds): 40246.83
> >        System time (seconds): 4042.76
> >        Percent of CPU this job got: 201%
> >        Elapsed (wall clock) time (h:mm:ss or m:ss): 6:06:56
> >        Maximum resident set size (kbytes): 127254200
> >        Major (requiring I/O) page faults: 63867
> >        Minor (reclaiming a frame) page faults: 3449493803
> >        Voluntary context switches: 370753
> >        Involuntary context switches: 614607
> >        File system inputs: 106322880
> >        File system outputs: 112
> >
> > R_MAX_VSIZE=100Gb
> >        User time (seconds): 41837.10
> >        System time (seconds): 3979.57
> >        Percent of CPU this job got: 192%
> >        Elapsed (wall clock) time (h:mm:ss or m:ss): 6:36:34
> >        Maximum resident set size (kbytes): 127256940
> >        Major (requiring I/O) page faults: 66829
> >        Minor (reclaiming a frame) page faults: 3357778594
> >        Voluntary context switches: 391149
> >        Involuntary context switches: 646410
> >        File system inputs: 106605648
> >        File system outputs: 120
> >
> > On Fri, Nov 27, 2020 at 10:18 PM <luke-tierney using uiowa.edu> wrote:
> >>
> >> On Thu, 26 Nov 2020, Jan Gorecki wrote:
> >>
> >>> Thank you Luke for looking into it. Your knowledge of gc is definitely
> >>> helpful here. I put comments inline below.
> >>>
> >>> Best,
> >>> Jan
> >>>
> >>> On Wed, Nov 25, 2020 at 10:38 PM <luke-tierney using uiowa.edu> wrote:
> >>>>
> >>>> On Tue, 24 Nov 2020, Jan Gorecki wrote:
> >>>>
> >>>>> As for other calls to system. I avoid calling system. In the past I
> >>>>> had some (to get memory stats from OS), but they were failing with
> >>>>> exactly the same issue. So yes, if I would add call to system before
> >>>>> calling quit, I believe it would fail with the same error.
> >>>>> At the same time I think (although I am not sure) that new allocations
> >>>>> made in R are working fine. So R seems to reserve some memory and can
> >>>>> continue to operate, while external call like system will fail. Maybe
> >>>>> it is like this by design, don't know.
> >>>>
> >>>> Thanks for the report on quit(). We're exploring how to make the
> >>>> cleanup on exit more robust to low memory situations like these.
> >>>>
> >>>>>
> >>>>> Aside from this problem that is easy to report due to the warning
> >>>>> message, I think that gc() is choking at the same time. I tried to
> >>>>> make reproducible example for that, multiple times but couldn't, let
> >>>>> me try one more time.
> >>>>> It happens to manifest when there is 4e8+ unique characters/factors in
> >>>>> an R session. I am able to reproduce it using data.table and dplyr
> >>>>> (0.8.4 because 1.0.0+ fails even sooner), but using base R is not easy
> >>>>> because of the size. I described briefly problem in:
> >>>>> https://github.com/h2oai/db-benchmark/issues/110
> >>>>
> >>>> Because of the design of R's character vectors, with each element
> >>>> allocated separately, R is never going to be great at handling huge
> >>>> numbers of distinct strings. But it can do an adequate job given
> >>>> enough memory to work with.
> >>>>
> >>>> When I run your GitHub issue example on a machine with around 500 Gb
> >>>> of RAM it seems to run OK; /usr/bin/time reports
> >>>>
> >>>> 2706.89user 161.89system 37:10.65elapsed 128%CPU (0avgtext+0avgdata 92180796maxresident)k
> >>>> 0inputs+103450552outputs (0major+38716351minor)pagefaults 0swaps
> >>>>
> >>>> So the memory footprint is quite large. Using gc.time() it looks like
> >>>> about 1/3 of the time is in GC. Not ideal, and maybe could be improved
> >>>> on a bit, but probably not by much. The GC is basically doing an
> >>>> adequate job, given enough RAM.
> >>>
> >>> Agree, 1/3 is a lot but still acceptable. So this strictly is not
> >>> something that requires intervention.
> >>> PS. I wasn't aware of gc.time(), it may be worth linking it from
> >>> SeeAlso in gc() manual.
> >>>
> >>>>
> >>>> If you run this example on a system without enough RAM, or with other
> >>>> programs competing for RAM, you are likely to end up fighting with
> >>>> your OS/hardware's virtual memory system. When I try to run it on a
> >>>> 16Gb system it churns for an hour or so before getting killed, and
> >>>> /usr/bin/time reports a huge number of page faults:
> >>>>
> >>>> 312523816inputs+0outputs (24761285major+25762068minor)pagefaults 0swaps
> >>>>
> >>>> You are probably experiencing something similar.
> >>>
> >>> Yes, this is exactly what I am experiencing.
> >>> The machine is a bare metal machine of 128GB mem, csv size 50GB,
> >>> data.frame size 74GB.
> >>> In my case it churns for ~3h before it gets killed with SIGINT from
> >>> the parent R process which uses 3h as a timeout for this script.
> >>> This is something I would like to be addressed because gc time is far
> >>> bigger than actual computation time. This is not really acceptable, I
> >>> would prefer to raise an exception instead.
> >>>
> >>>>
> >>>> There may be opportunities for more tuning of the GC to better handle
> >>>> running this close to memory limits, but I doubt the payoff would be
> >>>> worth the effort.
> >>>
> >>> If you don't have plans/time to work on that anytime soon, then I can
> >>> fill bugzilla for this problem so it won't get lost in the mailing
> >>> list.
> >>
> >> I'm not convinced anything useful can be done that would work well for
> >> your application without working badly for others.
> >>
> >> If you want to drive this close to your memory limits you are probably
> >> going to have to take responsibility for some tuning at your end. One
> >> option in ?Memory you might try is the R_MAX_VSIZE environment
> >> variable. On my 16Gb machine with R_MAX_VSIZE=16Gb your example fails
> >> very quickly with
> >>
> >> Error: vector memory exhausted (limit reached?)
> >>
> >> rather than churning for an hour trying to make things work. Setting
> >> memory and/or virtual memory limits in your shell is another option.
> >>
> >> Best,
> >>
> >> luke
> >>
> >>>
> >>>
> >>>>
> >>>> Best,
> >>>>
> >>>> luke
> >>>>
> >>>>> It would help if gcinfo() could take FALSE/TRUE/2L where 2L will print
> >>>>> even more information about gc, like how much time the each gc()
> >>>>> process took, how many objects it has to check on each level.
> >>>>>
> >>>>> Best regards,
> >>>>> Jan
> >>>>>
> >>>>>
> >>>>>
> >>>>> On Tue, Nov 24, 2020 at 1:05 PM Tomas Kalibera <tomas.kalibera using gmail.com> wrote:
> >>>>>>
> >>>>>> On 11/24/20 11:27 AM, Jan Gorecki wrote:
> >>>>>>> Thanks Bill for checking that.
> >>>>>>> It was my impression that warnings are raised from some internal
> >>>>>>> system calls made when quitting R. At that point I don't have much
> >>>>>>> control over checking the return status of those.
> >>>>>>> Your suggestion looks good to me.
> >>>>>>>
> >>>>>>> Tomas, do you think this could help? could this be implemented?
> >>>>>>
> >>>>>> I think this is a good suggestion. Deleting files on Unix was changed
> >>>>>> from system("rm") to doing that in C, and deleting the session directory
> >>>>>> should follow.
> >>>>>>
> >>>>>> It might also help diagnosing your problem, but I don't think it would
> >>>>>> solve it. If the diagnostics in R works fine and the OS was so
> >>>>>> hopelessly out of memory that it couldn't run any more external
> >>>>>> processes, then really this is not a problem of R, but of having
> >>>>>> exhausted the resources. And it would be a coincidence that just this
> >>>>>> particular call to "system" at the end of the session did not work.
> >>>>>> Anything else could break as well close to the end of the script. This
> >>>>>> seems the most likely explanation to me.
> >>>>>>
> >>>>>> Do you get this warning repeatedly, reproducibly at least in slightly
> >>>>>> different scripts at the very end, with this warning always from quit()?
> >>>>>> So that the "call" part of the warning message has .Internal(quit) like
> >>>>>> in the case you posted? Would adding another call to "system" before the
> >>>>>> call to "q()" work - with checking the return value? If it is always
> >>>>>> only the last call to "system" in "q()", then it is suspicious, perhaps
> >>>>>> an indication that some diagnostics in R is not correct. In that case, a
> >>>>>> reproducible example would be the key - so either if you could diagnose
> >>>>>> on your end what is the problem, or create a reproducible example that
> >>>>>> someone else can use to reproduce and debug.
> >>>>>>
> >>>>>> Best
> >>>>>> Tomas
> >>>>>>
> >>>>>>>
> >>>>>>> On Mon, Nov 23, 2020 at 7:10 PM Bill Dunlap <williamwdunlap using gmail.com> wrote:
> >>>>>>>> The call to system() probably is an internal call used to delete the session's tempdir().  This sort of failure means that a potentially large amount of disk space is not being recovered when R is done.  Perhaps R_CleanTempDir() could call R_unlink() instead of having a subprocess call 'rm -rf ...'.  Then it could also issue a specific warning if it was impossible to delete all of tempdir().  (That should be very rare.)
> >>>>>>>>
> >>>>>>>>> q("no")
> >>>>>>>> Breakpoint 1, R_system (command=command using entry=0x7fffffffa1e0 "rm -Rf /tmp/RtmppoKPXb") at sysutils.c:311
> >>>>>>>> 311     {
> >>>>>>>> (gdb) where
> >>>>>>>> #0  R_system (command=command using entry=0x7fffffffa1e0 "rm -Rf /tmp/RtmppoKPXb") at sysutils.c:311
> >>>>>>>> #1  0x00005555557c30ec in R_CleanTempDir () at sys-std.c:1178
> >>>>>>>> #2  0x00005555557c31d7 in Rstd_CleanUp (saveact=<optimized out>, status=0, runLast=<optimized out>) at sys-std.c:1243
> >>>>>>>> #3  0x00005555557c593d in R_CleanUp (saveact=saveact using entry=SA_NOSAVE, status=status using entry=0, runLast=<optimized out>) at system.c:87
> >>>>>>>> #4  0x00005555556cc85e in do_quit (call=<optimized out>, op=<optimized out>, args=0x555557813f90, rho=<optimized out>) at main.c:1393
> >>>>>>>>
> >>>>>>>> -Bill
> >>>>>>>>
> >>>>>>>> On Mon, Nov 23, 2020 at 3:15 AM Tomas Kalibera <tomas.kalibera using gmail.com> wrote:
> >>>>>>>>> On 11/21/20 6:51 PM, Jan Gorecki wrote:
> >>>>>>>>>> Dear R-developers,
> >>>>>>>>>>
> >>>>>>>>>> Some of the more fat scripts (50+ GB mem used by R) that I am running,
> >>>>>>>>>> when they finish they do quit with q("no", status=0)
> >>>>>>>>>> Quite often it happens that there is an extra stderr output produced
> >>>>>>>>>> at the very end which looks like this:
> >>>>>>>>>>
> >>>>>>>>>> Warning message:
> >>>>>>>>>> In .Internal(quit(save, status, runLast)) :
> >>>>>>>>>>     system call failed: Cannot allocate memory
> >>>>>>>>>>
> >>>>>>>>>> Is there any way to avoid this kind of warnings? I am using stderr
> >>>>>>>>>> output for detecting failures in scripts and this warning is a false
> >>>>>>>>>> positive of a failure.
> >>>>>>>>>>
> >>>>>>>>>> Maybe quit function could wait little bit longer trying to allocate
> >>>>>>>>>> before it raises this warning?
> >>>>>>>>> If you see this warning, some call to system() or system2() or similar,
> >>>>>>>>> which executes an external program, failed to even run a shell to run
> >>>>>>>>> that external program, because there was not enough memory. You should
> >>>>>>>>> be able to find out where it happens by checking the exit status of
> >>>>>>>>> system().
> >>>>>>>>>
> >>>>>>>>> Tomas
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>>> Best regards,
> >>>>>>>>>> Jan Gorecki
> >>>>>>>>>>
> >>>>>>>>>> ______________________________________________
> >>>>>>>>>> R-devel using r-project.org mailing list
> >>>>>>>>>> https://stat.ethz.ch/mailman/listinfo/r-devel
> >>>>>>>>> ______________________________________________
> >>>>>>>>> R-devel using r-project.org mailing list
> >>>>>>>>> https://stat.ethz.ch/mailman/listinfo/r-devel
> >>>>>>
> >>>>>>
> >>>>>
> >>>>> ______________________________________________
> >>>>> R-devel using r-project.org mailing list
> >>>>> https://stat.ethz.ch/mailman/listinfo/r-devel
> >>>>>
> >>>>
> >>>> --
> >>>> Luke Tierney
> >>>> Ralph E. Wareham Professor of Mathematical Sciences
> >>>> University of Iowa                  Phone:             319-335-3386
> >>>> Department of Statistics and        Fax:               319-335-3017
> >>>>     Actuarial Science
> >>>> 241 Schaeffer Hall                  email:   luke-tierney using uiowa.edu
> >>>> Iowa City, IA 52242                 WWW:  http://www.stat.uiowa.edu
> >>>
> >>
> >> --
> >> Luke Tierney
> >> Ralph E. Wareham Professor of Mathematical Sciences
> >> University of Iowa                  Phone:             319-335-3386
> >> Department of Statistics and        Fax:               319-335-3017
> >>     Actuarial Science
> >> 241 Schaeffer Hall                  email:   luke-tierney using uiowa.edu
> >> Iowa City, IA 52242                 WWW:  http://www.stat.uiowa.edu
> >
>
> --
> Luke Tierney
> Ralph E. Wareham Professor of Mathematical Sciences
> University of Iowa                  Phone:             319-335-3386
> Department of Statistics and        Fax:               319-335-3017
>     Actuarial Science
> 241 Schaeffer Hall                  email:   luke-tierney using uiowa.edu
> Iowa City, IA 52242                 WWW:  http://www.stat.uiowa.edu



More information about the R-devel mailing list