[Rd] [External] R crashes when using huge data sets with character string variables

Dirk Eddelbuettel edd at debian.org
Sun Dec 13 05:17:05 CET 2020


On 12 December 2020 at 21:26, luke-tierney@uiowa.edu wrote:
| If R is receiving a kill signal there is nothing it can do about it.
| 
| I am guessing you are running into a memory over-commit issue in your OS.
| https://en.wikipedia.org/wiki/Memory_overcommitment
| https://engineering.pivotal.io/post/virtual_memory_settings_in_linux_-_the_problem_with_overcommit/
| 
| If you have to run this close to your physical memory limits you might
| try using your shell's facility (ulimit for bash, limit for some
| others) to limit process memory/virtual memory use to your available
| physical memory. You can also try setting the R_MAX_VSIZE environment
| variable mentioned in ?Memory; that only affects the R heap, not
| malloc() done elsewhere.

Similarly, as it is Linux, you could (easily) add virtual memory via a
swapfile (see 'man 8 mkswap' and 'man 8 swapon').  But even then, I expect
this to be slow -- 1e9 is a lot.
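
For example (a rough sketch; the 32G size and the /swapfile path are only
illustrations):

$ sudo fallocate -l 32G /swapfile   # reserve the space (some filesystems need dd instead; see swapon(8))
$ sudo chmod 600 /swapfile          # swap files must not be world-readable
$ sudo mkswap /swapfile             # write the swap signature
$ sudo swapon /swapfile             # enable it; check with 'swapon --show'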

I have 32gb and ample swap (rarely used, but a safety net). When I use your
code with nObs <- 1e8 it ends up using about 6gb, which poses no problem, but
already takes 3 1/2 minutes:

> nObs <- 1e8
> system.time(date <- paste( round( runif( nObs, 1981, 2015 ) ), round( runif( nObs, 1, 12 ) ), round( runif( nObs, 1, 31 ) ), sep = "-" ))
   user  system elapsed 
203.723   1.779 205.528 
> 

You may want to play with the nObs value to see exactly where it breaks on
your box.
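
One way to automate that search (a rough, untested sketch; assumes Rscript on
the PATH and GNU time for the peak-memory report):

for n in 1e8 2e8 5e8 1e9; do
    # watch the 'Maximum resident set size' line in the output of each run
    /usr/bin/time -v Rscript -e "nObs <- $n; date <- paste(round(runif(nObs, 1981, 2015)), round(runif(nObs, 1, 12)), round(runif(nObs, 1, 31)), sep = \"-\")"
done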

Dirk

-- 
https://dirk.eddelbuettel.com | @eddelbuettel | edd@debian.org


