[Rd] [External] R crashes when using huge data sets with character string variables

luke-tierney using uiowa.edu
Sun Dec 13 04:26:50 CET 2020


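If R is receiving a kill signal (SIGKILL, which is what the "Killed"
message from bash indicates) there is nothing it can do about it: that
signal cannot be caught or handled by the process.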

I am guessing you are running into a memory over-commit issue in your OS.
https://en.wikipedia.org/wiki/Memory_overcommitment
https://engineering.pivotal.io/post/virtual_memory_settings_in_linux_-_the_problem_with_overcommit/
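
To see how a given Linux machine is configured, something along these
lines may help. It is only a sketch: the helper name show_overcommit is
made up for illustration, and the /proc files it reads exist on Linux
only.

```shell
# Print the kernel's overcommit policy (Linux only).
# overcommit_memory: 0 = heuristic overcommit (the usual default),
#                    1 = always overcommit,
#                    2 = strict accounting (no overcommit).
# overcommit_ratio is only consulted in mode 2.
show_overcommit() {
    for f in overcommit_memory overcommit_ratio; do
        p="/proc/sys/vm/$f"
        if [ -r "$p" ]; then
            printf '%s = %s\n' "$f" "$(cat "$p")"
        else
            printf '%s: not available on this system\n' "$f"
        fi
    done
}

show_overcommit
```

With mode 0 or 1 a large allocation can appear to succeed and the
process is killed only later, when the pages are actually touched.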

If you have to run this close to your physical memory limits you might
try using your shell's facility (ulimit for bash, limit for some
others) to limit process memory/virtual memory use to your available
physical memory. You can also try setting the R_MAX_VSIZE environment
variable mentioned in ?Memory; that only affects the R heap, not
malloc() done elsewhere.
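
A rough sketch of both suggestions, assuming a bash-like shell. The
function name run_limited and the 8 GiB figure are placeholders chosen
for illustration, not recommendations; pick a value that matches your
physical memory.

```shell
# Run a command with a cap on its virtual memory.
# $1 is the limit in KiB (the unit `ulimit -v` uses in bash);
# the remaining arguments are the command to run.
# The body is a subshell, so the limit does not stick to the calling shell.
run_limited() (
    ulimit -v "$1" || exit 1
    shift
    exec "$@"
)

# Example: 8 GiB expressed in KiB (hypothetical value).
LIMIT_KB=$((8 * 1024 * 1024))

# Show that the limit is in effect inside the child process.
run_limited "$LIMIT_KB" sh -c 'echo "virtual memory limit (KiB): $(ulimit -v)"'
```

R could then be started the same way, for example
run_limited "$LIMIT_KB" env R_MAX_VSIZE=8Gb Rscript analysis.R
(analysis.R being a stand-in for your own script). With the limit in
place an oversized allocation should fail inside R with an error
instead of the process being killed by the OS.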

Best,

luke

On Sat, 12 Dec 2020, Arne Henningsen wrote:

> When working with a huge data set with character string variables, I
> experienced that various commands let R crash. When I run R in a
> Linux/bash console, R terminates with the message "Killed". When I use
> RStudio, I get the message "R Session Aborted. R encountered a fatal
> error. The session was terminated. Start New Session". If an object in
> the R workspace needs too much memory, I would expect that R would not
> crash but issue an error message "Error: cannot allocate vector of
> size ...".  A minimal reproducible example (at least on my computer)
> is:
>
> nObs <- 1e9
>
> date <- paste( round( runif( nObs, 1981, 2015 ) ), round( runif( nObs,
> 1, 12 ) ), round( runif( nObs, 1, 31 ) ), sep = "-" )
>
> Is this a bug or a feature of R?
>
> Some information about my R version, OS, etc:
>
> R> sessionInfo()
> R version 4.0.3 (2020-10-10)
> Platform: x86_64-pc-linux-gnu (64-bit)
> Running under: Ubuntu 20.04.1 LTS
>
> Matrix products: default
> BLAS:   /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.9.0
> LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.9.0
>
> locale:
> [1] LC_CTYPE=en_DK.UTF-8       LC_NUMERIC=C
> [3] LC_TIME=en_DK.UTF-8        LC_COLLATE=en_DK.UTF-8
> [5] LC_MONETARY=en_DK.UTF-8    LC_MESSAGES=en_DK.UTF-8
> [7] LC_PAPER=en_DK.UTF-8       LC_NAME=C
> [9] LC_ADDRESS=C               LC_TELEPHONE=C
> [11] LC_MEASUREMENT=en_DK.UTF-8 LC_IDENTIFICATION=C
>
> attached base packages:
> [1] stats     graphics  grDevices utils     datasets  methods   base
>
> loaded via a namespace (and not attached):
> [1] compiler_4.0.3
>
> /Arne
>
>

-- 
Luke Tierney
Ralph E. Wareham Professor of Mathematical Sciences
University of Iowa                  Phone:             319-335-3386
Department of Statistics and        Fax:               319-335-3017
    Actuarial Science
241 Schaeffer Hall                  email:   luke-tierney using uiowa.edu
Iowa City, IA 52242                 WWW:  http://www.stat.uiowa.edu



More information about the R-devel mailing list