[Rd] strange apparently data-dependent crash with large data (PR#6955)

tplate at blackmesacapital.com
Mon Jun 7 18:59:27 CEST 2004


I'm consistently seeing R crash with a particular large data set.  What's 
strange is that although the crash seems related to running out of memory, 
I'm unable to construct a pseudo-random data set of the same size that also 
causes the crash.  Further adding to the strangeness is that the crash only 
happens if the dataset goes through a save()/load() cycle -- without that, 
the command in question just gives an out-of-memory error, but does not crash.

To make this clear, three different versions of the same data consistently 
produce very different behavior:

(1) original data read with read.table: memory error; fail to allocate 
164062 Kb
(2) original data through save()/load() cycle: memory error; fail to 
allocate 82031 Kb, followed by crash
(3) pseudo-random data of same size and similar characteristics: works 
without problem
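
The two failed allocation sizes in (1) and (2) can be related to the data 
dimensions with a little arithmetic. The sketch below assumes 8 bytes per 
double and 4 bytes per integer or logical element; reading the second 
figure as a 4-byte-per-element vector is an inference from the numbers, not 
something confirmed by debugging. The variable names are for illustration 
only.

```r
# Number of elements in X1[, -(1:2)]: 30000 rows, 702 - 2 = 700 columns
n <- 30000 * (702 - 2)

# Size in Kb of a double vector of that length (8 bytes per element)
kb_double <- n * 8 / 1024

# Size in Kb of a vector with 4-byte elements (assumption: integer or logical)
kb_4byte <- n * 4 / 1024

# Truncate to whole Kb for comparison with the error messages
floor(kb_double)   # 164062
floor(kb_4byte)    # 82031
```

So in case (1) R apparently fails while allocating a full double copy of 
the 30000 x 700 submatrix, while in case (2) it gets past that point and 
fails on a vector of the same length with half the element size.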

This is with R-1.9.0 under Windows 2000.  I'm not loading any optional 
packages.  I get the same crash behavior with R-1.9.0 patched, and R-2.0.0 
alpha, but I didn't test whether the pseudo-random data succeeds under those 
versions.  (In case it matters, I got R-1.9.0 patched and R-2.0.0 alpha as 
pre-compiled Windows binaries from http://cran.us.r-project.org/ at 9:30am 
MDT on Jun 7, 2004.)  Unfortunately, I don't have sufficient knowledge of 
how to debug memory problems in R to make further progress than I've made 
here, but maybe the following will provide some clues for someone else.

All the following transcripts are from Rgui.exe, with new runs at each 
comment beginning with "###"

### Read in the data and get an out-of-memory error (but no crash)
 > # ClassifyTrain.txt is from http://mill.ucsd.edu/data/ClassifyTrain.zip
 > X <- read.table("ClassifyTrain.txt", skip=2)
 > X1 <- as.matrix(X)
 > hist(log(X1[,-(1:2)]+1))
Error: cannot allocate vector of size 164062 Kb
In addition: Warning message:
Reached total allocation of 1024Mb: see help(memory.size)
 >

### Read in the data and save it as a .RData file for faster runs
### (I initially did this for speed, but this seems to be essential
### to causing the crash)
 > # ClassifyTrain.txt is from http://mill.ucsd.edu/data/ClassifyTrain.zip
 > X <- read.table("ClassifyTrain.txt", skip=2)
 > X1 <- as.matrix(X)
 > c(class(X1), storage.mode(X1), dim(X1))
[1] "matrix" "double" "30000"  "702"
 > save(list="X1", file="X1.RData")

### Produce the crash
 > version
          _
platform i386-pc-mingw32
arch     i386
os       mingw32
system   i386, mingw32
status
major    1
minor    9.0
year     2004
month    04
day      12
language R
 >
 > load("X1.RData")
 > c(class(X1), storage.mode(X1), dim(X1))
[1] "matrix" "double" "30000"  "702"
 > # all of the following 3 commands consistently cause a crash
 > hist(log(X1[,-(1:2)]+1))
 > hist(log(X1[,-(1:2)]+1), breaks=seq(0,13,0.5))
 > hist(log(X1[,-(1:2)]+1), breaks=seq(0,13,0.5), plot=F)
Error: cannot allocate vector of size 82031 Kb
In addition: Warning message:
Reached total allocation of 1024Mb: see help(memory.size)

[message that comes in a Windows dialog box after a wait of many seconds:]

R Console: Rgui.exe - Application Error
The exception unknown software exception (0xc00000fd) occured in the 
application at location 0x6b5b0a53

#### The following is a failed attempt to reproduce the crash with
#### pseudo-random data, i.e., R functions correctly (even when X1 is
#### in memory)
 >
 > # Look at some characteristics of the original data in
 > # order to produce a matrix of similar pseudo-random numbers.
 > load("X1.RData")
 > dim(X1)
[1] 30000   702
 > class(X1)
[1] "matrix"
 > storage.mode(X1)
[1] "double"
 > table(is.na(X1))

    FALSE
21060000
 > table(X1==0)

    FALSE     TRUE
  2284455 18775545
 > exp(diff(log(table(X1==0))))
     TRUE
8.218829
 > table(X1>=0)

     TRUE
21060000
 > range(X1)
[1]      0 326022
 > memory.limit()
[1] 1073741824
 > memory.limit()/2^20
[1] 1024
 > object.size(X1)/2^20
[1] 161.0267
 >
 > set.seed(1)
 > X <- matrix(rexp(30000 * 702, 5e-5) * rbinom(30000 * 702, 1, 1/8), ncol=702)
 > range(X)
[1] 3.615044e-04 3.249415e+05
 >
 > # Both of these commands seem to work without problems
 > hist(log(X[,-(1:2)]+1))
 > hist(log(X[,-(1:2)]+1), breaks=seq(0,13,0.5))
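
One check not covered by the transcripts above is putting the pseudo-random 
matrix through the same save()/load() cycle as the real data before calling 
hist(). The following is only a sketch of how that could be done and has 
not been run as part of this report. make_sparse() is a hypothetical helper 
introduced here, and the row count is reduced so the example runs quickly; 
set nrow to 30000 to match the original dimensions.

```r
# Hypothetical helper (not from the original transcripts): build a sparse,
# non-negative double matrix in the same way as the pseudo-random X above.
make_sparse <- function(nrow, ncol, rate = 5e-5, p_nonzero = 1/8) {
  n <- nrow * ncol
  matrix(rexp(n, rate) * rbinom(n, 1, p_nonzero), ncol = ncol)
}

set.seed(1)
X <- make_sparse(300, 702)      # reduced size; use nrow = 30000 for a full test

# Send the matrix through a save()/load() cycle, as was done for X1
f <- tempfile(fileext = ".RData")
save(list = "X", file = f)
rm(X)
load(f)                         # restores X from the file

# Same call as one of the crashing commands, applied to the reloaded matrix
h <- hist(log(X[, -(1:2)] + 1), breaks = seq(0, 13, 0.5), plot = FALSE)
```

If the full-size version of this still works, the save()/load() cycle alone 
is not sufficient to trigger the crash, which would point back at something 
specific to the original data.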


