[Rd] read.table segfaults

Ben Bolker bbolker at gmail.com
Fri Aug 26 23:55:51 CEST 2011


Scott <ncbi2r <at> googlemail.com> writes:

> 
> It does look like you've got a memory issue. Perhaps using
> as.is=TRUE and/or stringsAsFactors=FALSE as optional arguments to
> read.table will help.
> 
> If you don't specify these sorts of things, R has to look through the
> file and figure out which columns are characters/factors etc., so
> larger files cause more of a headache for R, I'm guessing. Hopefully
> someone else can comment further on this. I'd try toggling TRUE/FALSE
> for as.is and stringsAsFactors.
> 
>    Do you have other objects loaded in memory as well? This file by
> itself might not be the problem -- it could be a cumulative issue.
>    Have you checked the file structure in any other manner?
>    How large (MB/KB) is the file that you're trying to read?
>    If you just read in parts of the file, is it okay?
>       read.table(filename,header=FALSE,sep="\t",nrows=100)
>       read.table(filename,header=FALSE,sep="\t",skip=20000,nrows=100)

  There seem to be two issues here:

1. what can the original poster (OP) do to work around this problem?
(e.g. get the data into a relational database and import it from
there, or use something from the High Performance Computing task view
such as ff or data.table; a sketch of the database route follows
below ...)

2. reporting a bug -- according to the R FAQ, any low-level
(segmentation-fault-type) crash of R when one is not messing
around with dynamically loaded code constitutes a bug. Unfortunately,
debugging problems like this is a huge pain in the butt.
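
For concreteness, the database route in (1) might look something like
this -- just a rough sketch, with a hypothetical chunk size, assuming
a tab-separated file with no header and that RSQLite is installed:

library(RSQLite)
con <- dbConnect(SQLite(), dbname = "big.db")
chunk <- 1e5   # hypothetical chunk size; tune to available memory
skip <- 0
repeat {
    ## read the next slice; read.table stops with an error once no
    ## lines are left, so trap that and treat it as end-of-file
    d <- tryCatch(read.table("big.txt", header = FALSE, sep = "\t",
                             skip = skip, nrows = chunk),
                  error = function(e) NULL)
    if (is.null(d)) break
    dbWriteTable(con, "big", d, append = (skip > 0), row.names = FALSE)
    skip <- skip + chunk
    if (nrow(d) < chunk) break   # final short chunk
}
## then pull back only as much as is needed at a time
d100 <- dbGetQuery(con, "SELECT * FROM big LIMIT 100")
dbDisconnect(con)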

  Goran, can you randomly or systematically generate an
object of this size, write it to disk, read it back in, and
generate the same error?  In other words, does something like

set.seed(1001)
d <- data.frame(label=rep(LETTERS[1:11],1e6),
                values=matrix(rep(1.0,11*17*1e6),ncol=17))
write.table(d,file="big.txt")
read.table("big.txt")

do the same thing?
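
(Back-of-the-envelope: that object is 11e6 rows by 18 columns, so the
17 numeric columns alone take about 11e6 * 17 * 8 bytes, roughly
1.5 GB in memory, and the text file on disk will be larger still --
so this wants a machine with plenty of RAM, which is presumably the
point.)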

Reducing it to this kind of reproducible example will make it
possible for others to debug it without needing access to your
huge file ...


