[Rd] read.table segfaults

Göran Broström goran.brostrom at gmail.com
Sat Aug 27 11:27:47 CEST 2011


On Fri, Aug 26, 2011 at 11:55 PM, Ben Bolker <bbolker at gmail.com> wrote:
> Scott <ncbi2r <at> googlemail.com> writes:
>
>>
>> It does look like you've got a memory issue. Perhaps using
>>   as.is=TRUE and/or stringsAsFactors=FALSE as optional arguments
>> to read.table will help.
>>
>> If you don't specify these sorts of things, R may have to look through the
>> file and figure out which columns are characters/factors etc., and so the
>> larger files cause more of a headache for R, I'm guessing. Hopefully someone
>> else can comment further on this. I'd try toggling TRUE/FALSE for as.is and
>> stringsAsFactors.
>>
>>    Do you have other objects loaded in memory as well? This file by itself
>> might not be the problem - it could be a cumulative issue.
>>    Have you checked the file structure in any other manner?
>>    How large (MB/kB) is the file that you're trying to read?
>>    If you just read in parts of the file, is it okay?
>>       read.table(filename,header=FALSE,sep="\t",nrows=100)
>>       read.table(filename,header=FALSE,sep="\t",skip=20000,nrows=100)
>
>  There seem to be two issues here:
>
> 1. what can the original poster (OP) do to work around this problem?
> (e.g. get the data into a relational database and import it from
> there; use something from the High Performance task view such as
> ff or data.table ...)
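As a hedged sketch of the ff route mentioned above: read.table.ffdf reads
the file in chunks into an on-disk ffdf object, so no single giant in-memory
allocation is needed (the file name here is a placeholder, and column types
are inferred from the first chunk unless given explicitly):

## Chunked, disk-backed read via the ff package; arguments after 'file'
## are passed through to read.table.
library(ff)
ffd <- read.table.ffdf(file = "fil2_s.txt", header = FALSE, sep = "\t")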

Interestingly, the text file was created by a selection from an SQL
database. I have access to 'db2' on an Ubuntu machine, where I run, at
the bash prompt,

$ db2 < file2.sql

where file2.sql contains

connect to linnedb user goran using xxxxxxxxxxx
export to '/home/goran/ALC/SQL/fil2_s.txt' of del modified by coldelX09
 select  linneid, fodelsear, kon, ....... from u09021.fil2
connect reset

How do I get a direct connection between R and the database 'linnedb'?
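For reference, one possible route is ODBC, assuming a DSN for the DB2 server
has been configured (e.g. via unixODBC plus IBM's DB2 ODBC driver) and named
'linnedb'; the credentials below are placeholders, and the column list is cut
short where the original SQL elides it:

## Connect through an ODBC data source name and pull the query result
## straight into a data frame; the DSN name and credentials are assumptions.
library(RODBC)
ch <- odbcConnect("linnedb", uid = "goran", pwd = "xxxxxxxxxxx")
d  <- sqlQuery(ch, "select linneid, fodelsear, kon from u09021.fil2")
odbcClose(ch)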

> 2. reporting a bug -- according to the R FAQ, any low-level
> (segmentation-fault-type) crash of R when one is not messing
> around with dynamically loaded code constitutes a bug. Unfortunately,
> debugging problems like this is a huge pain in the butt.
>
>  Goran, can you randomly or systematically generate an
> object of this size, write it to disk, read it back in, and
> generate the same error?  In other words, does something like
>
> set.seed(1001)
> d <- data.frame(label=rep(LETTERS[1:11],1e6),
>                 values=matrix(rep(1.0,11*17*1e6),ncol=17))
> write.table(d,file="big.txt")
> read.table("big.txt")
>
> do the same thing?
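(For scale: the numeric part of that object alone is 11e6 rows of 17
doubles, i.e. about 11e6 * 17 * 8 bytes, roughly 1.5 GB in memory before
writing, so this test does exercise the same size regime as the original
file, which has 19 columns and more than 10 million lines.)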

No, but I get new errors:

> ss <- read.table("big.txt")
Error in read.table("big.txt") : duplicate 'row.names' are not allowed

(there are no duplicates)

I tried to add an item to the first line and

> ss <- read.table("big.txt", header = TRUE)
Error in scan(file, what, nmax, sep, dec, quote, skip, nlines, na.strings,  :
  line 10610008 did not have 19 elements

which is wrong; that line has 19 elements.
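For what it's worth, the per-line field counts can be cross-checked
independently of read.table with base R's count.fields (the default
whitespace separator is assumed here, matching write.table's default):

## count.fields returns the number of fields on each line of the file;
## any line not matching the expected 19 fields gets flagged.
nf <- count.fields("big.txt")
which(nf != 19)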

Göran

> Reducing it to this kind of reproducible example will make
> it possible for others to debug it without needing to gain
> access to your huge file ...
>
> ______________________________________________
> R-devel at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel
>



-- 
Göran Broström


