[Rd] segfault with readDCF on R 3.1.2 on AIX 6.1 when using install.packages

Hervé Pagès hpages at fredhutch.org
Tue Sep 22 00:22:36 CEST 2015


On 09/21/2015 02:48 PM, Duncan Murdoch wrote:
> On 21/09/2015 4:50 PM, Hervé Pagès wrote:
>> Hi,
>>
>> Note that one significant change to read.dcf() that happened since R
>> 3.0.2 is the addition of support for arbitrarily long lines (commit
>> 63281), which never worked:
>>
>>     dcf <- paste(c("aa: ", rep(letters, length.out=10000)), collapse="")
>>     writeLines(dcf, "test.dcf")
>>     nchar(read.dcf("test.dcf"))
>>     #        aa
>>     # [1,] 8186
>>
>
> I don't see that in R 3.2.2 on OSX or 3.2.2 patched on Windows:
>
>>     nchar(read.dcf("test.dcf"))
>          aa
> [1,] 10000

You're just lucky that 'buf2[nbuf]' (the one char that the memcpy()
leaves uncopied) happens not to be '\0' after allocation by R_alloc().
Try this:

   val <- paste(rep(letters, length.out=10000), collapse="")
   writeLines(paste("aa:", val), "test.dcf")
   identical(val, read.dcf("test.dcf")[1])

Maybe you won't be so lucky with this one.
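
To make the failure mode concrete, here is a minimal standalone sketch
of the growth logic in plain C. It is not the R source: the names
read_line_model, INITIAL_SIZE, copy_extra and fill are made up for
illustration, the input is a string rather than a connection, and
malloc() plus an explicit fill byte stands in for whatever R_alloc()
happens to hand back. With copy_extra = 0 it copies 'nbuf' chars on
growth, as the current code does; with copy_extra = 1 it copies
'nbuf + 1'.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define INITIAL_SIZE 8  /* hypothetical; stands in for MAXELTSIZE */

/* Model of the Rconn_getline2() loop.  nbuf is the index of the last
 * char stored (it starts at -1), so nbuf + 1 chars are in buf.
 * Fresh memory is filled with `fill` to mimic uninitialised contents.
 * The caller frees the result. */
char *read_line_model(const char *input, int copy_extra, int fill)
{
    int bufsize = INITIAL_SIZE, nbuf = -1;
    char *buf = malloc(bufsize);
    assert(buf != NULL);
    memset(buf, fill, bufsize);

    for (const char *p = input; *p != '\0'; p++) {
        if (nbuf + 2 >= bufsize) {      /* allow for terminator below */
            bufsize *= 2;
            char *buf2 = malloc(bufsize);
            assert(buf2 != NULL);
            memset(buf2, fill, bufsize);
            /* copy_extra == 0 reproduces the bug: one stored char is
             * left behind in the old buffer */
            memcpy(buf2, buf, copy_extra ? nbuf + 1 : nbuf);
            free(buf);
            buf = buf2;
        }
        if (*p != '\n') {
            buf[++nbuf] = *p;
        } else {
            buf[++nbuf] = '\0';
            break;
        }
    }
    if (nbuf == -1)
        buf[0] = '\0';                  /* the R code returns NULL here */
    else if (buf[nbuf])
        buf[++nbuf] = '\0';             /* input did not end with newline */
    return buf;
}
```

With the 8-byte initial buffer, the line "abcdefghijkl" comes back as
"abcdef" when fresh memory is zeroed, and as 12 chars with a garbage
byte where the 'g' should be when it is not. That is why nchar() can
look right while identical() does not. If MAXELTSIZE is 8192 (which the
8186 above is consistent with), the same arithmetic gives 8190 chars of
line, i.e. 8186 once the 4-char "aa: " prefix is removed.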

H.

>
> Duncan Murdoch
>
>> The culprit being line 53 in src/main/dcf.c where the author of the
>> Rconn_getline2() function only copies 'nbuf' chars from 'buf' to 'buf2'
>> when in fact 'nbuf + 1' chars have been stored in 'buf' so far.
>>
>> Quickest fix:
>>
>> Index: src/main/dcf.c
>> ===================================================================
>> --- src/main/dcf.c	(revision 69404)
>> +++ src/main/dcf.c	(working copy)
>> @@ -50,7 +50,7 @@
>>    	if(nbuf+2 >= bufsize) { // allow for terminator below
>>    	    bufsize *= 2;
>>    	    char *buf2 = R_alloc(bufsize, sizeof(char));
>> -	    memcpy(buf2, buf, nbuf);
>> +	    memcpy(buf2, buf, nbuf + 1);
>>    	    buf = buf2;
>>    	}
>>    	if(c != '\n'){
>>
>> However a better fix would be to have 'nbuf' actually contain the number
>> of chars that have been stored in 'buf' so far (as its name suggests):
>>
>> Index: src/main/dcf.c
>> ===================================================================
>> --- src/main/dcf.c	(revision 69404)
>> +++ src/main/dcf.c	(working copy)
>> @@ -42,12 +42,12 @@
>>    /* Use R_alloc as this might get interrupted */
>>    static char *Rconn_getline2(Rconnection con)
>>    {
>> -    int c, bufsize = MAXELTSIZE, nbuf = -1;
>> +    int c, bufsize = MAXELTSIZE, nbuf = 0;
>>        char *buf;
>>
>>        buf = R_alloc(bufsize, sizeof(char));
>>        while((c = Rconn_fgetc(con)) != R_EOF) {
>> -	if(nbuf+2 >= bufsize) { // allow for terminator below
>> +	if(nbuf+1 >= bufsize) { // allow for terminator below
>>    	    bufsize *= 2;
>>    	    char *buf2 = R_alloc(bufsize, sizeof(char));
>>    	    memcpy(buf2, buf, nbuf);
>> @@ -54,17 +54,19 @@
>>    	    buf = buf2;
>>    	}
>>    	if(c != '\n'){
>> -	    buf[++nbuf] = (char) c;
>> +	    buf[nbuf++] = (char) c;
>>    	} else {
>> -	    buf[++nbuf] = '\0';
>> +	    buf[nbuf++] = '\0';
>>    	    break;
>>    	}
>>        }
>> +    if (nbuf == 0)
>> +        return NULL;
>>        /* Make sure it is null-terminated even if file did not end with
>>         *  newline.
>>         */
>> -    if(nbuf >= 0 && buf[nbuf]) buf[++nbuf] = '\0';
>> -    return (nbuf == -1) ? NULL: buf;
>> +    buf[nbuf-1] = '\0';
>> +    return buf;
>>    }
>>
>> That improves readability and reduces the risk of bugs.
>>
>> Also note that Rconn_getline2() allocates a new buffer for each line in
>> the DCF file. So we got support for arbitrarily long lines (a rare
>> situation) at the price of a slowdown and increased memory usage for
>> all DCF files. Sounds less than optimal :-/
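
On that last point, a rough sketch of one alternative: keep a single
growable buffer and reuse it for every line, so that memory is only
requested when a line is longer than anything seen before. Everything
here is hypothetical (the linebuf type, linebuf_reserve,
linebuf_getline, the starting capacity of 16), the input is a string
cursor rather than an Rconnection, and it uses realloc() rather than
R_alloc(), so it sidesteps the interruption concern mentioned in the
comment in dcf.c instead of solving it.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical growable buffer shared by all lines of one file. */
typedef struct {
    char  *data;
    size_t cap;
} linebuf;

/* Make sure lb can hold at least `need` bytes; contents are preserved. */
static void linebuf_reserve(linebuf *lb, size_t need)
{
    if (need <= lb->cap) return;
    size_t newcap = lb->cap ? lb->cap : 16;
    while (newcap < need) newcap *= 2;
    char *tmp = realloc(lb->data, newcap);
    assert(tmp != NULL);
    lb->data = tmp;
    lb->cap = newcap;
}

/* Read one line from *cursor into lb and advance *cursor past the
 * newline.  Returns lb->data (overwritten by the next call), or NULL
 * once the input is exhausted. */
char *linebuf_getline(linebuf *lb, const char **cursor)
{
    const char *p = *cursor;
    size_t n = 0;                       /* number of chars stored so far */

    if (*p == '\0') return NULL;
    while (*p != '\0' && *p != '\n') {
        linebuf_reserve(lb, n + 2);     /* this char plus the terminator */
        lb->data[n++] = *p++;
    }
    linebuf_reserve(lb, n + 1);         /* covers an empty first line */
    lb->data[n] = '\0';
    if (*p == '\n') p++;
    *cursor = p;
    return lb->data;
}
```

Because 'n' here is the count of chars stored, and realloc() carries
the old contents over, there is no separate memcpy() length to get
wrong. The caller has to copy each line out before asking for the next
one, and free lb.data at the end.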
>>
>> Cheers,
>> H.
>>
>>
>> On 09/21/2015 11:01 AM, Duncan Murdoch wrote:
>>> On 21/09/2015 1:49 PM, Vinh Nguyen wrote:
>>>> Here's an update:
>>>>
>>>> I checked the ChangeLog for R, and it seems like readDCF was changed
>>>> in 3.0.2.  I went on a whim and copied src/main/dcf.c from R 2.15.3
>>>> over to 3.2.2, and R compiled fine and install.packages now works for
>>>> me.
>>>>
>>>> This is probably not ideal, but it at least makes R usable on AIX for
>>>> me.  Would definitely like to help figure out what's wrong with the
>>>> new dcf.c on AIX.
>>>
>>> I don't know if anyone on the core team has access to AIX, so you're
>>> likely on your own for this.
>>>
>>> I'd suggest running R in a debugger (gdb or whatever you have), and
>>> identifying exactly which line in dcf.c fails, and why.  If you tell us
>>> that, we might be able to spot what is going wrong.
>>>
>>> Duncan Murdoch
>>>
>>>>
>>>> Thanks.
>>>>
>>>> -- Vinh
>>>>
>>>>
>>>> On Mon, Sep 21, 2015 at 10:01 AM, Vinh Nguyen <vinhdizzo at gmail.com> wrote:
>>>>> Hi there,
>>>>>
>>>>> I just wanted to follow up on this readDCF issue with install.packages
>>>>> on AIX on R 3.*.  I'm happy to help try potential solutions or debug
>>>>> if anyone could point me in the right direction.
>>>>>
>>>>> To re-cap, it appears readDCF is segfault'ing since R 3.* on AIX.
>>>>> This was not the case up until R 2.15.3.  This makes install.packages
>>>>> not usable.  Thanks.
>>>>>
>>>>> -- Vinh
>>>>>
>>>>>
>>>>> On Tue, Nov 11, 2014 at 10:23 AM, Vinh Nguyen <vinhdizzo at gmail.com> wrote:
>>>>>> Dear list (re-posting from r-help as r-devel is probably more appropriate),
>>>>>>
>>>>>> I was able to successfully compile R on our AIX box at work using the
>>>>>> GNU compilers following the instructions on the R Administration
>>>>>> guide.  The output can be seen here
>>>>>> (https://gist.github.com/nguyenvinh/504321ea9c89d8919bef) and yields
>>>>>> no errors.
>>>>>>
>>>>>> However, I get a segfault whenever I try to use the install.packages
>>>>>> function to install packages.  Using debug, I was able to trace it to
>>>>>> the readDCF function:
>>>>>>
>>>>>> Browse[2]>
>>>>>> debug: if (!all) return(.Internal(readDCF(file, fields, keep.white)))
>>>>>> Browse[2]>
>>>>>> debug: return(.Internal(readDCF(file, fields, keep.white)))
>>>>>> Browse[2]>
>>>>>>
>>>>>>    *** caught segfault ***
>>>>>> address 4, cause 'invalid permissions'
>>>>>>
>>>>>> Possible actions:
>>>>>> 1: abort (with core dump, if enabled)
>>>>>> 2: normal R exit
>>>>>> 3: exit R without saving workspace
>>>>>> 4: exit R saving workspace
>>>>>> Selection:
>>>>>>
>>>>>> Was curious if anyone has a clue on why such an error exists or what I
>>>>>> could do to fix it?  I'm able to install packages via R CMD INSTALL,
>>>>>> but I would hate to have to manually determine dependencies, download
>>>>>> the source for each package, and install them "by hand" via R CMD
>>>>>> INSTALL.
>>>>>>
>>>>>> I went back and compiled older versions of R to see if this error
>>>>>> exists.  On R 3.0.3, I get:
>>>>>>
>>>>>> debug(available.packages)
>>>>>> install.packages('ggplot2', dep=TRUE, repo='http://cran.stat.ucla.edu')
>>>>>> ...
>>>>>> Browse[2]>
>>>>>> debug: z <- res0 <- tryCatch(read.dcf(file = tmpf), error = identity)
>>>>>> Browse[2]>
>>>>>> Error: segfault from C stack overflow
>>>>>>
>>>>>> On R 2.15.3, I do not see the error.
>>>>>>
>>>>>> Would be great to get this resolved.  Thank you for your help.
>>>>>>
>>>>>> -- Vinh
>>>>
>>>> ______________________________________________
>>>> R-devel at r-project.org mailing list
>>>> https://stat.ethz.ch/mailman/listinfo/r-devel
>>>>
>>>
>>
>

-- 
Hervé Pagès

Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M1-B514
P.O. Box 19024
Seattle, WA 98109-1024

E-mail: hpages at fredhutch.org
Phone:  (206) 667-5791
Fax:    (206) 667-1319


