[Rd] readchar() bug or feature? was Re: Clarification for readChar man page

Duncan Murdoch murdoch at stats.uwo.ca
Sat Jun 16 15:57:07 CEST 2007


On 14/06/2007 5:05 PM, Jeffrey Horner wrote:
> Jeffrey Horner wrote:
>> Duncan Murdoch wrote:
>>> On 6/14/2007 10:49 AM, Jeffrey Horner wrote:
>>>> Hi,
>>>>
>>>> Here's a patch to the readChar manual page (R-trunk as of today) that 
>>>> better clarifies readChar's return value. 
>>> Your update is not right.  For example:
>>>
>>> x <- as.raw(32:96)
>>> readChar(x, nchars=rep(2,100))
>>>
>>> This returns a character vector of length 100, of which the first 32 
>>> elements have 2 chars, the next one has 1, and the rest are "".
>>>
>>> So the length of nchars really does affect the length of the value.
>>>
>>> Now, I haven't looked at the code, but it's possible we could delete the 
>>> "(which might be less than \code{length(nchars)})" remark, and if not, 
>>> it would be useful to explain the situations in which the return value 
>>> could be shorter than the nchars vector.
>> Well, this is rather a misunderstanding on my part; I completely forgot 
>> about vectorization. The manual page makes sense to me now.
>>
>> But it isn't clear when the length of the return value can be less 
>> than length(nchars). Consider a 101 byte text file in a 
>> non-multibyte character locale:
>>
>> f <- tempfile()
>> writeChar(paste(rep(seq(0,9),10),collapse=''),con=f)
>>
>> and calling readChar() to read 100 bytes with length(nchar)=10:
>>
>>  > readChar(f,nchar=rep(10,10))
>>   [1] "0123456789" "0123456789" "0123456789" "0123456789" "0123456789"
>>   [6] "0123456789" "0123456789" "0123456789" "0123456789" "0123456789"
>>
>> and readChar() reading the entire file with length(nchar)=11:
>>
>>  > readChar(f,nchar=rep(10,11))
>>   [1] "0123456789" "0123456789" "0123456789" "0123456789" "0123456789"
>>   [6] "0123456789" "0123456789" "0123456789" "0123456789" "0123456789"
>> [11] "\0"
>>
>> but the following two outputs are confusing. readChar() with 
>> length(nchar)>=12 returns a character vector of length 12:
>>
>>  > readChar(f,nchar=rep(10,12))
>>   [1] "0123456789" "0123456789" "0123456789" "0123456789" "0123456789"
>>   [6] "0123456789" "0123456789" "0123456789" "0123456789" "0123456789"
>> [11] "\0"         ""
>>  > readChar(f,nchar=rep(10,13))
>>   [1] "0123456789" "0123456789" "0123456789" "0123456789" "0123456789"
>>   [6] "0123456789" "0123456789" "0123456789" "0123456789" "0123456789"
>> [11] "\0"         ""
>>
>> It seems that the first time EOF is encountered on a read operation, an 
>> empty string is returned, but on subsequent reads nothing is returned. 
>> Is this intended behavior?
> 
> I believe this is an off-by-1 bug in do_readchar(). The following fix to 
> R-trunk v41946 causes the above readChar() calls to cap the returned 
> vector length at 11:
> 
> Index: src/main/connections.c
> ===================================================================
> --- src/main/connections.c      (revision 41946)
> +++ src/main/connections.c      (working copy)
> @@ -3286,7 +3286,7 @@
>              if(!con->open(con)) error(_("cannot open the connection"));
>       }
>       PROTECT(ans = allocVector(STRSXP, n));
> -    for(i = 0, m = i+1; i < n; i++) {
> +    for(i = 0, m = 0; i < n; i++) {
>          len = INTEGER(nchars)[i];
>          if(len == NA_INTEGER || len < 0)
>              error(_("invalid value for '%s'"), "nchar");
> 
> 
> Jeff
> 

Thanks for working this out.  I think your patch is right, but there's 
another bug:  raw vectors are handled differently than connections. 
I'll fix both problems, but I'll do it in R-devel, not R-patched:  I 
think we were consistent with the documentation before, so the bug is 
not extremely serious, and it's getting too close to the release of 
2.5.1 to make this change there (since someone may have depended on the 
previous behaviour).

Duncan Murdoch


