[Rd] Clarification for readChar man page

Jeffrey Horner jeff.horner at vanderbilt.edu
Thu Jun 14 17:52:20 CEST 2007


Duncan Murdoch wrote:
> On 6/14/2007 10:49 AM, Jeffrey Horner wrote:
>> Hi,
>>
>> Here's a patch to the readChar manual page (R-trunk as of today) that 
>> better clarifies readChar's return value. 
> 
> Your update is not right.  For example:
> 
> x <- as.raw(32:96)
> readChar(x, nchars=rep(2,100))
> 
> This returns a character vector of length 100, of which the first 32 
> elements have 2 chars, the next one has 1, and the rest are "".
> 
> So the length of nchars really does affect the length of the value.
> 
> Now, I haven't looked at the code, but it's possible we could delete the 
> "(which might be less than \code{length(nchars)})" remark, and if not, 
> it would be useful to explain the situations in which the return value 
> could be shorter than the nchars vector.

Well, this is rather a misunderstanding on my part; I completely forgot 
about vectorization. The manual page makes sense to me now.
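
For anyone else who trips over this, Duncan's example can be checked 
directly. This is only a minimal sketch: the name res is mine, and the 
counts in the comments are the ones Duncan reported, which I'd expect 
but have not checked against every R version.

```r
# Duncan's example: 65 raw bytes (ASCII codes 32 to 96), read two at a time.
x <- as.raw(32:96)
res <- readChar(x, nchars = rep(2, 100))

length(res)        # one element per entry of nchars, as Duncan reported
table(nchar(res))  # how many elements hold 0, 1 or 2 characters
```

Each element of nchars triggers one read, so the length of the value 
follows length(nchars) rather than the amount of data available.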

But it still isn't clear when the length of the return value can be 
less than length(nchars). Consider a 101-byte text file (100 digits 
plus the terminator that writeChar() appends) in a non-multibyte 
character locale:

f <- tempfile()
writeChar(paste(rep(seq(0,9),10),collapse=''),con=f)

Calling readChar() to read 100 bytes with length(nchars)=10 gives:

 > readChar(f,nchar=rep(10,10))
  [1] "0123456789" "0123456789" "0123456789" "0123456789" "0123456789"
  [6] "0123456789" "0123456789" "0123456789" "0123456789" "0123456789"

and reading the entire file with length(nchars)=11 gives:

 > readChar(f,nchar=rep(10,11))
  [1] "0123456789" "0123456789" "0123456789" "0123456789" "0123456789"
  [6] "0123456789" "0123456789" "0123456789" "0123456789" "0123456789"
[11] "\0"

but the following two outputs are confusing. readChar() with 
length(nchars)>=12 returns a character vector of length 12:

 > readChar(f,nchar=rep(10,12))
  [1] "0123456789" "0123456789" "0123456789" "0123456789" "0123456789"
  [6] "0123456789" "0123456789" "0123456789" "0123456789" "0123456789"
[11] "\0"         ""
 > readChar(f,nchar=rep(10,13))
  [1] "0123456789" "0123456789" "0123456789" "0123456789" "0123456789"
  [6] "0123456789" "0123456789" "0123456789" "0123456789" "0123456789"
[11] "\0"         ""

It seems that the first time EOF is encountered on a read operation, an 
empty string is returned, but on subsequent reads nothing is returned. 
Is this intended behavior?
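
To make the comparison easier to repeat, here is a rough sketch that 
records only the length of the result for each length(nchars). The 
names sz and lens are mine, and I make no claim about what other R 
versions will print; the suppressWarnings() call is there on the 
assumption that some versions warn about the embedded nul.

```r
# Recreate the test file: 100 digits plus the terminator from writeChar().
f <- tempfile()
writeChar(paste(rep(0:9, 10), collapse = ""), con = f)
sz <- file.size(f)  # size on disk in bytes
sz

# Length of the value for length(nchars) = 10, 11, 12 and 13.
lens <- vapply(10:13, function(k) {
  length(suppressWarnings(readChar(f, nchars = rep(10, k))))
}, integer(1))
names(lens) <- 10:13
lens

unlink(f)  # remove the temporary file
```

If the value really is cut short once EOF is reached, lens should stop 
growing at some point even though length(nchars) keeps increasing.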

Jeff

> 
> Duncan Murdoch
> 
> 
>> It could use some work as I'd
>> also like to add some text about using nchar() to find the length of 
>> the string that readchar() returns, but I'm unsure which of 
>> type="bytes" or type="chars" to mention. Is it type="chars"?
>>
>> Index: src/library/base/man/readChar.Rd
>> ===================================================================
>> --- src/library/base/man/readChar.Rd    (revision 41943)
>> +++ src/library/base/man/readChar.Rd    (working copy)
>> @@ -57,8 +57,8 @@
>>   }
>>
>>   \value{
>> -  For \code{readChar}, a character vector of length the number of
>> -  items read (which might be less than \code{length(nchars)}).
>> +  For \code{readChar}, a character vector of length 1 with the number
>> +  of characters less than or equal to nchars.
>>
>>     For \code{writeChar}, a raw vector (if \code{con} is a raw vector) or
>>     invisibly \code{NULL}.
>>
>>
>> Jeff
> 


-- 
http://biostat.mc.vanderbilt.edu/JeffreyHorner


