[Rd] Correct usage of nchar(): precautionary change for R 2.6.0

Prof Brian Ripley ripley at stats.ox.ac.uk
Tue May 29 11:39:11 CEST 2007


Remember that nchar() returns by default the number of *bytes* and not the 
number of characters.  I've recently spotted many cases in which nchar() 
has been used with substr(), which works in characters; this can lead to 
incorrect results.  (This seems to be the commonest use of nchar() in 
packages.)
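
To illustrate the mismatch, here is a minimal sketch.  It assumes the 
string is stored as UTF-8 (which is what a \u escape normally gives), and 
the variable names are purely illustrative.

```r
# "caf\u00e9" is "cafe" with an acute accent on the e:
# 4 characters, but 5 bytes in UTF-8.
x <- "caf\u00e9"

n_bytes <- nchar(x, type = "bytes")   # 5
n_chars <- nchar(x, type = "chars")   # 4

# Try to extract the last character.  substr() indexes by character,
# so the byte count points past the end of the string.
last_wrong <- substr(x, n_bytes, n_bytes)   # ""
last_right <- substr(x, n_chars, n_chars)   # the accented e
```

For pure ASCII input the two counts agree, which is why this kind of 
error tends to go unnoticed in testing.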

There were two reasons why nchar() was left defaulting to bytes when we 
allowed MBCSs in R:

1) Many of the uses are of the form if(nchar(x)) or if(nchar(x) == 0) or 
even if(nchar(x) != 0).  Computing the length of a string is an inefficient 
way to find out if it is non-empty, especially if it has to be converted 
to wchars to do so.

2) Once you allow multibyte characters, not all character strings are 
valid and for those nchar(x, "c") is NA.  Not much code has been written 
to take into account the possibility that nchar() might return an NA.
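
A sketch of point 2, with some assumptions: it builds a deliberately 
invalid byte sequence and declares it to be UTF-8.  In current versions 
of R the character count for such a string is an error unless 
allowNA = TRUE is given, in which case it is NA; the byte count is always 
available.

```r
# Two bytes that can never occur in valid UTF-8, declared as UTF-8.
bad <- rawToChar(as.raw(c(0xff, 0xfe)))
Encoding(bad) <- "UTF-8"

n_bytes <- nchar(bad, type = "bytes")                  # always defined
n_chars <- nchar(bad, type = "chars", allowNA = TRUE)  # NA: invalid string

# Code using character counts needs to guard against NA.
if (is.na(n_chars)) {
    message("invalid multibyte string; character count unavailable")
}
```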

Despite these reasons, it seems that the dangers of incorrect use outweigh 
them.  So for 2.6.0

- There is a new function nzchar() which provides a quick test for a 
non-zero number of characters.

- The default becomes nchar(type="chars").
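
As a sketch of how the emptiness tests mentioned earlier might be 
rewritten (the helper name describe() is hypothetical):

```r
# nzchar() is vectorized and returns TRUE for non-empty strings.
x <- c("", "abc")
non_empty <- nzchar(x)   # FALSE TRUE

# Instead of if(nchar(s) == 0) ... or if(nchar(s)) ...
describe <- function(s) {
    if (nzchar(s)) "non-empty" else "empty"
}

describe("")      # "empty"
describe("abc")   # "non-empty"
```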


It seems that nchar() is used quite often to lay out 'printed' or
graphical output.  For that, normally nchar(type="width") is what is 
needed.
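
For layout purposes, something along these lines may serve as a starting 
point.  The helper pad_right() is hypothetical, and strrep() is assumed 
to be available (it is in recent versions of R).

```r
# Pad strings on the right to a common display width.
# Uses type = "width" so that wide (e.g. CJK) characters are
# counted by the columns they occupy, not by characters or bytes.
pad_right <- function(s, total) {
    w <- nchar(s, type = "width")
    paste0(s, strrep(" ", pmax(total - w, 0L)))
}

labels <- c("abc", "a")
padded <- pad_right(labels, 5)   # each element now 5 columns wide
```

Padding by nchar(type = "chars") instead would misalign any label 
containing characters that occupy two columns.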

None of this is an issue in single-byte locales, or for ASCII text in 
UTF-8 or in Windows' CJK locales, but please bear in mind that you cannot 
assume either for a public package.  (The assumption that ASCII text is 
represented in single bytes is pretty widespread, but at some point we may 
want to support Windows' native UCS-2 encoding, for which it is not true.)

The best advice is to use the 'type' argument for all uses of nchar() in 
public code unless perhaps you are sure only ASCII data will ever be 
encountered.


-- 
Brian D. Ripley,                  ripley at stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595


