[Rd] Development version of R: Improved nchar(), nzchar() but changed API

Martin Maechler maechler at lynne.stat.math.ethz.ch
Mon Apr 27 17:08:51 CEST 2015


>>>>> Mark van der Loo <mark.vanderloo at gmail.com>
>>>>>     on Mon, 27 Apr 2015 10:26:32 +0200 writes:

    > Dear Martin, Does the work on nchar mean that bugs #16090
    > and #16091 will be resolved [1,2]?

    > Thanks, Mark

    > [1] https://bugs.r-project.org/bugzilla3/show_bug.cgi?id=16090
    > [2] https://bugs.r-project.org/bugzilla3/show_bug.cgi?id=16091

Dear Mark,

no, the changes I've been talking about are not related to the
above.
I'm not savvy on multi-byte / UTF-8 encodings and so left these
in the capable hands of fellow R core members.

But thank you for bringing the hijacked thread back on track ...

My proposed changes -- after amendments -- are said to break 19
CRAN packages (i.e., R CMD check of these) and about a dozen
Bioconductor packages (the latter being somewhat less relevant, as
Bioconductor packages will only have to work for the R 3.2.x
series for almost half a year).

It seems that most breakages are from things like

    if(nchar(someString) > 0)

which now gives an error if someString is NA (i.e., NA_character_),
since nchar() then returns NA and if() cannot branch on NA.
I'm currently arguing that this error is desirable,
because NA means <missing> or <anything>, and hence a character
NA could well be the empty string.
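
To illustrate, here is a minimal sketch of the breakage. It assumes
an R version in which nchar(NA_character_) returns NA (the behaviour
proposed here); 'someString' is only a placeholder name, and the
tryCatch() wrapper is there just so the example can run to the end.

```r
someString <- NA_character_

# Under the proposed behaviour nchar() keeps the NA instead of
# returning a count (assumption: an R version with that behaviour).
n <- nchar(someString)

# The comparison NA > 0 is NA as well, and if() refuses to branch
# on a missing condition, so the error handler is what returns here.
res <- tryCatch(
    if (nchar(someString) > 0) "non-empty" else "empty",
    error = function(e) conditionMessage(e)
)
print(res)  # an error message, neither "non-empty" nor "empty"
```

On an R version where nchar() still returns a count for a character
NA, the same code would silently report "non-empty" instead.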

Also, it seems that much of the breaking code could have been
more efficient and reliable (*) if the programmeRs had used
nzchar(); i.e., instead of the above, the faster and more reliable
form is
    if(nzchar(someString))

Note that nzchar() also gains the new 'keepNA' argument, but the
plan is to set the default there to 'keepNA = FALSE', i.e.,
100% backward compatible.
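
As a small sketch of what that means in practice (assuming an R
version whose nzchar() has the 'keepNA' argument; the helper
is_nonempty() is hypothetical and only for illustration):

```r
x <- c("abc", "", NA_character_)

# Default keepNA = FALSE: a character NA counts as non-empty,
# exactly as nzchar() behaved before.
nz_default <- nzchar(x)

# keepNA = TRUE: the missing string stays missing in the result.
nz_keep <- nzchar(x, keepNA = TRUE)

# Hypothetical NA-safe test that treats a missing string as
# "not known to be non-empty", so it never yields NA.
is_nonempty <- function(s) !is.na(s) & nzchar(s)

print(nz_default)
print(nz_keep)
print(is_nonempty(x))
```

With the default, if(nzchar(someString)) keeps working for NA input;
code that wants to treat NA separately has to check is.na()
explicitly, as the helper does.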

--
(*) because nchar(x) can already give NA when x contains
    invalid multibyte characters.

Martin


