[Rd] Unicode whitespace

hadley wickham h.wickham at gmail.com
Fri Jan 4 19:13:15 CET 2008


It would be nice if R ignored more Unicode whitespace characters.
For example, if I have "\u2028" in a command (which I get from a
line break in Keynote), I get the following error:

> qplot(carat, price, data = diamonds, 
  colour=clarity)
Error: unexpected input in "qplot(carat, price, data = diamonds, ?"

I occasionally have the same problem when copying and pasting from
emails as well.

Wikipedia lists the following codepoints as whitespace (I'm sure there
is a more definitive reference but I could not find one with some
quick googling):

U+0009-U+000D (control characters, including TAB, LF and CR)
U+0020 SPACE
U+0085 NEXT LINE (NEL)
U+00A0 NO-BREAK SPACE (NBSP)
U+1680 OGHAM SPACE MARK
U+180E MONGOLIAN VOWEL SEPARATOR
U+2000-U+200A (different sorts of spaces)
U+2028 LINE SEPARATOR
U+2029 PARAGRAPH SEPARATOR
U+202F NARROW NO-BREAK SPACE
U+205F MEDIUM MATHEMATICAL SPACE
U+3000 IDEOGRAPHIC SPACE

Would it be possible for R to treat all of these in the same way? (Or
does it already, and my R is just misconfigured?)
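
Until the parser handles these, one possible workaround is to clean the
text before it reaches parse(). The following is only a minimal sketch:
normalise_ws() and unicode_ws are hypothetical names, not part of R, and
it assumes that turning every non-ASCII whitespace code point from the
list above into a plain space is acceptable. The ASCII entries (TAB, LF,
CR, SPACE and so on) are left alone because R already handles them. Note
that it also rewrites such characters inside string literals, which may
not be what you want.

```r
# Non-ASCII whitespace code points taken from the list above.
unicode_ws <- c(0x0085, 0x00A0, 0x1680, 0x180E, 0x2000:0x200A,
                0x2028, 0x2029, 0x202F, 0x205F, 0x3000)

# Hypothetical helper: replace each listed code point with U+0020 SPACE.
normalise_ws <- function(x) {
  vapply(x, function(s) {
    cp <- utf8ToInt(enc2utf8(s))      # string -> integer code points
    cp[cp %in% unicode_ws] <- 0x20L   # swap exotic whitespace for a space
    intToUtf8(cp)                     # code points -> string
  }, character(1), USE.NAMES = FALSE)
}

# Usage: a command containing U+2028, as pasted from a presentation.
cmd <- "sum(1,\u2028  2)"
eval(parse(text = normalise_ws(cmd)))
```

Working on code points via utf8ToInt() and intToUtf8() avoids depending
on how a particular regular expression engine treats Unicode ranges.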

Hadley

-- 
http://had.co.nz/

