[Rd] Unicode display problem with data frames under Windows

Peter Meissner retep.meissner at gmail.com
Tue May 26 09:29:17 CEST 2015


On .05.2015, at 09:01, Richard Cotton <richierocks at gmail.com> wrote:

> On 25 May 2015 at 19:43, Duncan Murdoch <murdoch.duncan at gmail.com> wrote:
>>> http://stackoverflow.com/questions/17715956/why-do-some-unicode-characters-display-in-matrices-but-not-data-frames-in-r
>
>> Yes, but it is a bug, just a hard one to fix.  It needs someone to
>> dedicate a serious amount of time to deal with it.
>>
>> Since most of the people who tend to do that generally use systems in
>> UTF-8 locales where this isn't a problem, or don't use Windows, it is
>> languishing.
>
> Thanks for the link and the explanation of why the bug exists.
>
>>> On May 25, 2015 9:39 AM, "Richard Cotton" <richierocks at gmail.com>  
>>> wrote:
>>>
>>> > Here's a data frame with some Unicode symbols (set intersection and
>>> > union).
>>> >
>>> > d <- data.frame(x = "A \u222a B \u2229 C")
>>> >
>>> > Printing this data frame under R 3.2.0 patched (r68378) and Windows
>>> > 7, I see
>>> >
>>> > d
>>> > ##                  x
>>> > ## 1 A <U+222A> B n C
>
> For future readers searching for a solution to this, you can get
> correct printing by setting the CTYPE part of the locale to
> Chinese/Japanese/Korean.
>
> Sys.setlocale("LC_CTYPE", "Chinese")
> ## [1] "Chinese (Simplified)_People's Republic of China.936"
>
> d
> ##            x
> ## 1 A ∪ B ∩ C
>
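The locale switch quoted above changes LC_CTYPE for the rest of the session. A minimal sketch of a wrapper that prints under a temporary LC_CTYPE and then restores the previous setting might look like this. The helper name print_with_ctype is hypothetical, and the locale name "Chinese" is taken from the quoted example; it is only expected to be available on Windows.

```r
# Hypothetical helper (not part of base R): print an object under a
# temporary LC_CTYPE setting and restore the previous one afterwards.
print_with_ctype <- function(x, ctype = "Chinese") {
    old <- Sys.getlocale("LC_CTYPE")
    # restore the original setting even if printing fails
    on.exit(Sys.setlocale("LC_CTYPE", old), add = TRUE)
    # Sys.setlocale() returns "" (with a warning) if the request
    # cannot be honoured, e.g. on a non-Windows system
    ok <- suppressWarnings(Sys.setlocale("LC_CTYPE", ctype))
    if (!nzchar(ok))
        warning("locale '", ctype, "' not available; printing in the current locale")
    print(x)
    invisible(x)
}

d <- data.frame(x = "A \u222a B \u2229 C")
print_with_ctype(d)
```

Restoring the locale in on.exit() keeps the change from leaking into later code that depends on the original character type settings.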


There is another workaround.

The character substitution that occurs when printing data frames stems
from format() being called within print.data.frame(). Defining your own
class with a print method that does not call format() allows for correct
printing in all locales.

Like this:


d <- data.frame(x = "A \u222a B \u2229 C")
d
##                  x
## 1 A <U+222A> B n C


class(d) <- c("unicode_df","data.frame")

# this is print.data.frame from base R with only two lines modified, see #old#
# (digits is now unused, because format.data.frame() is no longer called)
print.unicode_df <- function (x, ..., digits = NULL, quote = FALSE,
     right = TRUE, row.names = TRUE)
{
     n <- length(row.names(x))
     if (length(x) == 0L) {
         cat(sprintf(ngettext(n, "data frame with 0 columns and %d row",
             "data frame with 0 columns and %d rows", domain = "R-base"),
             n), "\n", sep = "")
     }
     else if (n == 0L) {
         print.default(names(x), quote = FALSE)
         cat(gettext("<0 rows> (or 0-length row.names)\n"))
     }
     else {
         #old# m <- as.matrix(format.data.frame(x, digits = digits,
         #old#     na.encode = FALSE))
         m <- as.matrix(x)
         if (!isTRUE(row.names))
             dimnames(m)[[1L]] <- if (identical(row.names, FALSE))
                 rep.int("", n)
             else row.names
         print(m, ..., quote = quote, right = right)
     }
     invisible(x)
}


d
##              x
## [1,] A ∪ B ∩ C
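
To check that only the display is affected and not the stored data, something like the following sketch may help. The helper has_escaped_unicode is hypothetical, and stringsAsFactors = FALSE is used here so that the column stays a character vector. On an affected Windows system the format() path would be expected to show the escape and the as.matrix() path not, but that depends on the locale in use.

```r
# Hypothetical helper: TRUE where a string contains an "<U+XXXX>" escape,
# i.e. where a character was substituted rather than displayed.
has_escaped_unicode <- function(x) {
    grepl("<U\\+[0-9A-Fa-f]{4,8}>", x)
}

d <- data.frame(x = "A \u222a B \u2229 C", stringsAsFactors = FALSE)

# The stored string is intact: it is marked as UTF-8 and characters
# 3 and 7 are the code points U+222A and U+2229.
Encoding(d$x)
utf8ToInt(d$x)[c(3, 7)]

# Compare the two conversion paths used by the print methods above.
has_escaped_unicode(format(d)$x)           # format.data.frame() path
has_escaped_unicode(as.matrix(d)[1, "x"])  # as.matrix() path
```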



