[Rd] Unicode display problem with data frames under Windows

Peter Meissner retep.meissner at gmail.com
Mon May 25 21:12:05 CEST 2015


On .05.2015 at 18:43, Duncan Murdoch <murdoch.duncan at gmail.com> wrote:

> On 25/05/2015 11:37 AM, Ista Zahn wrote:
>> AFAIK this is the way it works on Windows. It has been discussed in several
>> places, e.g.
>> http://stackoverflow.com/questions/17715956/why-do-some-unicode-characters-display-in-matrices-but-not-data-frames-in-r
>> ,
>> http://stackoverflow.com/questions/17715956/why-do-some-unicode-characters-display-in-matrices-but-not-data-frames-in-r
>> (both of these came up when I googled the subject line of your email).
>
> Yes, but it is a bug, just a hard one to fix.  It needs someone to  
> dedicate a serious amount of time to deal with it.
>
> Since most of the people who tend to do that generally use systems in  
> UTF-8 locales where this isn't a problem, or don't use Windows, it is  
> languishing.
>
> Duncan Murdoch


I understand that these problems are not easy to fix but ...

I think that
"most of the people who tend to do that generally use systems in UTF-8  
locales"
is a biased perception. Developers might tend to use Mac or Linux most
often. For everyone else, Windows still is, and probably will remain, the
most commonly used OS. For most of them, switching to something else is a
major hurdle.

What I often witness is that those supposedly non-existent Windows users
try to muddle through with numerous calls to Encoding(), iconv() and the
like, while never being sure whether the strange behavior is due to their
own lack of understanding, to Windows specifics, or to R. In the end they
either succeed with their muddling or give up, but they do not switch
systems.
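
For readers unfamiliar with the calls mentioned above, a minimal sketch of that kind of muddling might look like the following. The variable names are made up for illustration, the string is the one from the original report, and none of this fixes the data frame printing problem itself.

```r
# Sketch only: inspecting and converting the encoding of a single string.
# x, x_utf8 and x_latin1 are hypothetical names used for illustration.
x <- "A \u222a B \u2229 C"     # test string from the original report

Encoding(x)                     # the encoding mark R has attached to x

# Re-encode to UTF-8; leaves the content unchanged if it is already UTF-8.
x_utf8 <- enc2utf8(x)

# Convert to latin1. The set symbols have no latin1 representation, so
# sub = "byte" substitutes their raw bytes instead of returning NA.
x_latin1 <- iconv(x, from = "UTF-8", to = "latin1", sub = "byte")

identical(x, x_utf8)            # did enc2utf8() change anything?
is.na(x_latin1)                 # did the conversion fail outright?
```

Whether such calls help depends on the locale R is running in, which is exactly the uncertainty described above.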

So whoever attempts this Herculean task will be praised by thousands ;-)

Best, Peter


>>
>> Best,
>> Ista
>> On May 25, 2015 9:39 AM, "Richard Cotton" <richierocks at gmail.com> wrote:
>>
>> > Here's a data frame with some Unicode symbols (set intersection and
>> > union).
>> >
>> > d <- data.frame(x = "A \u222a B \u2229 C")
>> >
>> > Printing this data frame under R 3.2.0 patched (r68378) and Windows 7, I
>> > see
>> >
>> > d
>> > ##                  x
>> > ## 1 A <U+222A> B n C
>> >
>> > Printing the column itself works fine.
>> >
>> > d$x
>> > ## [1] A ∪ B ∩ C
>> > ## Levels: A ∪ B ∩ C
>> >
>> > The encoding is correctly UTF-8.
>> >
>> > Encoding(as.character(d$x))
>> > ## [1] "UTF-8"
>> >
>> > Under Linux both forms of printing are fine for me.
>> >
>> > I'm not quite sure whether I've missed a setting or if this is a bug, so
>> >
>> > Am I doing something silly?
>> > Can anyone else reproduce this?
>> >
>> > --
>> > Regards,
>> > Richie
>> >
>> > Learning R
>> > 4dpiecharts.com
>> >
>> > ______________________________________________
>> > R-devel at r-project.org mailing list
>> > https://stat.ethz.ch/mailman/listinfo/r-devel
>> >


