[Rd] Problem with UTF-8 text in the Rcmdr package

John Fox jfox at mcmaster.ca
Sun Sep 7 17:15:26 CEST 2008


Dear Brian,

Thank you for addressing the problem -- I was hoping that you would.

> -----Original Message-----
> From: Prof Brian Ripley [mailto:ripley at stats.ox.ac.uk]
> Sent: September-07-08 7:23 AM
> To: John Fox
> Cc: 'R-devel'; 'Jaro.Lajovic'
> Subject: Re: [Rd] Problem with UTF-8 text in the Rcmdr package
> 
> The issue appears to be the Rcmdr output window and menus.  They are done
> using Tcl/Tk, not by R.  So this might be a problem in Tcl/Tk or the fonts
> it uses, or it might be a problem with what Rcmdr passes to the tcltk
> package.
> 
> We need the means to reproduce this (as per the posting guide):

Jaro provides an example in one of his messages quoted in my posting (though it is
slightly in error, since his call to cat lacks parentheses). If one enters 

cat("ČŠŽčšž\n") 

in the Rcmdr Script window, the characters are rendered correctly. Executing
this command (via the Submit button) produces the following in the Output
window:

> cat("??????\n") 
??????

which actually appears as

> cat("??\n") 
??

This is under Windows Vista / R 2.7.2 / Rcmdr 1.4-0.
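The same test can also be written with Unicode escapes, so that it does not depend on the encoding in which a script (or an email carrying it) is saved. This is a minimal sketch added for illustration and is not part of the original report. The variable names are arbitrary.

```r
# The six letters from the report, written as Unicode escapes so the
# script does not depend on the encoding it is saved or mailed in.
x <- "\u010c\u0160\u017d\u010d\u0161\u017e"   # ČŠŽčšž

# Same command as the one submitted from the Rcmdr Script window
cat(x, "\n", sep = "")

# Inspect the code points the string actually holds
cp <- utf8ToInt(enc2utf8(x))
print(sprintf("U+%04X", cp))
```

If the printed code points are not U+010C, U+0160, U+017D, U+010D, U+0161 and U+017E, the string was altered before it reached R. If they are correct but the Output window still shows question marks, the problem lies in how the output is displayed rather than in the string itself.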

> 
> - what OSes are affected?  Does this occur in a UTF-8 locale on Linux, for
> example?

I've now checked under Mac OS X and Linux Ubuntu, with the following
results:

Under Mac OS X 10.5.4 / R 2.7.2 / Rcmdr 1.4-0 / Tcl/Tk 8.4 

cat("ČŠŽčšž\n") appears as cat("?????\n") in *both* the Script window and
the Output window.

Under Ubuntu Linux 8.04 / R 2.7.0 / Rcmdr 1.4-0 / Tcl/Tk 8.5

cat("ČŠŽčšž\n") appears *correctly* in *both* the Script window and the
Output window.

> 
> - in what locales?

I'm afraid that I don't know how to check this short of changing the locale
for my Windows machine. I do observe the problem in Windows when I start
Rgui with language=sl.

> 
> - what versions of Tcl/Tk?  Note that the version shipped with Windows R
> changed between 2.5.1 and 2.7.x.

Yes, and please see above, but if the problem were with Tcl/Tk, why does
this work in the Script window under Windows and in both Script and Output
under Ubuntu?

> 
> - Is this anything to do with translations?  I've not looked at how
> translations are done in Rcmdr, but if gettext() is used, the string
> passed to R for output is in the native encoding, so 'UTF-8 characters' is
> incorrect.  It is possible that it is an iconv problem if the translations
> are supplied in UTF-8 and not Latin-2.

Yes, the Rcmdr package uses gettext(). Could Jaro avoid the problem by using
Latin-2 in preference to UTF-8?
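For readers following the encoding question, the following is a minimal sketch of what a conversion between UTF-8 and Latin-2 looks like in R using iconv(). It is only an illustration and does not show what gettext() or the Rcmdr actually do internally. The encoding name "ISO-8859-2" is an assumption, because the names accepted by iconv() are platform-dependent.

```r
x <- "\u010c\u0160\u017d\u010d\u0161\u017e"   # ČŠŽčšž

# UTF-8 uses two bytes for each of these letters
utf8_bytes <- charToRaw(enc2utf8(x))
print(utf8_bytes)

# Convert to Latin-2 (ISO-8859-2). With toRaw = TRUE the result is a list
# of raw vectors; an element is NULL if the conversion failed.
latin2_bytes <- iconv(enc2utf8(x), from = "UTF-8", to = "ISO-8859-2",
                      toRaw = TRUE)[[1]]
print(latin2_bytes)   # one byte per letter

# Round trip back to UTF-8 to confirm that nothing was lost
back <- iconv(list(latin2_bytes), from = "ISO-8859-2", to = "UTF-8")
print(back == x)
```

All six letters exist in Latin-2, so the conversion should succeed in both directions. A failure here would point to the platform's iconv rather than to Tcl/Tk.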

> 
> There are far too many layers involved here to guess at what is going on.
> My guess is that it ought to be possible to give a simple example of a
> string which can be output to the Rcmdr console and will be rendered
> incorrectly (together with a screen shot of how it is rendered).

Indeed, please see above. I've also attached a screenshot under Windows,
having started R with language=sl.

> 
> I think the characters referred to are the Unicode glyphs 's and z with
> caron', \u0161 and \u017E.  It seems that these will only be displayable
> in Rcmdr on Windows in a Latin-2 locale, which I do not have set up on
> Windows (but believe I could get installed).  However, examples using that
> (and the menus) seem to be correct in both sl_SI.iso88592 and sl_SI.utf8
> on Linux, which suggests that this is probably not an R issue but a Tcl/Tk
> one.

I'm out of my depth with respect to these issues, but I do find it curious
that under Windows the characters appear correctly in the Script window but
not in the Output window.
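Jaro observed (in the correspondence quoted below) that c with caron displays correctly while s and z with caron do not. One thing that may be worth checking, offered only as a possible lead and not as a conclusion reached in this thread, is that the Windows Central European code page CP1250 and ISO-8859-2 (Latin-2) use the same byte for č but different bytes for š and ž. The sketch below compares them. byte_in() is a hypothetical helper, and the encoding names are assumptions that depend on the platform's iconv.

```r
# Lower-case c, s and z with caron
chars <- enc2utf8(c("\u010d", "\u0161", "\u017e"))   # č, š, ž

# Hypothetical helper: hexadecimal byte for each letter in encoding `enc`
byte_in <- function(enc) {
  raw_list <- iconv(chars, from = "UTF-8", to = enc, toRaw = TRUE)
  vapply(raw_list, function(b) sprintf("%02X", as.integer(b)), character(1))
}

codes <- rbind(`ISO-8859-2` = byte_in("ISO-8859-2"),
               CP1250       = byte_in("CP1250"))
colnames(codes) <- c("c", "s", "z")
print(codes)
```

Both encodings map č to E8, but š is B9 in ISO-8859-2 and 9A in CP1250, and ž is BE versus 9E. Whether this difference plays any role in the Rcmdr display problem would still need to be confirmed.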

> 
> On Fri, 5 Sep 2008, John Fox wrote:
> 
> > Dear list members,
> >
> > I've attached some email correspondence with Jaro Lajovic (with his
> > permission), detailing a problem with the Slovenian translation file for
> > the Rcmdr package.
> 
> Unfortunately, it is not 'detailed', and we do need the details.

I hope that the additional information in this message will supply at least
some of the necessary details.

Thank you for your help,
 John

> 
> > In brief, while certain UTF-8 characters used in Slovenian used to
> > appear properly in older versions of R, some characters do not display
> > properly in the Rcmdr menus and output window under R 2.7.x. I've
> > confirmed the problem with the current version of the Rcmdr package
> > (1.4-0) and R 2.7.2 under Windows Vista.
> >
> > I've checked the R docs and NEWS file for changes to R, but wasn't able
> > to turn up anything that seemed relevant. Frankly, however, my
> > understanding of how various character sets are handled is only partial.
> >
> > Any help would be appreciated.
> >
> > John
> >
> > ------------------------------
> > John Fox, Professor
> > Department of Sociology
> > McMaster University
> > Hamilton, Ontario, Canada
> > web: socserv.mcmaster.ca/jfox
> >
> >
> > -----Original Message-----
> > From: Jaro.Lajovic [mailto:Jaro.Lajovic at mf.uni-lj.si]
> > Sent: August-26-08 2:57 AM
> > To: John Fox
> > Subject: Re: Slovenian Rcmdr .po and .mo - and a problem
> >
> > Dear John,
> >
> >> That seems to imply that there's a change in R rather than in the Rcmdr
> >> that produced this problem. Do you notice the problem with any other
> >> packages that use translation or with R itself?
> >
> > As for other translated R packages, I am afraid I am not aware of any.
> > However, a quick test using cat with special characters:
> > cat "ČŠŽčšž\n"
> > reveals that the string prints OK in the R (2.7.1) console. The command
> > line also shows OK in the Rcmdr Script window, but does not display
> > right in the Output window. Special chars also fail in the Messages
> > window.
> >
> > Input (Script window) thus seems not to be affected, while the menu
> > system and output do not work properly.
> >
> > Thank you very much,
> > Jaro
> >
> >
> >> On Mon, 25 Aug 2008 21:54:43 +0200
> >>  "Jaro.Lajovic" <Jaro.Lajovic at mf.uni-lj.si> wrote:
> >>> Dear John,
> >>>
> >>>> One question though: I assume from your message that the previous
> >>>> version of the Rcmdr worked OK with R 2.7.1. Is that right?
> >>> No, the version 1.3-5 (that I still have with R 2.5.1) does not work
> >>> with R 2.7.1 either. So:
> >>>
> >>> Rcmdr 1.3-5 with R 2.5.1: works OK.
> >>> Rcmdr 1.3-5 with R 2.7.1: does not work properly.
> >>> Rcmdr 1.4-0 with R 2.7.1: does not work properly.
> >>>
> >>> Thank you in advance,
> >>> Jaro
> >>>
> >>>
> >>>
> >>>> On Mon, 25 Aug 2008 18:52:32 +0200
> >>>>  "Jaro.Lajovic" <Jaro.Lajovic at mf.uni-lj.si> wrote:
> >>>>> Dear John,
> >>>>>
> >>>>> Please find attached zipped Slovenian versions of .po (plain text
> >>>>> and UTF-8 coded text) and .mo files.
> >>>>>
> >>>>> However, there seems to be a problem I have not been able to
> >>>>> resolve. While special characters display properly under R version
> >>>>> 2.5.1 with Rcmdr 1.3-5, they fail to display (= are substituted by
> >>>>> black blocks) under R version 2.7.1 with the new Rcmdr 1.4-0. By
> >>>>> the way: the .mo file of the ver. 1.3-5 copied to 1.4-0 also failed
> >>>>> to display properly.
> >>>>>
> >>>>> (An additional detail: three special characters that are used in
> >>>>> the Slo version are c, s and z with hacek. c with hacek is not
> >>>>> affected, it is just s and z with hacek that are not displayed OK.)
> >>>>>
> >>>>> Your advice will be much appreciated.
> >>>>>
> >>>>> With best regards,
> >>>>> Jaro
> >>
> >> --------------------------------
> >> John Fox, Professor
> >> Department of Sociology
> >> McMaster University
> >> Hamilton, Ontario, Canada
> >> http://socserv.mcmaster.ca/jfox/
> >>
> >
> > ______________________________________________
> > R-devel at r-project.org mailing list
> > https://stat.ethz.ch/mailman/listinfo/r-devel
> >
> 
> --
> Brian D. Ripley,                  ripley at stats.ox.ac.uk
> Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
> University of Oxford,             Tel:  +44 1865 272861 (self)
> 1 South Parks Road,                     +44 1865 272866 (PA)
> Oxford OX1 3TG, UK                Fax:  +44 1865 272595
-------------- next part --------------
A non-text attachment was scrubbed...
Name: screenshot.pdf
Type: application/pdf
Size: 58260 bytes
Desc: not available
URL: <https://stat.ethz.ch/pipermail/r-devel/attachments/20080907/0f9d413e/attachment.pdf>

