[R] umlauts in Rd files

Wed Jun 15 15:30:51 CEST 2005

On Wed, 15 Jun 2005, Peter Dalgaard wrote:

> Robin Hankin <r.hankin at noc.soton.ac.uk> writes:
>
>> Hi
>>
>> I'm having difficulty following the advice in section 2.7 of R-exts.
>>
>> In one of my packages, there is a function called mobius().
>>
>> I want to refer to it in the Rd file as the Möbius function, and to
>> illustrate the Möbius inversion formula (just to be explicit: this is
>> "Mobius" but with two dots over the second letter).
>>
>> R-exts section 2.7  gives
>>
>> \enc{Jöreskog}{Joreskog}
>>
>> as an example, but when I cut-and-paste this, the dvi file (as produced
>> by R CMD Rd2dvi) shows the umlauted "o" as A and Z with some diacritical
>> marks, not the desired o with two dots on.
>>
>> Using \"{o} is fine for the dvi output but not the ascii output.
>>
>> How do I put an umlauted "o" in an Rd file in such a way as to have a
>> nice ascii help page and nice dvi files?
>
> Well... You can't. There's no odiaeresis in ASCII. That's exactly the
> problem. In UTF-8 or ISO-Latin-1/9 (aka 8859-1 or ditto with the
> addition of the Euro) you can display the character and we did
> previously implicitly assume Latin-1. However this is of no use to
> people in say Latin-2 locales, and in fact we can no longer spell the
> entire R Core Team correctly using any of the Latin-N locales (we
> lose either M{\"a}chler or {\v S}imon).
>
> As far as I understand the current situation, we recommend that text
> files be pure ASCII (which has also led us to introduce deliberate
> misspellings of various people in the NEWS file and similar places).
>
> What is happening to you is something else though: The double
> characters are a tell-tale sign that you have provided UTF-8 to
> something that expected an 8-bit encoding like Latin-1. The fix for
> that should be to put \encoding{UTF-8} somewhere at the beginning of
> the .Rd file.
>
> (I may well have gotten some detail wrong here, Brian probably knows
> the best.)

UTF-8 for latex does not work well (as yet, at least: there is now a utf8
encoding that allows at least the first plane (Latin-1) to work).  So it
would be much better to use Latin-1 for the file, mark it with
\encoding{latin1}, and mark the word specifically with \enc{Möbius}{Mobius}
or your preferred transliteration.
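
A minimal sketch of what such an Rd file might look like, assuming the
file itself is saved in Latin-1.  Only mobius(), \encoding{latin1} and
\enc{Möbius}{Mobius} come from this thread; the title and description
wording are made up for illustration, and the title is kept in plain
ASCII to be on the safe side.

```
% Hypothetical mobius.Rd, saved on disk as Latin-1 (not UTF-8).
\name{mobius}
\alias{mobius}
% Declare the encoding of this file so the converters do not have to guess.
\encoding{latin1}
\title{The Mobius function}
\description{
  % First argument: the accented form, used where the encoding allows it.
  % Second argument: the ASCII transliteration, used everywhere else.
  Computes the \enc{Möbius}{Mobius} function, as used in the
  \enc{Möbius}{Mobius} inversion formula.
}
```

The second argument of \enc is what readers without accented characters
in their charset should see, so pick whichever transliteration you
prefer there.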

The problem is not really with Latin-2 (which does have a and o diaeresis),
but with languages such as Japanese and Chinese, which only have ASCII.  So
the transliteration is for people without any accents in their charset.

-- 
Brian D. Ripley,                  ripley at stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595