[Rd] gettext(msgid, domain="R") doesn't work for some 'msgid':s

Martin Maechler m@ech|er @end|ng |rom @t@t@m@th@ethz@ch
Fri Nov 5 17:55:24 CET 2021

>>>>> Tomas Kalibera 
>>>>>     on Fri, 5 Nov 2021 16:15:19 +0100 writes:

    > On 11/5/21 4:12 PM, Duncan Murdoch wrote:
    >> On 05/11/2021 10:51 a.m., Henrik Bengtsson wrote:
    >>> I'm trying to reuse some of the translations available in base R by 
    >>> using:
    >>>    gettext(msgid, domain="R")
    >>> This works great for most 'msgid's, e.g.
    >>> $ LANGUAGE=de Rscript -e 'gettext("cannot get working directory", 
    >>> domain="R")'
    >>> [1] "kann das Arbeitsverzeichnis nicht ermitteln"
    >>> However, it does not work for all.  For instance,
    >>> $ LANGUAGE=de Rscript -e 'gettext("Execution halted\n", domain="R")'
    >>> [1] "Execution halted\n"
    >>> This is despite that 'msgid' existing in:
    >>> $ grep -C 2 -F 'Execution halted\n' src/library/base/po/de.po
    >>> #: src/main/main.c:342
    >>> msgid "Execution halted\n"
    >>> msgstr "Ausführung angehalten\n"
    >>> It could be that the trailing newline causes problems, because the
    >>> same happens also for:
    >>> $ LANGUAGE=de Rscript --vanilla -e 'gettext("error during cleanup\n",
    >>> domain="R")'
    >>> [1] "error during cleanup\n"
    >>> Is this meant to work, and if so, how do I get it to work, or is it a 
    >>> bug?
    >> I don't know the solution, but I think the cause is different than you 
    >> think, because I also have the problem with other strings not 
    >> including "\n":
    >> $ LANGUAGE=de Rscript -e 'gettext("malformed version string", 
    >> domain="R")'
    >> [1] "malformed version string"

You need domain="R-base" for the "malformed version string" message.

    > I can reproduce Henrik's report and the problem there is that the 
    > trailing \n is stripped by R before doing the lookup, in do_gettext

    >             /* strip leading and trailing white spaces and
    >                add back after translation */
    >             for(p = tmp;
    >                 *p && (*p == ' ' || *p == '\t' || *p == '\n');
    >                 p++, ihead++) ;

    > But calling dgettext() with the trailing \n does translate correctly for me.

    > I'd leave to translation experts how this should work (e.g. whether the 
    > .po files should have trailing newlines).
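To make the lookup failure concrete, here is a minimal C sketch. It assumes exact-match catalog semantics like dgettext(). The `catalog` table, `lookup()` and `strip_then_lookup()` are hypothetical stand-ins, not R's do_gettext() or libintl, and only the two German entries come from the messages above. Because the key is built after stripping white space, "Execution halted\n" is looked up as "Execution halted". That key is not in the catalog, so the string comes back untranslated, even though a direct lookup of the full msgid succeeds.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical in-memory stand-in for the German "R" catalog; the msgid
   keeps its trailing "\n", as in the de.po entry quoted above. */
struct entry { const char *msgid, *msgstr; };
static const struct entry catalog[] = {
    { "cannot get working directory", "kann das Arbeitsverzeichnis nicht ermitteln" },
    { "Execution halted\n", "Ausführung angehalten\n" },
};

/* Exact-match lookup in the spirit of dgettext(): returns the msgid
   pointer itself when there is no translation. */
static const char *lookup(const char *msgid)
{
    for (size_t i = 0; i < sizeof catalog / sizeof catalog[0]; i++)
        if (strcmp(catalog[i].msgid, msgid) == 0)
            return catalog[i].msgstr;
    return msgid;
}

static int is_ws(char c) { return c == ' ' || c == '\t' || c == '\n'; }

/* Mimics the strip / translate / re-add pattern quoted from do_gettext():
   the lookup key is the msgid without leading and trailing white space. */
static void strip_then_lookup(const char *msgid, char *out, size_t outsize)
{
    size_t len = strlen(msgid), head = 0, tail = 0;
    while (head < len && is_ws(msgid[head])) head++;
    while (tail < len - head && is_ws(msgid[len - 1 - tail])) tail++;

    char key[256];
    size_t keylen = len - head - tail;
    assert(keylen < sizeof key);
    memcpy(key, msgid + head, keylen);
    key[keylen] = '\0';

    /* Re-add the stripped white space around the (possibly untranslated) key. */
    snprintf(out, outsize, "%.*s%s%.*s", (int) head, msgid, lookup(key),
             (int) tail, msgid + len - tail);
}
```

With this table, `lookup("Execution halted\n")` returns the German text, whereas `strip_then_lookup()` on the same msgid returns it unchanged, matching Henrik's observation.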

Thanks a lot, Tomas.
This is "interesting"... and I think it is an R bug one way or the
other (and I also note that Henrik's guess was right on!).

We have the following:

- New translation *.po source files are to be made from the original *.pot files.

  In our case, it's our code that produces R.pot and R-base.pot
  (plus more for the non-base packages, and more, e.g., for the
   Recommended packages 'Matrix' and 'cluster' that I maintain).

Notably, R.pot (built from all the "base" C error/warning/... messages)
contains tons of msgid strings of the form "...\n",
i.e., ending in \n.
Consequently, the corresponding translations in the translators' *.po
files should also end in \n.

Additionally, the GNU gettext FAQ says
(at https://www.gnu.org/software/gettext/FAQ.html#newline):

Q: What does this mean: “'msgid' and 'msgstr' entries do not both end with '\n'”

A: It means that when the original string ends in a newline, your translation must also end in a newline. And if the original string does not end in a newline, then your translation should likewise not have a newline at the end.
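The FAQ rule can be written down as a small check. This is only a sketch of the rule as stated above, not gettext's own implementation. The function names are hypothetical, and accepting an empty msgstr is this sketch's choice, following the .po convention that an empty msgstr means "not yet translated".

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* True if s is non-empty and its last character is '\n'. */
static bool ends_with_newline(const char *s)
{
    size_t n = strlen(s);
    return n > 0 && s[n - 1] == '\n';
}

/* The FAQ rule: msgid and msgstr must agree on whether they end in '\n'.
   An empty msgstr (untranslated entry) is accepted. */
static bool newline_rule_ok(const char *msgid, const char *msgstr)
{
    assert(msgid != NULL && msgstr != NULL);
    if (msgstr[0] == '\0')
        return true;
    return ends_with_newline(msgid) == ends_with_newline(msgstr);
}
```

The de.po entry quoted earlier, "Execution halted\n" / "Ausführung angehalten\n", passes; dropping the "\n" from either side would violate the rule.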

From all that I'd conclude that we (the R base code) are the source
of the problem.
Given the above FAQ, it seems common in other projects, too, to
have such trailing \n, and so we should really change the C code
you cite above.

On the other hand, this dates from almost the very beginning,
when Brian added translation to R:
r32938 | ripley | 2005-01-30 20:24:04 +0100 (Sun, 30 Jan 2005) | 2 lines

include \n in whitespace ignored for R-level gettext

I think this was because, at the same time, we had started to
emphasize to useRs that they should *not* end message/format strings
in stop() / warning() with a newline; rather, stop() and
warning() would *add* the newline(s) themselves.

Still, we currently have a few such cases in R-base.pot,
but only a few, and maybe they really are "in error", in the
sense that we could drop the trailing '\n' (and do the same in all the *.po files!),
and newlines would be appended later {{not just by RStudio, which
   graciously adds final newlines in its R console, even for, say,
   cat("abc")}}.

However, this is quite different for all the message strings from C, as
used there in, e.g., error() or warning(), and so in R.pot
we see many, many msg strings ending in "\n" (which must then
also be in the *.po files).

My current conclusion is that we should try to simplify the
do_gettext() code and *not* remove and re-add the '\n' (nor the
'\t', I think...).
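A minimal C sketch of this direction: look up the msgid exactly as given, so that R.pot entries ending in "\n" match. The fallback to the old strip-and-re-add lookup when the exact msgid is untranslated is an assumption of this sketch (one conceivable way to stay compatible with existing catalogs), not something decided in this thread. `catalog`, `lookup()` and `translate()` are hypothetical names, not R internals, and the catalog reuses only translations quoted above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical catalog, keyed exactly as the msgids appear in the .po file. */
struct entry { const char *msgid, *msgstr; };
static const struct entry catalog[] = {
    { "cannot get working directory", "kann das Arbeitsverzeichnis nicht ermitteln" },
    { "Execution halted\n", "Ausführung angehalten\n" },
};

/* Exact-match lookup; returns the msgid pointer itself if untranslated. */
static const char *lookup(const char *msgid)
{
    for (size_t i = 0; i < sizeof catalog / sizeof catalog[0]; i++)
        if (strcmp(catalog[i].msgid, msgid) == 0)
            return catalog[i].msgstr;
    return msgid;
}

static int is_ws(char c) { return c == ' ' || c == '\t' || c == '\n'; }

/* Exact lookup first; only if that leaves the string untranslated, retry
   with the surrounding white space stripped and then re-added (assumed
   compatibility fallback). */
static void translate(const char *msgid, char *out, size_t outsize)
{
    const char *tr = lookup(msgid);
    if (tr != msgid) {                 /* exact msgid found, e.g. "...\n" */
        snprintf(out, outsize, "%s", tr);
        return;
    }
    size_t len = strlen(msgid), head = 0, tail = 0;
    while (head < len && is_ws(msgid[head])) head++;
    while (tail < len - head && is_ws(msgid[len - 1 - tail])) tail++;

    char key[256];
    size_t keylen = len - head - tail;
    assert(keylen < sizeof key);
    memcpy(key, msgid + head, keylen);
    key[keylen] = '\0';
    snprintf(out, outsize, "%.*s%s%.*s", (int) head, msgid, lookup(key),
             (int) tail, msgid + len - tail);
}
```

Dropping the fallback branch gives the plain "no stripping at all" behaviour; keeping it means a msgid such as "  cannot get working directory" would still find the unpadded catalog entry.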


    > Tomas

    >> Duncan Murdoch
