[Rd] gettext(msgid, domain="R") doesn't work for some 'msgid':s

Martin Maechler m@ech|er @end|ng |rom @t@t@m@th@ethz@ch
Sat Nov 6 11:39:58 CET 2021

>>>>> Suharto Anggono via R-devel
>>>>>     on Sat, 6 Nov 2021 08:07:58 +0000 (UTC) writes:

    > This issue has come up before: https://stat.ethz.ch/pipermail/r-help/2013-February/346721.html ("gettext wierdness"), https://stat.ethz.ch/pipermail/r-devel/2007-December/047893.html ("gettext() and messages in 'pkg' domain").
    > Using 'ngettext' is a workaround, like in https://rdrr.io/cran/svMisc/src/R/svMisc-internal.R .

Thank you for the pointers!

    > It is documented: "For 'gettext', leading and trailing whitespace is ignored when looking for the translation."

Indeed; and it *is* a feature, but really only valuable when the
msgid's (the original message strings) do *not* contain such
leading or trailing whitespace.
And, in fact, when xgettext() or xgettext2pot() from pkg 'tools'
are used to create the original *.pot files, they *also* trim
leading and trailing \n, \t and spaces.

So ideally there should not be any   end(or beginning)-of-line
"\n" in the R-base.pot (and hence corresponding  <LANG>-base.po )
and, as I mentioned, there *are* only a few, and
we could (should?) consider removing them from there.

A "problem" is still in the many C-code msgid's  where
end-of-line-"\n" are common.

Yes, indeed, one can use the workaround Suharto mentions,
ngettext()  even though users will typically only look at
ngettext() if they want / need to learn about plural/singular
messages ...

I.e. in our case, this works, and Henrik could get what he wants

> Sys.setenv(LANGUAGE = "de")
> ngettext(1,"Execution halted\n", "", domain="R")
[1] "Ausführung angehalten\n"

but it's still not so satisfactory, that you cannot use
gettext() itself to look at a considerable proportion of the
C/C++/.. level error messages just because they end with "\n".

One possibility would be to introduce an optional
`trim = TRUE` argument, so the above could be achieved (more
efficiently and naturally) by

   gettext("Execution halted\n", domain="R", trim=FALSE)

but in any case, dropping the trimming altogether,
as I proposed yesterday (see below), is not a good idea.
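
To make the effect of such a `trim` switch concrete, here is a minimal,
self-contained C sketch.  It is *not* the do_gettext() code: the two-entry
catalog, toy_dgettext() and toy_gettext() are hypothetical stand-ins, and
the exact-match lookup is only assumed to behave like dgettext() (returning
the msgid itself when nothing is found).

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Toy catalog standing in for the "R" domain.  The two entries are taken
   from this thread; the table itself is hypothetical. */
struct entry { const char *msgid; const char *msgstr; };
static const struct entry catalog[] = {
    { "cannot get working directory",
      "kann das Arbeitsverzeichnis nicht ermitteln" },
    { "Execution halted\n", "Ausführung angehalten\n" },
};

/* Exact-match lookup; returns msgid itself when there is no entry. */
static const char *toy_dgettext(const char *msgid)
{
    size_t n = sizeof catalog / sizeof catalog[0];
    for (size_t i = 0; i < n; i++)
        if (strcmp(catalog[i].msgid, msgid) == 0)
            return catalog[i].msgstr;
    return msgid;
}

static int is_ws(char c) { return c == ' ' || c == '\t' || c == '\n'; }

/* Translate msg into out (of size outsz).
   trim != 0: strip leading/trailing whitespace before the lookup and
              add it back after translation.
   trim == 0: look up the string exactly as given.
   Returns 0 on success, -1 if a buffer would be too small. */
static int toy_gettext(const char *msg, int trim, char *out, size_t outsz)
{
    assert(msg != NULL && out != NULL);

    size_t len = strlen(msg), head = 0, tail = 0;
    char core[256];

    if (trim) {
        while (head < len && is_ws(msg[head]))
            head++;
        while (tail < len - head && is_ws(msg[len - 1 - tail]))
            tail++;
    }

    size_t corelen = len - head - tail;
    if (corelen >= sizeof core)
        return -1;
    memcpy(core, msg + head, corelen);
    core[corelen] = '\0';

    const char *tr = toy_dgettext(core);

    /* head whitespace + translation + tail whitespace */
    int n = snprintf(out, outsz, "%.*s%s%s",
                     (int) head, msg, tr, msg + len - tail);
    return (n < 0 || (size_t) n >= outsz) ? -1 : 0;
}
```

With this toy catalog, toy_gettext("Execution halted\n", 1, ...) leaves the
string untranslated, because the trimmed key "Execution halted" has no
entry, whereas trim = 0 finds the msgid including its "\n".  Conversely, a
padded "  cannot get working directory\n" is translated only with trim = 1.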

    > ------------
>>>>> Martin Maechler
>>>>> on Fri, 5 Nov 2021 17:55:24 +0100 writes:

>>>>> Tomas Kalibera
>>>>> on Fri, 5 Nov 2021 16:15:19 +0100 writes:

    >>> On 11/5/21 4:12 PM, Duncan Murdoch wrote:
    >>>> On 05/11/2021 10:51 a.m., Henrik Bengtsson wrote:
    >>>>> I'm trying to reuse some of the translations available in base R by
    >>>>> using:
    >>>>>    gettext(msgid, domain="R")
    >>>>> This works great for most 'msgid's, e.g.
    >>>>> $ LANGUAGE=de Rscript -e 'gettext("cannot get working directory",
    >>>>> domain="R")'
    >>>>> [1] "kann das Arbeitsverzeichnis nicht ermitteln"
    >>>>> However, it does not work for all.  For instance,
    >>>>> $ LANGUAGE=de Rscript -e 'gettext("Execution halted\n", domain="R")'
    >>>>> [1] "Execution halted\n"
    >>>>> This despite that 'msgid' existing in:
    >>>>> $ grep -C 2 -F 'Execution halted\n' src/library/base/po/de.po
    >>>>> #: src/main/main.c:342
    >>>>> msgid "Execution halted\n"
    >>>>> msgstr "Ausführung angehalten\n"
    >>>>> It could be that the trailing newline causes problems, because the
    >>>>> same happens also for:
    >>>>> $ LANGUAGE=de Rscript --vanilla -e 'gettext("error during cleanup\n",
    >>>>> domain="R")'
    >>>>> [1] "error during cleanup\n"
    >>>>> Is this meant to work, and if so, how do I get it to work, or is it a
    >>>>> bug?
    >>>> I don't know the solution, but I think the cause is different than you
    >>>> think, because I also have the problem with other strings not
    >>>> including "\n":
    >>>> $ LANGUAGE=de Rscript -e 'gettext("malformed version string",
    >>>> domain="R")'
    >>>> [1] "malformed version string"

    >> You need domain="R-base" for the "malformed version string"

    >>> I can reproduce Henrik's report and the problem there is that the
    >>> trailing \n is stripped by R before doing the lookup, in do_gettext

    >>>             /* strip leading and trailing white spaces and
    >>>                add back after translation */
    >>>             for(p = tmp;
    >>>                 *p && (*p == ' ' || *p == '\t' || *p == '\n');
    >>>                 p++, ihead++) ;

    >>> But, calling dgettext with the trailing \n does translate correctly for me.

    >>> I'd leave to translation experts how this should work (e.g. whether the
    >>> .po files should have trailing newlines).

    >> Thanks a lot, Tomas.
    >> This is "interesting" .. and I think an R bug one way or the
    >> other (and I also note that Henrik's guess was also right on !).

    >> We have the following:

    >> - New translation *.po source files are to be made from the original *.pot files.

    >> In our case it's our code that produces R.pot and R-base.pot
    >> (and more for the non-base packages, and more e.g. for
    >> Recommended packages 'Matrix' and 'cluster' I maintain).

    >> And notably the R.pot (from all the "base" C error/warn/.. messages)
    >> contains tons of msgid strings of the form ".......\n"
    >> i.e., ending in \n.
    >> From that, the corresponding entries in the translators' *.po files
    >> should automatically also end in \n.

    >> Additionally, the GNU gettext FAQ has
    >> (here : https://www.gnu.org/software/gettext/FAQ.html#newline )

    >> ------------------------------------------------
    >> Q: What does this mean: “'msgid' and 'msgstr' entries do not both end with '\n'”

    >> A: It means that when the original string ends in a newline, your translation must also end in a newline. And if the original string does not end in a newline, then your translation should likewise not have a newline at the end.
    >> ------------------------------------------------
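
The FAQ rule quoted above can be written down as a small check.  This is
only a sketch in the spirit of that rule, not code taken from GNU gettext:
the helper names ends_with_newline() and newline_consistent() are
hypothetical, and accepting an empty (still untranslated) msgstr is an
assumption made here.

```c
#include <assert.h>
#include <string.h>

/* 1 if s ends with '\n', 0 otherwise (also 0 for the empty string). */
static int ends_with_newline(const char *s)
{
    size_t n = strlen(s);
    return n > 0 && s[n - 1] == '\n';
}

/* 1 if msgid and msgstr either both end in '\n' or both do not.
   Assumption: an empty msgstr means "not yet translated" and passes. */
static int newline_consistent(const char *msgid, const char *msgstr)
{
    assert(msgid != NULL && msgstr != NULL);
    if (msgstr[0] == '\0')
        return 1;
    return ends_with_newline(msgid) == ends_with_newline(msgstr);
}
```

For the de.po entry shown earlier, newline_consistent("Execution halted\n",
"Ausführung angehalten\n") is 1, whereas dropping the final "\n" from only
one of the two strings gives 0.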

    >> From all that I'd conclude that we (R base code) are the source
    >> of the problem.
    >> Given the above FAQ, it seems common in other projects also to
    >> have such trailing \n and so we should really change the C code
    >> you cite above.

    >> On the other hand, this is from almost the very beginning of
    >> when Brian added translation to R,
    >> ------------------------------------------------------------------------
    >> r32938 | ripley | 2005-01-30 20:24:04 +0100 (Sun, 30 Jan 2005) | 2 lines

    >> include \n in whitespace ignored for R-level gettext
    >> ------------------------------------------------------------------------

    >> I think this has been because simultaneously we had started to
    >> emphasize to useRs they should *not* end message/format strings
    >> in stop() / warning() by a new line, but rather stop() and
    >> warning() would *add* the newline(s) themselves.

    >> Still, currently we have a few such cases in R-base.pot,
    >> but just these few and maybe they really are "in error", in the
    >> sense we could drop the ending '\n' (and do the same in all the *.po files!),
    >> and newlines would be appended later {{not just by RStudio which
    >> graciously adds final newlines in its R console, even for say
    >> cat("abc") }}

    >> However, this is quite different for all the message strings from C, as
    >> used there in error() or warn() e.g., and so in R.pot
    >> we see many many msg strings ending in "\n" (which must then
    >> also be in the *.po files).

    >> My current conclusion is we should try simplifying the
    >> do_gettext() code and *not* remove and re-add the '\n' (nor the
    >> '\t' I think ...)

    > After such a change, I indeed do see

    > $ LANGUAGE=de bin/Rscript --vanilla -e 'gettext("Execution halted\n", domain="R")'
    > [1] "Ausführung angehalten\n"
    > $ LANGUAGE=de bin/Rscript --vanilla -e 'message("Execution halted\n", domain="R")'
    > Ausführung angehalten

    > $ LANGUAGE=de bin/Rscript --vanilla -e 'warning("Execution halted\n", domain="R")'
    > Warnmeldung:
    > Ausführung angehalten

    > $

    > (note the extra newline after the German translation!)
    > whereas before, not only using gettext() directly did not work,
    > but also using warning() or message() {with or without trailing \n}
    > were never translated.

    > ... and my simple #ifdef .. #endif change around the head/tail
    > save and restore seems to pass make check-devel ...

    > so I will be looking into dropping all those "head" and "tail" add
    > and remove parts in do_gettext() as they really seem to harm given the current
    > translation data bases which indeed *are* full of final '\n' in
    > `msgid` and corresponding translated `msgstr` ....

    > So, no need for a bugzilla PR nor a patch, please.
    > Maybe further examples which add something interesting in
    > addition to the ones we have here.

    > Thank you again, Henrik, Duncan, and Tomas!

    > Martin
