[R] Good practice for database with utf-8 string in package

Jeff Newmiller jdnewmil at dcn.davis.ca.us
Thu Sep 16 18:40:37 CEST 2021


I agree with Bert regarding your stated problem, but I want to point out that you have no control over the locale in which your users will try to display the encoded strings in your data. I am no expert in this, but you will need to become one in order to understand your own problem and any solutions you are given on R-package-devel. You will likely benefit from reading Kevin Ushey's writeup: https://kevinushey.github.io/blog/2018/02/21/string-encoding-and-r/
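
As a starting point, here is a minimal sketch of how one might inspect a string's declared encoding and the session locale. It assumes R >= 3.3 (for validUTF8()), and the sample string is only a shortened stand-in for the reference in the quoted message, not the actual package data.

```r
# Written with \u escapes, the source text itself is pure ASCII.
x <- "Hern\u00e1ndez-Montoya, V., P\u00e1ez, V.P."

Encoding(x)             # encoding mark R has attached to the string
validUTF8(x)            # TRUE if the bytes form valid UTF-8
l10n_info()[["UTF-8"]]  # TRUE if the session locale is a UTF-8 locale

# enc2utf8() returns the string re-encoded as UTF-8; enc2native() would
# convert it to the session's native encoding, which is where users in a
# locale that cannot represent the characters may see substitutions.
x_utf8 <- enc2utf8(x)
identical(x_utf8, x)
```

How the string is finally displayed still depends on the user's locale and terminal, which the package author cannot control.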

On September 16, 2021 9:17:05 AM PDT, Bert Gunter <bgunter.4567 using gmail.com> wrote:
>This should not be posted here. Post on the R-package-devel list instead.
>
>Bert Gunter
>
>"The trouble with having an open mind is that people keep coming along
>and sticking things into it."
>-- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )
>
>On Thu, Sep 16, 2021 at 9:13 AM Marc Girondot via R-help
><r-help using r-project.org> wrote:
>>
>> Hello everyone,
>>
>> I am a little bit stuck on the problem of including a database with
>> UTF-8 strings in a package. When I submit it to CRAN, it reports NOTEs
>> for several Unix systems, and I am trying to find a solution (if one
>> exists) to avoid these NOTEs.
>>
>> The database has references, and some names have non-ASCII characters.
>>
>> * First I don't agree at all with the solution proposed here:
>>
>> https://cran.r-project.org/doc/manuals/r-release/R-exts.html#Encoding-issues
>>
>> "First, consider carefully if you really need non-ASCII text."
>>
>> If a language has non-ASCII characters, it is not just to make the
>> writing nicer or more complex; it is because they change the pronunciation.
>>
>> * Then I tried to find a solution to avoid these NOTEs.
>>
>> For example, here is a reference with UTF-8 characters:
>>
>> > DatabaseTSD$Reference[211]
>> [1] Hernández-Montoya, V., Páez, V.P. & Ceballos, C.P. (2017) Effects of
>> temperature on sex determination and embryonic development in the
>> red-footed tortoise, Chelonoidis carbonarius. Chelonian Conservation and
>> Biology 16, 164-171.
>>
>> When I convert the characters into Unicode escapes, I do indeed get only
>> ASCII characters. Perfect.
>>
>> >  iconv(DatabaseTSD$Reference[211], "UTF-8", "ASCII", "Unicode")
>> [1] "Hern<U+00E1>ndez-Montoya, V., P<U+00E1>ez, V.P. & Ceballos, C.P.
>> (2017) Effects of temperature on sex determination and embryonic
>> development in the red-footed tortoise, Chelonoidis carbonarius.
>> Chelonian Conservation and Biology 16, 164-171."
>>
>> Then I get no NOTEs when I check the package with the database on Unix...
>> but how can I print the reference back with its original characters?
>>
>> Thanks a lot for pointing me to best practices for including databases with
>> non-ASCII characters without getting NOTEs when submitting a package to CRAN.
>>
>> Marc
>>
>>
>> ______________________________________________
>> R-help using r-project.org mailing list -- To UNSUBSCRIBE and more, see
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>

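On the quoted question of how to print the reference back with its original characters: one possible approach, sketched here with base R only, is to translate the "<U+XXXX>" markers produced by iconv(..., sub = "Unicode") back into characters. The helper name restore_unicode is hypothetical (it is not part of base R or any package), and the sample string is a shortened stand-in for the real reference.

```r
# Hypothetical helper: turn "<U+XXXX>" markers back into UTF-8 characters.
restore_unicode <- function(x) {
  pattern <- "<U\\+([0-9A-Fa-f]{4,8})>"
  m <- gregexpr(pattern, x)
  regmatches(x, m) <- lapply(regmatches(x, m), function(tokens) {
    # Keep only the hexadecimal code point of each marker ...
    hex <- sub(pattern, "\\1", tokens)
    # ... and convert each code point to the character it denotes.
    vapply(hex, function(h) intToUtf8(strtoi(h, 16L)), character(1),
           USE.NAMES = FALSE)
  })
  x
}

escaped <- "Hern<U+00E1>ndez-Montoya, V., P<U+00E1>ez, V.P."
restored <- restore_unicode(escaped)
identical(restored, "Hern\u00e1ndez-Montoya, V., P\u00e1ez, V.P.")
```

Note that this would also rewrite any literal "<U+XXXX>" text that happened to be in the original data. The stringi package provides stri_escape_unicode() and stri_unescape_unicode(), which use \u escapes instead and may be a more robust alternative.
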
-- 
Sent from my phone. Please excuse my brevity.