[R] Good practice for a database with UTF-8 strings in a package

Thu Sep 16 18:05:43 CEST 2021

Hello everyone,

I am a little bit stuck on how to include a database with UTF-8 
strings in a package. When I submit it to CRAN, the checks report NOTEs 
on several Unix systems, and I am trying to find a solution (if one 
exists) to avoid these NOTEs.

The database contains references, and some author names contain non-ASCII characters.

* First, I don't agree at all with the solution proposed here:

https://cran.r-project.org/doc/manuals/r-release/R-exts.html#Encoding-issues

"First, consider carefully if you really need non-ASCII text."

If a language has non-ASCII characters, it is not just to make the 
writing nicer or more complex; it is because they change the pronunciation.

* Then, I tried to find a way to avoid these NOTEs.

For example, here is a reference with UTF-8 characters:

> DatabaseTSD$Reference[211]
[1] Hernández-Montoya, V., Páez, V.P. & Ceballos, C.P. (2017) Effects of 
temperature on sex determination and embryonic development in the 
red-footed tortoise, Chelonoidis carbonarius. Chelonian Conservation and 
Biology 16, 164-171.

When I convert the non-ASCII characters into Unicode escapes, I indeed 
get only ASCII characters. Perfect.

>  iconv(DatabaseTSD$Reference[211], "UTF-8", "ASCII", "Unicode")
[1] "Hern<U+00E1>ndez-Montoya, V., P<U+00E1>ez, V.P. & Ceballos, C.P. 
(2017) Effects of temperature on sex determination and embryonic 
development in the red-footed tortoise, Chelonoidis carbonarius. 
Chelonian Conservation and Biology 16, 164-171."
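The same iconv() call can be applied to every character column before the 
dataset is saved into the package's data/ directory. A minimal sketch, 
assuming the database is a data frame; escape_non_ascii() and the toy data 
frame refs are hypothetical names used only for illustration:

```r
# Hypothetical helper: replace every non-ASCII character in each character
# column by a <U+XXXX> escape, so the stored data are pure ASCII.
# sub = "Unicode" is only meaningful when converting from UTF-8.
escape_non_ascii <- function(df) {
  for (col in names(df)) {
    if (is.character(df[[col]])) {
      df[[col]] <- iconv(df[[col]], from = "UTF-8", to = "ASCII",
                         sub = "Unicode")
    }
  }
  df
}

# Toy data; the \u00e1 escapes create UTF-8 strings in any locale
refs <- data.frame(
  Reference = c("Hern\u00e1ndez-Montoya, V., P\u00e1ez, V.P.",
                "Ceballos, C.P."),
  stringsAsFactors = FALSE
)

escaped <- escape_non_ascii(refs)
escaped$Reference[1]
# expected: "Hern<U+00E1>ndez-Montoya, V., P<U+00E1>ez, V.P."
```

Strings that are already ASCII (the second row here) pass through unchanged.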

With the converted database, I get no NOTEs when I check the package on 
Unix... but how can I print the reference back with its original characters?
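One possible way back, sketched in base R: find each <U+XXXX> escape with a 
regular expression, parse the hexadecimal code point with strtoi(), and turn 
it into a character with intToUtf8(). The function name unescape_unicode() 
is hypothetical, and the sketch assumes the input has no NA values:

```r
# Hypothetical helper: turn <U+XXXX> escapes, as produced by
# iconv(..., sub = "Unicode"), back into the original characters.
unescape_unicode <- function(x) {
  pattern <- "<U\\+[0-9A-Fa-f]{4,8}>"
  matches <- gregexpr(pattern, x)
  regmatches(x, matches) <- lapply(regmatches(x, matches), function(codes) {
    hex <- substr(codes, 4, nchar(codes) - 1)     # strip "<U+" and ">"
    intToUtf8(strtoi(hex, 16L), multiple = TRUE)  # code point -> character
  })
  x
}

ref <- "Hern<U+00E1>ndez-Montoya, V., P<U+00E1>ez, V.P."
cat(unescape_unicode(ref), "\n")
```

A package could keep the stored data in escaped ASCII form and call such a 
helper only when displaying references (for example in an accessor or print 
function), so that R CMD check sees ASCII data. If an extra dependency is 
acceptable, stringi's stri_escape_unicode() and stri_unescape_unicode() 
perform a similar round trip using \uXXXX escapes instead of <U+XXXX>.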

Thanks a lot for pointing me to best practices for including databases 
with non-ASCII characters without getting NOTEs when submitting a package to CRAN.

Marc
