[Rd] `basename` and `dirname` change the encoding to "UTF-8"

Duncan Murdoch murdoch@dunc@n @end|ng |rom gm@||@com
Mon Jun 29 20:50:44 CEST 2020


On 29/06/2020 10:39 a.m., Johannes Rauh wrote:
> Dear R Developers,
> 
> I noticed that on Windows `basename` and `dirname` always return strings marked as "UTF-8" (tested with R-4.0.0 and R-3.6.3):
> 
>> p <- "Föö/Bär"
>> Encoding(p)
> [1] "latin1"
>> Encoding(dirname(p))
> [1] "UTF-8"
>> Encoding(basename(p))
> [1] "UTF-8"
> 
> Is this on purpose?  At least I did not find any relevant comment in the documentation of `dirname`/`basename`.
> 
> Background: I'm currently struggling with a directory name containing a latin1 character.  (I know that this is a bad idea, but I did not create the directory and I cannot rename it.)  I now want to pass a latin1-encoded directory name to a function which internally uses `tools::makeLazyLoadDB`.  At that point `dirname` is called internally, which changes the encoding, and things break.  If I use `debug` to halt the processing and "fix" the encoding, things work as expected.
> 
> So, if possible, I would prefer that `dirname` and `basename` preserve the encoding.

Actually, makeLazyLoadDB isn't exported from tools, so strictly speaking 
you shouldn't be calling it.  Or perhaps you have a good reason to call 
it, and should be asking for it to be exported, or you are calling a 
published function which calls it:  in either case it should probably be 
fixed to accept UTF-8.
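As a minimal sketch of what "accepting UTF-8" could look like, assuming the consumer is ordinary R code: normalise every incoming path with `enc2utf8()` before using or comparing it, so it no longer matters whether the caller passed a "latin1"-marked string or the "UTF-8"-marked result of `dirname()`. The helper `normalize_dir()` is hypothetical, not part of base R or tools.

```r
# Hypothetical helper (not in base R or tools): convert a path to UTF-8 so
# downstream code sees the same bytes whatever encoding was declared.
normalize_dir <- function(path) {
  enc2utf8(path)
}

# Build a "latin1"-marked path explicitly so the example does not depend
# on the encoding of the source file.
p <- iconv("F\u00f6\u00f6/B\u00e4r", from = "UTF-8", to = "latin1")
Encoding(p)        # "latin1"

q <- normalize_dir(p)
Encoding(q)        # "UTF-8"

# The normalised string has the same bytes as a UTF-8 literal of the path.
identical(charToRaw(q), charToRaw("F\u00f6\u00f6/B\u00e4r"))   # TRUE

# dirname()'s result can go through the same helper before further use.
normalize_dir(dirname(p))
```

With inputs normalised this way, the encoding mark that `dirname()` returns stops mattering to the consumer.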

But it doesn't call dirname or basename, so maybe the function that 
calls it is the one that needs fixing.

In any case, while asking dirname() and basename() to preserve the 
encoding sounds reasonable, it seems like it would just be covering up a 
deeper problem.

Duncan Murdoch


