[Rd] Output mis-encoded on Windows w/ RGui 3.5.1 in strange case

Tomas Kalibera tomas.kalibera using gmail.com
Wed Jul 18 15:38:39 CEST 2018


Fixed in R-devel and R-patched,
Tomas

On 07/18/2018 12:03 PM, Tomas Kalibera wrote:
> Thanks, I can now reproduce it. It is a bug that is easy to fix, and I
> will do so shortly.
>
> FYI, it can be reproduced simply by running these two lines in RGui:
>
> list()
> encodeString("apple")
>
> Best
> Tomas
>
> On 07/17/2018 05:16 PM, Kevin Ushey wrote:
>> Sorry, I should have been more clear -- if I write the contents of
>> that script to a file called 'encoding.R' and source that, then I see
>> the reported behavior.
>>
>> Here's something standalone that you should hopefully be able to copy
>> + paste into RGui to reproduce:
>>
>> code <- '
>>     x <- 1
>>     print(list())
>>     save(x, file = tempfile())
>>     output <- encodeString("apple")
>>     print(output)
>> '
>>
>> file <- tempfile(fileext = ".R")
>> writeLines(code, con = file)
>> source(file)
>>
>> When I run this, I see:
>>
>> > code <- '
>> +    x <- 1
>> +    print(list())
>> +    save(x, file = tempfile())
>> +    output <- encodeString("apple")
>> +    print(output)
>> + '
>> > file <- tempfile(fileext = ".R")
>> > writeLines(code, con = file)
>> > source(file)
>> list()
>> [1] "\002ÿþapple\003ÿþ"
>>
>> This is with today's R-devel:
>>
>> > sessionInfo()
>> R Under development (unstable) (2018-07-16 r74967)
>> Platform: x86_64-w64-mingw32/x64 (64-bit)
>> Running under: Windows 10 x64 (build 17134)
>>
>> Matrix products: default
>>
>> locale:
>> [1] LC_COLLATE=English_United States.1252
>> [2] LC_CTYPE=English_United States.1252
>> [3] LC_MONETARY=English_United States.1252
>> [4] LC_NUMERIC=C
>> [5] LC_TIME=English_United States.1252
>>
>> attached base packages:
>> [1] stats     graphics  grDevices utils     datasets  methods   base
>>
>> loaded via a namespace (and not attached):
>> [1] compiler_3.6.0
>>
>> I realize the example looks incomplete, but it seems like each step is
>> required to reproduce the strange behavior:
>>
>>     1) You need to print an empty list,
>>     2) You need to invoke save() after printing that empty list,
>>     3) Then, attempts to call encodeString() will produce the strange output.
>>
>> For what it's worth, it may be related to a behavior I'm seeing where
>> the first name printed for an R list is quoted with backticks even
>> when not necessary:
>>
>> > list(x = 1, y = 2)
>> $`x`
>> [1] 1
>>
>> $y
>> [1] 2
>>
>> Thanks,
>> Kevin
>>
>> On Tue, Jul 17, 2018 at 6:12 AM Tomas Kalibera <tomas.kalibera using gmail.com> wrote:
>>> Hi Kevin,
>>>
>>> the extra bytes you are seeing are escapes for UTF-8 strings used in
>>> input to the RGui console. ASCII strings have recently started being
>>> converted to UTF-8, so you now get these escapes for ASCII strings as
>>> well. RGui understands these escapes and converts from UTF-8 to wide
>>> characters before printing on Windows. The escapes should not be used
>>> unless printing to the RGui console.
>>>
>>> I suppose you managed to leak the escapes, but I cannot reproduce this:
>>> the example you sent seems incomplete ("x" is not used, and it is not
>>> clear what encoding.R is or where encodeString is run), and none of the
>>> variations I ran leaked the escapes on R-devel. Please clarify the
>>> example if you believe it is a bug. Please also use current R-devel
>>> (I relatively recently fixed a bug in decoding these escaped strings;
>>> it is perhaps unlikely, but not impossible, that it is related).
>>>
>>> Best
>>> Tomas
>>>
>>> On 07/16/2018 10:01 PM, Kevin Ushey wrote:
>>>> Given the following R script:
>>>>
>>>>      x <- 1
>>>>      print(list())
>>>>      save(x, file = tempfile())
>>>>      output <- encodeString("apple")
>>>>      print(output)
>>>>
>>>> If I source this script from RGui on Windows, I see the output:
>>>>
>>>>      > source("encoding.R")
>>>>      list()
>>>>      [1] "\002ÿþapple\003ÿþ"
>>>>
>>>> That is, it's as though R has injected what looks like byte order
>>>> marks around the encoded string:
>>>>
>>>>      > charToRaw(output)
>>>>       [1] 02 ff fe 61 70 70 6c 65 03 ff fe
>>>>
>>>> FWIW I see the same output in R-patched and R-devel. Any idea what
>>>> might be going on? For what it's worth, I don't see the same issue
>>>> with R as run from the terminal.
>>>>
>>>> Thanks,
>>>> Kevin
>>>>
>>>> ______________________________________________
>>>> R-devel using r-project.org  mailing list
>>>> https://stat.ethz.ch/mailman/listinfo/r-devel
>
>

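For anyone who wants to check whether a string picked up the leaked
markers, the byte dump quoted above (02 ff fe ... 03 ff fe) suggests a
simple test. What follows is only a sketch in R. The helper names
has_rgui_escapes() and strip_rgui_escapes() are hypothetical and not part
of R, and the marker bytes are taken from the charToRaw() output in this
thread rather than from the R sources. It also assumes a leaked string is
wrapped in exactly one start marker and one end marker.

```r
# Marker bytes as they appear in the charToRaw() output quoted in this thread.
# Assumption: one start marker at the front, one end marker at the back.
esc_start <- as.raw(c(0x02, 0xff, 0xfe))
esc_end   <- as.raw(c(0x03, 0xff, 0xfe))

# Hypothetical helper: TRUE if the single string x is wrapped in the markers.
has_rgui_escapes <- function(x) {
  bytes <- charToRaw(x)
  n <- length(bytes)
  n >= 6 &&
    identical(bytes[1:3], esc_start) &&
    identical(bytes[(n - 2):n], esc_end)
}

# Hypothetical helper: drop the markers if present, else return x unchanged.
strip_rgui_escapes <- function(x) {
  if (!has_rgui_escapes(x)) return(x)
  bytes <- charToRaw(x)
  n <- length(bytes)
  # Negative indexing also handles the case where nothing is left in between.
  rawToChar(bytes[-c(1:3, (n - 2):n)])
}

# Rebuild the affected value from the bytes reported in the thread.
leaked <- rawToChar(as.raw(c(0x02, 0xff, 0xfe,
                             0x61, 0x70, 0x70, 0x6c, 0x65,
                             0x03, 0xff, 0xfe)))
clean <- strip_rgui_escapes(leaked)
```

The helpers work on raw bytes rather than on characters because the
marker bytes ff and fe are not valid UTF-8, so byte-level comparison
avoids any dependence on the session locale.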
