[R] cannot base64decode string which is base64encoded in R

Prof Brian Ripley ripley at stats.ox.ac.uk
Tue Aug 6 09:47:17 CEST 2013


On 06/08/2013 08:34, Qiang Wang wrote:
> Thanks for your elaborate explanation. If I understand correctly, "ߟ"
> belongs to those characters that CAN be interpreted by UTF-8. Others are
> left as they are, such as "\xe4" and "\xac". So the following code will
> show an error message, but it won't affect the use of x?
> x <- "\xe4"
>
> I have a question that may be off topic, but it has bothered me a lot and I
> can't find the answer anywhere:
> In R, how do I add a null character to a string? Even storing a single null
> character seems impossible:
> x <- "\0". The question arose from a web API which requires submitted
> strings to contain a null character.

It is not possible.  Character strings in R cannot contain nuls (not 
nulls, sic).  Use raw vectors instead.
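A minimal sketch of the raw-vector approach, assuming the web API only needs the bytes (for example, passed on to a function that accepts a raw vector, such as base64encode() from base64enc, whose help page is quoted further down). The name `payload` is illustrative. The nul is added as a raw byte, and converting the whole vector back into a character string fails because of the embedded nul.

```r
# The literal "\0" is rejected by the R parser, so build the bytes directly:
# "abc", then a nul byte, then "def".
payload <- c(charToRaw("abc"), as.raw(0), charToRaw("def"))
print(payload)    # [1] 61 62 63 00 64 65 66
length(payload)   # 7 bytes, the nul included

# Converting back to a character string is refused because of the embedded nul.
msg <- tryCatch(rawToChar(payload), error = function(e) conditionMessage(e))
print(msg)        # an "embedded nul in string" error message
```

Keeping the data as raw until the very last step (encoding or sending it) avoids ever needing a character string that contains a nul.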

This is documented, so time to read some manuals ....
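On the first question: a short sketch, assuming a recent R version that provides validUTF8(), showing that the two bytes \xdf\x9f (positions 55:56 of the decoded secret) form one valid UTF-8 sequence, code point U+07DF, which is printed as "ߟ" in a UTF-8 locale, whereas a lone byte such as \xe4 is not valid UTF-8 and is displayed as an escape. In both cases the underlying bytes are unchanged; only their display differs.

```r
two <- rawToChar(as.raw(c(0xdf, 0x9f)))  # the two bytes Enrico pasted together
one <- rawToChar(as.raw(0xe4))           # a lone lead byte, incomplete on its own

validUTF8(two)                # TRUE: a complete two-byte UTF-8 sequence
validUTF8(one)                # FALSE: shown as "\xe4"
utf8ToInt(two)                # 2015, i.e. U+07DF
nchar(two, type = "bytes")    # still 2 bytes
identical(charToRaw(two), as.raw(c(0xdf, 0x9f)))  # TRUE: bytes are unchanged
```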

>
>
> On Tue, Aug 6, 2013 at 1:43 AM, Enrico Schumann <es at enricoschumann.net> wrote:
>
>> On Mon, 05 Aug 2013, Qiang Wang <unsown at gmail.com> writes:
>>
>>>> On Sat, Aug 3, 2013 at 3:49 PM, Enrico Schumann <es at enricoschumann.net> wrote:
>>>>
>>>>> On Fri, 02 Aug 2013, Qiang Wang <unsown at gmail.com> writes:
>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> I'm struggling with encoding/decoding strings in R and don't know why the
>>>>>> second example below fails. Thanks in advance for your help.
>>>>>> succeed: s <- "saf"; x <- base64encode(s); y <- base64decode(x, "character")
>>>>>> fail: s <- "safs"; x <- base64encode(s); y <- base64decode(x, "character")
>>>>>>
>>>>>
>>>>> And the first example works for you?
>>>>>
>>>>>    require("base64enc")
>>>>>    s <- "saf"
>>>>>    x <- base64encode(s)
>>>>>
>>>>> ## Error in file(what, "rb") : cannot open the connection
>>>>> ## In addition: Warning message:
>>>>> ## In file(what, "rb") : cannot open file 'saf': No such file or directory
>>>>>
>>>>> ?base64encode says that its first argument is
>>>>>
>>>>>      "data to be encoded/decoded. For ‘base64encode’ it can be a raw
>>>>>       vector, text connection or file name. For ‘base64decode’ it can be
>>>>>       a string or a binary connection."
>>>>>
>>>>> Try this:
>>>>>
>>>>>    rawToChar(base64decode(base64encode(charToRaw("saf"))))
>>>>>
>>>>> ## [1] "saf"
>>>>>
>>>>> --
>>>>> Enrico Schumann
>>>>> Lucerne, Switzerland
>>>>> http://enricoschumann.net
>>>>
>>>
>>> Thanks for your reply!
>>>
>>> Sorry, I did not make clear that I was using the base64encode and
>>> base64decode functions provided by the "caTools" package. Still, if I
>>> convert the string to the raw type first, that does solve my problem.
>>>
>>> My original problem is actually that I have a string:
>>> secret <- '5Kwug+Byrq+ilULMz3IBD5tquNt5CcdYi3XPc8jnKwtXvIgHw/vcSGU1VCIo4b/OfcRDm7uH359syfhWzXFrNg=='
>>>
>>> It was claimed to be encoded in Base64. So I tried to decode it:
>>>
>>> require("base64enc")
>>> rawToChar(base64decode(secret))
>>>
>>> Then, I got
>>>
>>> "\xe4\xac.\x83\xe0r\xae\xaf\xa2\x95B\xcc\xcfr\001\017\x9bj\xb8\xdby\t\xc7X\x8bu\xcfs\xc8\xe7+\vW\xbc\x88\a\xc3\xfb\xdcHe5T\"(\xe1\xbf\xce}\xc4C\x9b\xbb\x87ߟl\xc9\xf8V\xcdqk6"
>>>
>>> But what I expected to get is:
>>>
>>> '\xe4\xac.\x83\xe0r\xae\xaf\xa2\x95B\xcc\xcfr\x01\x0f\x9bj\xb8\xdby\t\xc7X\x8bu\xcfs\xc8\xe7+\x0bW\xbc\x88\x07\xc3\xfb\xdcHe5T"(\xe1\xbf\xce}\xc4C\x9b\xbb\x87\xdf\x9fl\xc9\xf8V\xcdqk6'
>>>
>>> Most of the result is correct except for several characters near the
>>> end. I don't know where the problem is.
>>>
>>
>> See the help page of 'rawToChar': the function transforms raw bytes into
>> characters.  But, depending on your locale, one character may be more
>> than one byte.  On my computer, with a UTF-8 locale (see my
>> 'sessionInfo' below),
>>
>>    rawToChar(base64decode(secret), TRUE)
>>
>> gives me
>>
>>    ##  [1] "\xe4" "\xac" "."    "\x83" "\xe0" "r"    "\xae"
>>    ##  [8] "\xaf" "\xa2" "\x95" "B"    "\xcc" "\xcf" "r"
>>    ## [15] "\001" "\017" "\x9b" "j"    "\xb8" "\xdb" "y"
>>    ## [22] "\t"   "\xc7" "X"    "\x8b" "u"    "\xcf" "s"
>>    ## [29] "\xc8" "\xe7" "+"    "\v"   "W"    "\xbc" "\x88"
>>    ## [36] "\a"   "\xc3" "\xfb" "\xdc" "H"    "e"    "5"
>>    ## [43] "T"    "\""   "("    "\xe1" "\xbf" "\xce" "}"
>>    ## [50] "\xc4" "C"    "\x9b" "\xbb" "\x87" "\xdf" "\x9f"
>>    ## [57] "l"    "\xc9" "\xf8" "V"    "\xcd" "q"    "k"
>>    ## [64] "6"
>>
>> That is, every *single* byte is converted into a character.  For example:
>>
>>    rawToChar(base64decode(secret), TRUE)[55:56]
>>
>> gives
>>
>>    ## [1] "\xdf" "\x9f"
>>
>> which probably is what you expected.  But if I paste those two
>> characters together,
>>
>>    paste(rawToChar(base64decode(secret), TRUE)[55:56], collapse = "")
>>
>> they will be shown like so:
>>
>>    ## [1] "ߟ"
>>
>> because this is how this byte pattern will be interpreted in UTF-8.
>>
>>
>>
>>
>> Abbreviated 'sessionInfo':
>>
>> R version 3.0.1 (2013-05-16)
>> Platform: x86_64-pc-linux-gnu (64-bit)
>>
>> locale:
>>   [1] LC_CTYPE=en_GB.UTF-8       LC_NUMERIC=C
>>   [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_GB.UTF-8
>>   [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_GB.UTF-8
>>   [7] LC_PAPER=C                 LC_NAME=C
>>   [9] LC_ADDRESS=C               LC_TELEPHONE=C
>> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
>>
>>
>>
>> --
>> Enrico Schumann
>> Lucerne, Switzerland
>> http://enricoschumann.net
>>
>
>
>
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>


-- 
Brian D. Ripley,                  ripley at stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595


