[R] cannot base64decode string which is base64encode in R

Mon Aug 5 19:43:43 CEST 2013

On Mon, 05 Aug 2013, Qiang Wang <unsown at gmail.com> writes:

>> On Sat, Aug 3, 2013 at 3:49 PM, Enrico Schumann <es at enricoschumann.net> wrote:
>>
>>> On Fri, 02 Aug 2013, Qiang Wang <unsown at gmail.com> writes:
>>>
>>> > Hi,
>>> >
>>> > I'm struggling with encoding and decoding strings in R and don't know
>>> > why the second example below fails. Thanks in advance for your help.
>>> >
>>> > succeeds:
>>> >   s <- "saf"
>>> >   x <- base64encode(s)
>>> >   y <- base64decode(x, "character")
>>> >
>>> > fails:
>>> >   s <- "safs"
>>> >   x <- base64encode(s)
>>> >   y <- base64decode(x, "character")
>>> >
>>>
>>> And the first example works for you?
>>>
>>>   require("base64enc")
>>>   s <- "saf"
>>>   x <- base64encode(s)
>>>
>>> ## Error in file(what, "rb") : cannot open the connection
>>> ## In addition: Warning message:
>>> ## In file(what, "rb") : cannot open file 'saf': No such file or directory
>>>
>>> ?base64encode says that its first argument is
>>>
>>>     "data to be encoded/decoded. For ‘base64encode’ it can be a raw
>>>      vector, text connection or file name. For ‘base64decode’ it can be
>>>      a string or a binary connection."
>>>
>>> Try this:
>>>
>>>   rawToChar(base64decode(base64encode(charToRaw("saf"))))
>>>
>>> ## [1] "saf"
>>>
>>> --
>>> Enrico Schumann
>>> Lucerne, Switzerland
>>> http://enricoschumann.net
>>
>
> Thanks for your reply!
>
> Sorry, I did not make clear that I was using the base64encode and
> base64decode functions provided by the "caTools" package. It seems that
> converting the string to the raw type first still solves my problem.
>
> My original problem actually is that I have a string:
> secret <-
> '5Kwug+Byrq+ilULMz3IBD5tquNt5CcdYi3XPc8jnKwtXvIgHw/vcSGU1VCIo4b/OfcRDm7uH359syfhWzXFrNg=='
>
> It was claimed to be encoded in Base64. So I tried to decode it:
>
> require("base64enc")
> rawToChar(base64decode(secret))
>
> Then, I got
> "\xe4\xac.\x83\xe0r\xae\xaf\xa2\x95B\xcc\xcfr\001\017\x9bj\xb8\xdby\t\xc7X\x8bu\xcfs\xc8\xe7+\vW\xbc\x88\a\xc3\xfb\xdcHe5T\"(\xe1\xbf\xce}\xc4C\x9b\xbb\x87ߟl\xc9\xf8V\xcdqk6"
>
> But what I expected to get is:
> '\xe4\xac.\x83\xe0r\xae\xaf\xa2\x95B\xcc\xcfr\x01\x0f\x9bj\xb8\xdby\t\xc7X\x8bu\xcfs\xc8\xe7+\x0bW\xbc\x88\x07\xc3\xfb\xdcHe5T"(\xe1\xbf\xce}\xc4C\x9b\xbb\x87\xdf\x9fl\xc9\xf8V\xcdqk6'
>
> Most of the result is correct, except for several characters near the end.
> I don't know where the problem is.
>

See the help page of 'rawToChar': the function transforms raw bytes into
characters.  But, depending on your locale, one character may be more
than one byte.  On my computer, with a UTF-8 locale (see my
'?sessionInfo' below),

  rawToChar(base64decode(secret), TRUE)

gives me 

  ##  [1] "\xe4" "\xac" "."    "\x83" "\xe0" "r"    "\xae"
  ##  [8] "\xaf" "\xa2" "\x95" "B"    "\xcc" "\xcf" "r"   
  ## [15] "\001" "\017" "\x9b" "j"    "\xb8" "\xdb" "y"   
  ## [22] "\t"   "\xc7" "X"    "\x8b" "u"    "\xcf" "s"   
  ## [29] "\xc8" "\xe7" "+"    "\v"   "W"    "\xbc" "\x88"
  ## [36] "\a"   "\xc3" "\xfb" "\xdc" "H"    "e"    "5"   
  ## [43] "T"    "\""   "("    "\xe1" "\xbf" "\xce" "}"   
  ## [50] "\xc4" "C"    "\x9b" "\xbb" "\x87" "\xdf" "\x9f"
  ## [57] "l"    "\xc9" "\xf8" "V"    "\xcd" "q"    "k"   
  ## [64] "6"

That is, every *single* byte is converted into a character.  For example:

  rawToChar(base64decode(secret), TRUE)[55:56]

gives 

  ## [1] "\xdf" "\x9f"

which probably is what you expected.  But if I paste those two
characters together,

  paste(rawToChar(base64decode(secret), TRUE)[55:56], collapse = "")

they will be shown like so:

  ## [1] "ߟ"

because this is how this byte pattern will be interpreted in UTF-8.
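
The same effect can be reproduced without the secret string. The sketch below uses only base R and writes bytes 55 and 56 out by hand; the variable names are just for illustration. It suggests that the decoded bytes are what you expected and that only their display as text differs.

```r
# Bytes 55 and 56 of the decoded secret, written out by hand
bytes <- as.raw(c(0xdf, 0x9f))

# Converted one at a time: two separate one-byte strings
single <- rawToChar(bytes, multiple = TRUE)
length(single)                        # 2

# Converted together: one string holding both bytes
joined <- rawToChar(bytes)

# Declare the bytes as UTF-8 so the next step does not depend on the locale
Encoding(joined) <- "UTF-8"

# Read as UTF-8, the pair is a single code point
utf8ToInt(joined)                     # 2015, i.e. U+07DF

# The underlying bytes are unchanged; only the display differs
identical(charToRaw(joined), bytes)   # TRUE

# A locale-independent way to show the bytes in "\x.." notation
hex <- paste0("\\x", as.character(bytes), collapse = "")
cat(hex, "\n")                        # \xdf\x9f
```

If the goal is to compare the result with output from another program, comparing the raw vector (or a hex rendering like the one above) is probably safer than comparing the printed string.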

Abbreviated 'sessionInfo':

R version 3.0.1 (2013-05-16)
Platform: x86_64-pc-linux-gnu (64-bit)

locale:
 [1] LC_CTYPE=en_GB.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_GB.UTF-8    
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_GB.UTF-8   
 [7] LC_PAPER=C                 LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       
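
To see whether your own session would interpret multi-byte patterns in the same way, it may help to check the character-type locale. A minimal sketch using only base R (the variable names are illustrative):

```r
# Character-type locale of the current session, e.g. "en_GB.UTF-8"
ctype <- Sys.getlocale("LC_CTYPE")

# TRUE if R considers the session to be running in a UTF-8 locale
is_utf8 <- l10n_info()[["UTF-8"]]

print(ctype)
print(is_utf8)
```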

-- 
Enrico Schumann
Lucerne, Switzerland
http://enricoschumann.net