[Rd] Using unicode from C interface of R

Wed Jan 22 01:08:47 CET 2014

On 14-01-21 5:41 PM, Sandip Nandi wrote:
> Hi,
>
> I am using the C interface of R. If a Unicode string is read, in what
> format can I pass it back to R?
> I was trying to use the following:
>
>   tpStr = ( char *)val;
>   SET_STRING_ELT(innerList  , 0, mkChar(tpStr));
>
> It does not work.
>
> If I pass it back to R in RAW format, what package is there to read
> it? I mean a package for interpreting RAW data.

There are a number of encodings for Unicode.  Most Unix systems use 
UTF-8, Windows uses UTF-16 for some things, etc.

If your string is known to be in UTF-8, that's easiest:  just use 
mkCharCE(str, CE_UTF8) instead of mkChar, as described in Writing R 
Extensions.  If it is in UTF-16 you might have more trouble because of 
possible embedded 0 bytes, which make C string functions stop early.  
Translate to UTF-8 first using C facilities like WideCharToMultiByte 
(on Windows).
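
For cases where no platform conversion routine is at hand, here is a 
minimal portable sketch of the UTF-16 to UTF-8 step.  The function name 
utf16_to_utf8 and its signature are hypothetical (not part of R or any 
library), and it assumes the input is in native byte order with no BOM.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, not an R API: convert n UTF-16 code units
 * (native byte order, no BOM) into NUL-terminated UTF-8 in dst.
 * Returns the number of bytes written, excluding the NUL, or -1 if
 * the input has an unpaired surrogate or dst (cap bytes) is too small. */
long utf16_to_utf8(const uint16_t *src, size_t n, char *dst, size_t cap)
{
    size_t o = 0;
    for (size_t i = 0; i < n; i++) {
        uint32_t cp = src[i];
        if (cp >= 0xD800 && cp <= 0xDBFF) {
            /* High surrogate: must be followed by a low surrogate. */
            if (i + 1 >= n) return -1;
            uint32_t lo = src[i + 1];
            if (lo < 0xDC00 || lo > 0xDFFF) return -1;
            cp = 0x10000 + ((cp - 0xD800) << 10) + (lo - 0xDC00);
            i++;
        } else if (cp >= 0xDC00 && cp <= 0xDFFF) {
            return -1;  /* low surrogate without a preceding high one */
        }

        /* Number of UTF-8 bytes this code point needs. */
        size_t need = cp < 0x80 ? 1 : cp < 0x800 ? 2 : cp < 0x10000 ? 3 : 4;
        if (o + need + 1 > cap) return -1;  /* keep room for the NUL */

        if (need == 1) {
            dst[o++] = (char)cp;
        } else if (need == 2) {
            dst[o++] = (char)(0xC0 | (cp >> 6));
            dst[o++] = (char)(0x80 | (cp & 0x3F));
        } else if (need == 3) {
            dst[o++] = (char)(0xE0 | (cp >> 12));
            dst[o++] = (char)(0x80 | ((cp >> 6) & 0x3F));
            dst[o++] = (char)(0x80 | (cp & 0x3F));
        } else {
            dst[o++] = (char)(0xF0 | (cp >> 18));
            dst[o++] = (char)(0x80 | ((cp >> 12) & 0x3F));
            dst[o++] = (char)(0x80 | ((cp >> 6) & 0x3F));
            dst[o++] = (char)(0x80 | (cp & 0x3F));
        }
    }
    if (o + 1 > cap) return -1;  /* only reachable when n == 0 */
    dst[o] = '\0';
    return (long)o;
}
```

With the converted bytes in a buffer buf, the call from the original 
question would become SET_STRING_ELT(innerList, 0, mkCharCE(buf, CE_UTF8)), 
so that R marks the string as UTF-8 rather than assuming the native 
encoding.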

Duncan Murdoch