[R] Text Encoding

David Winsemius dwinsemius at comcast.net
Sat Apr 6 16:37:09 CEST 2013


On Apr 5, 2013, at 11:30 AM, Emily Ottensmeyer wrote:

> Dear R-Help,
> 
> I am using R 2.14 with the RDF package to download data
> from a website, and then use R to manipulate it.
> 
> Text on the website is UTF-8. The RDF package's rdf_load command is
> converting it into a different encoding, which converts non-ASCII
> characters to Unicode escape codes.
> 
> On the webpage / in the SPARQL RDF: "4.5µg of cDNA was used"
> 
> In R, the RDF triple gives: "4.5\\u00B5g of cDNA was used"
> 
> I can't seem to convert it back from \\u00B5 into "µ".
> 
> I've tried iconv with various settings without success:
>> iconv(test, "latin1", "UTF-8")
> [1] "4.5\\u00B5g of cDNA was used"
> 
> I also tried Encoding() to see if I could figure that out, but it returns
> "unknown" for my string.
>> Encoding(test)
> [1] "unknown"
> 
On my device, entering this: "4.5\\u00B5g of cDNA was used"

... returns [1] "4.5\\u00B5g of cDNA was used"

But entering: "4.5\u00B5g of cDNA was used" returns:

[1] "4.5µg of cDNA was used"

> nchar("4.5\\u00B5g of cDNA was used")
[1] 27
> nchar("4.5\u00B5g of cDNA was used")
[1] 22

So in the first case the doubled backslash "\\" is really a single literal backslash character, and it does not escape the four hex digits that follow. In the second case "\u00B5" is a correct escape for the micro sign (at least with my setup and my fonts).
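A short sketch, for illustration only, of what the two literals contain character by character. It also suggests why neither iconv() nor Encoding() helped in the original question: the escaped string is plain ASCII, so converting it from latin1 to UTF-8 changes nothing, and R never marks ASCII strings with a declared encoding, so Encoding() reports "unknown". The variable names are made up for the example.

```r
escaped <- "4.5\\u00B5g of cDNA was used"  # what rdf_load returned (one literal backslash)
micro   <- "4.5\u00B5g of cDNA was used"   # the intended text

# Characters 4 to 9 of the escaped string: six characters where the micro sign should be
strsplit(substr(escaped, 4, 9), "")[[1]]  # returns c("\\", "u", "0", "0", "B", "5")

# The correct string has one character there, code point U+00B5 (decimal 181)
utf8ToInt(substr(micro, 4, 4))            # returns 181

# The escaped string is pure ASCII: iconv() leaves it unchanged and
# Encoding() has no declared encoding to report
identical(iconv(escaped, "latin1", "UTF-8"), escaped)  # TRUE
Encoding(escaped)                                      # "unknown"
```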

If this is a systematic problem, then you should contact the maintainer with a full problem description and a link to the website. If this is just a one-off problem, just remove the extraneous backslash.
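If the escapes turn up systematically and cannot be fixed at the source, one possible workaround is to decode them after loading. The function below, unescape_u(), is a hypothetical helper written with base R only (gregexpr(), regmatches(), strtoi(), intToUtf8()). It assumes the strings contain literal \uXXXX sequences, meaning a backslash, "u" and exactly four hex digits, and it does not handle the eight-digit \U form or surrogate pairs.

```r
# Hypothetical helper (not part of the RDF package): replace literal "\uXXXX"
# sequences (backslash, "u", four hex digits) with the characters they encode.
unescape_u <- function(x) {
  # Regex for a literal backslash followed by "u" and four hex digits
  pattern <- "\\\\u[0-9A-Fa-f]{4}"
  m <- gregexpr(pattern, x)
  regmatches(x, m) <- lapply(regmatches(x, m), function(codes) {
    # Drop the leading "\u", parse the hex digits, build the character
    vapply(codes,
           function(code) intToUtf8(strtoi(substring(code, 3L), 16L)),
           character(1), USE.NAMES = FALSE)
  })
  x
}

test <- "4.5\\u00B5g of cDNA was used"
fixed <- unescape_u(test)
fixed
nchar(fixed)  # 22, the same as the correctly entered string
```

The helper is vectorised over x, and elements without any escapes are returned unchanged. In newer setups, the stringi package's stri_unescape_unicode() does a similar job.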

-- 
David.

> sessionInfo()
R version 3.0.0 RC (2013-03-31 r62463)
Platform: x86_64-apple-darwin10.8.0 (64-bit)

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
<snipped>

> Does anyone have any ideas on how to correct/convert the text encoding?
> 
> 
> Thanks!
> -Emily

David Winsemius
Alameda, CA, USA



More information about the R-help mailing list