[R] Character change to Unicode format escape character when create a data frame

David Winsemius dwinsemius at comcast.net
Sat Mar 23 19:47:00 CET 2013


On Mar 23, 2013, at 7:08 AM, Huidong Tian wrote:

> Hi, 
>  I want to create a data frame with a column containing some special characters, like "ø". When I print the data frame, the content changes to <U+00F8>, and when I save the data frame to a txt file, the content stays in that form. I need it in its original form. Can anybody explain?
> 
> 
>> x <- data.frame(part = c("målløs", "ny"))
>> x
>                  part
> 1 m<U+00E5>ll<U+00F8>s
> 2                   ny
> 
>> x[1,1]
> [1] målløs
> Levels: m<U+00E5>ll<U+00F8>s ny
> 

You have two problems. The trivial one is that, by default, data.frame() stores character input as a factor. The more fundamental one is that you do not understand that the display of characters is not the same as their internal representation. You have already achieved what you want and do not realize it. The number of characters in x[1,1] will be 6. Try it:

> x <- data.frame(part = c("målløs", "ny"), stringsAsFactors=FALSE)
> x
    part
1 målløs
2     ny
> nchar(x[1,1])
[1] 6
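
Another way to see the difference between storage and display is to compare the character count with the byte count. This is only a sketch, not output from the session above, and I build the string from \u escapes so that the example does not depend on how this message or your script file is encoded:

```r
# Build the string from Unicode escapes so the source encoding does not matter.
s <- "m\u00e5ll\u00f8s"

nchar(s, type = "chars")   # number of characters: 6
nchar(s, type = "bytes")   # number of bytes: 8, since å and ø take two bytes each in UTF-8
Encoding(s)                # the declared encoding of the string
utf8ToInt(s)               # the Unicode code points, 229 for å and 248 for ø
```

Whatever the console chooses to print, these values describe what is actually stored.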

Also try:

cat(x$part, sep = "\n")   # cat() needs the character column, not the whole data frame


It's just that <U+00E5> is one way of displaying a character that is not in the font table for the device you are working on. Internally it is a single character in UTF-8 encoding. My Mac does display the 'å', and my sans font is Helvetica, but you may be on a machine with a different sans font.

> quartzFonts('sans')
$sans
[1] "Helvetica"             "Helvetica-Bold"        "Helvetica-Oblique"    
[4] "Helvetica-BoldOblique"


If you are on a different interactive device, you should look at its help page to see how to change its settings. For me that is

?quartz   # but for you it might be ?windows


If you want to see the characters as they were entered, you can try cat(), or you can print to a device that has a font with the proper glyph. The system settings can be adjusted with the various functions that specify the fonts in use with various devices:

?Devices
?options
?Encoding
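
For the text file, one possible approach is to write the strings as UTF-8 bytes and to declare that encoding when reading them back. This is a minimal sketch under assumptions I cannot check from here: that your strings are valid UTF-8, and that whatever program reads the file expects UTF-8. The file name is a temporary one chosen only for illustration:

```r
x <- data.frame(part = c("m\u00e5ll\u00f8s", "ny"), stringsAsFactors = FALSE)

# Hypothetical output file; tempfile() is used so nothing of yours is overwritten.
out <- tempfile(fileext = ".txt")

# Write the bytes exactly as stored (UTF-8) instead of re-encoding to the native locale.
writeLines(enc2utf8(x$part), out, useBytes = TRUE)

# Read the file back, declaring that its contents are UTF-8.
back <- readLines(out, encoding = "UTF-8")
identical(back, x$part)
```

If the last line is TRUE, the file holds the original characters, whatever your console shows when it prints them.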

> 
> 	[[alternative HTML version deleted]]

You should learn to post in plain text. (Gmail does support that choice, but you need to make the effort.) This is a question that can be machine dependent, and for any follow-up questions you need to include the output of sessionInfo() as requested in the Posting Guide.

-- 

David Winsemius
Alameda, CA, USA


