[R] How to substitute special characters within a data frame?

Prof Brian Ripley ripley at stats.ox.ac.uk
Fri Aug 15 12:43:36 CEST 2008


You've not told us the 'at a minimum' information requested in the posting 
guide.  What OS?  What locale? And how did you 'import'?

But here's a guess.  If you change \\345 to \345, it should render 
correctly in a Latin-1 locale:

> "H\345rkan"
[1] "Hårkan"
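
If the escapes are already stored as literal text in the data, the change from \\345 to \345 can be done programmatically. The following is only a sketch: unescape_octal is a made-up helper name, and it assumes each escape is a backslash followed by three octal digits denoting a Latin-1 code point (Latin-1 code points coincide with the first 256 Unicode code points, hence intToUtf8).

```r
# Hypothetical helper: turn literal "\345"-style text into real characters.
# Assumes the three octal digits encode a Latin-1 code point (0-255).
unescape_octal <- function(x) {
  m <- gregexpr("\\\\[0-3][0-7]{2}", x)          # backslash + 3 octal digits
  regmatches(x, m) <- lapply(regmatches(x, m), function(esc) {
    codes <- strtoi(substring(esc, 2), base = 8L) # drop backslash, parse octal
    intToUtf8(codes, multiple = TRUE)             # one UTF-8 string per code
  })
  x
}

res <- unescape_octal(c("H\\345rkan", "plain"))
charToRaw(res[1])   # bytes of the converted string, as UTF-8
```

The result is marked as UTF-8; whether it prints as the intended letter still depends on the locale.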

If this is a UTF-8 locale, convert it:

> iconv("H\345rkan", "latin1")
[1] "Hårkan"
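
To check what the conversion did without relying on how the console renders the result, one can name the target encoding explicitly and inspect the bytes. A small sketch (the variable names are illustrative only):

```r
x <- "H\345rkan"                       # contains the single byte 0xE5 (Latin-1)
y <- iconv(x, from = "latin1", to = "UTF-8")

Encoding(y)                            # declared encoding of the result
charToRaw(x)                           # original bytes
charToRaw(y)                           # bytes after conversion
```

In UTF-8 the letter occupies two bytes (c3 a5) rather than the single Latin-1 byte e5.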

and if you have an unsuitable locale, e.g. a Chinese one, transliterate to ASCII:

> iconv("H\345rkan", "latin1", "ASCII//TRANSLIT")
[1] "Harkan"

or replace the literal escape directly:

> gsub("\\\\345", "aa", "H\\345rkan")
[1] "Haarkan"
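
Since the question concerns a whole data frame, the same substitution can be applied to every character column. This is a rough sketch on invented data: the data frame df, the lookup table map and the helper fix_escapes are all hypothetical, and the replacement for \\366 is an arbitrary choice. It uses fixed = TRUE so the backslash needs no regex escaping.

```r
# Made-up stand-in for the imported data frame
df <- data.frame(place = c("H\\345rkan", "\\366stersund"),
                 n = 1:2,
                 stringsAsFactors = FALSE)

# Hypothetical lookup: literal escape text -> ASCII replacement
map <- c("\\345" = "aa", "\\366" = "oe")

fix_escapes <- function(x, map) {
  for (pat in names(map)) {
    x <- gsub(pat, map[[pat]], x, fixed = TRUE)  # literal match, not a regex
  }
  x
}

# Apply to character columns only; other columns are left untouched
is_chr <- vapply(df, is.character, logical(1))
df[is_chr] <- lapply(df[is_chr], fix_escapes, map = map)
df$place
```

If the columns were imported as factors, the same helper would have to be applied to levels() of each factor instead.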


On Fri, 15 Aug 2008, Yingfu Xie wrote:

> Hello all,
>
> I have a data frame in R, imported from an excel file in Swedish. The 
> original file contains several columns that have special characters, 
> such as \?{a}, \?{o}, and so on. After import such special characters 
> are represented in the data frame by "\\345", "\\366" etc (don't ask me 
> why). For example, a word "H?rkan" becomes ''H\\345rkan".

That's odd: the quotes do not match.

We do need to ask you 'why', as we have nothing reproducible here.

> Now my question is if it is possible to substitute those "H\\345rkan" by 
> "Haarkan" or simply "Harkan" in R, ideally by finding those "\\345" and 
> then replacing.
>
> Thanks in advance,
> Yingfu
>
> 	[[alternative HTML version deleted]]

Please don't (as the posting guide asked).  Properly encoded plain text 
has a chance of working.


-- 
Brian D. Ripley,                  ripley at stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595

