[R] Set encoding when load()-ing workspaces?

Gustaf Rydevik gustaf.rydevik at gmail.com
Sun May 2 22:21:09 CEST 2010


Many thanks, Prof. Ripley and Duncan!

iconv() worked like a charm with CP1252 as the Windows
encoding, and now all the text shows up correctly.
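
For the archives, a minimal sketch of that conversion, together with the
charToRaw() check Duncan suggested. The variable names and the sample bytes
are made up for illustration; they are not from my actual workspace.

```r
# "åäö" as CP1252 bytes, built from raw values so the example does not
# depend on the encoding of this script or of the current locale.
garbled <- rawToChar(as.raw(c(0xe5, 0xe4, 0xf6)))
charToRaw(garbled)                 # e5 e4 f6: one byte per letter

# Re-encode from the Windows encoding to UTF-8 (what OS X normally uses).
fixed <- iconv(garbled, from = "CP1252", to = "UTF-8")
charToRaw(fixed)                   # c3 a5 c3 a4 c3 b6: two bytes per letter
```

Single bytes in the range e0-ff where you expect accented letters are a
strong hint that the string is in Latin-1 or CP1252 rather than UTF-8.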

Because the data frame also contained factors with levels that had
Swedish characters, I ended up writing a small function that converts
the encoding of everything inside a data frame in one go. It is a bit
slow, but hopefully someone else will find it useful in the future:

iconv.data.frame <- function(df, ...) {
     # Convert the column names and row names
     names(df) <- iconv(names(df), ...)
     rownames(df) <- iconv(rownames(df), ...)
     # Convert each column; assigning into df[] keeps the names, row names
     # and column types instead of rebuilding the data frame
     df[] <- lapply(df, function(x) {
             if (is.factor(x)) {
                     # Convert only the levels, so their order is preserved
                     levels(x) <- iconv(levels(x), ...)
             } else if (is.character(x)) {
                     x <- iconv(x, ...)
             }
             x
     })
     df
}
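
One way to check that a conversion like this caught everything is to test
whether all the text in the data frame is valid UTF-8 afterwards. This is
only a sketch: all_text_valid_utf8() is a hypothetical helper name, the
sample data is invented, and validUTF8() requires R 3.3.0 or later.

```r
# Hypothetical helper (not part of base R): TRUE if every column name,
# row name, factor level and character value in `df` is valid UTF-8.
all_text_valid_utf8 <- function(df) {
  text <- c(names(df), rownames(df))
  for (col in df) {
    if (is.factor(col)) {
      text <- c(text, levels(col))
    } else if (is.character(col)) {
      text <- c(text, col)
    }
  }
  all(validUTF8(text))
}

# "Malmö" and "Västerås" as CP1252 bytes, built from raw values so the
# example does not depend on the encoding of this script.
malmo    <- rawToChar(as.raw(c(0x4d, 0x61, 0x6c, 0x6d, 0xf6)))
vasteras <- rawToChar(as.raw(c(0x56, 0xe4, 0x73, 0x74, 0x65, 0x72, 0xe5, 0x73)))
df <- data.frame(id = 1:2, city = c(malmo, vasteras),
                 stringsAsFactors = FALSE)

valid_before <- all_text_valid_utf8(df)   # FALSE: the bytes are CP1252
df$city <- iconv(df$city, from = "CP1252", to = "UTF-8")
valid_after <- all_text_valid_utf8(df)    # TRUE: everything is now UTF-8
```

A FALSE result after converting would point to a column, level or name that
was skipped or that was not in the assumed source encoding.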


Best regards,
Gustaf


On Sun, May 2, 2010 at 8:36 PM, Prof Brian Ripley <ripley at stats.ox.ac.uk> wrote:
> On Sun, 2 May 2010, Duncan Murdoch wrote:
>
>> Gustaf Rydevik wrote:
>>>
>>> Hi all,
>>>
>>> I hope that there is someone that can help me out here.
>>> I am trying to load() a workspace on os x (R 2.11.0) that was saved in
>>> windows XP (R 2.9). In that workspace, there's a data.frame with names
>>> that contain swedish characters. These characters become garbled,
>>> which is a major problem.
>>> From the R Windows FAQ, I read:
>>>
>>> "Note though that character data in a workspace will be in a
>>> particular encoding that is not recorded in the workspace, so
>>> workspaces containing non-ASCII character data may not be
>>> interchangeable even on the same OS. Since R marks character data when
>>> it knows it to be in UTF-8 or Latin-1 (including its Windows superset,
>>> CP1252), strings in those encodings are likely to be transferred
>>> correctly: fortunately this covers most of the common cases (Mac OS X
>>> normally uses UTF-8, and Linux users are likely to use UTF-8 or
>>> perhaps Latin-1 (which used to be used for English)). "
>>>
>>> Apparently, my case is not the most common one, and I don't know why.
>>> I've been trying to dig into the load() function, but since it uses a
>>> lot of .Internal functions, I get stuck there.
>>> I've also tried doing options(encoding="latin1"), which doesn't seem
>>> to change anything.
>>>
>>
>> You can't change the encoding when you load, but you can convert the
>> encoding later (using iconv()) if you know what encoding it is.  A good
>> guess for a file created on Windows in my locale is "latin1", but it's not
>> certain, and I don't know what is commonly used on Windows in a Swedish
>> locale.
>
> CP1252 (which is actually what you will get too).
>
>>
>> If you have an example where you know the correct version of the string
>> and you can show us what you're getting, together with charToRaw() applied
>> to it, someone will probably be able to make a guess at the encoding.
>>
>> Duncan Murdoch
>>
>>
>>> And now I'm stuck. Any suggestions on where to look?
>>> I've run into this issue twice before. The first time I managed to get
>>> it solved, but can't remember how (perhaps a .Rprofile setting
>>> somewhere?).
>>> The second time, I mailed R-Sig-Mac, got some tips that unfortunately
>>> did not lead anywhere, and subsequently gave up. I hope third time's a
>>> charm!
>>>
>>> Many thanks in advance,
>>> Gustaf
>>>
>>>
>>>
>>
>> ______________________________________________
>> R-help at r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide
>> http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>>
>
> --
> Brian D. Ripley,                  ripley at stats.ox.ac.uk
> Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
> University of Oxford,             Tel:  +44 1865 272861 (self)
> 1 South Parks Road,                     +44 1865 272866 (PA)
> Oxford OX1 3TG, UK                Fax:  +44 1865 272595
>



--
Gustaf Rydevik, M.Sci.
tel: +46(0)703 051 451
address:Essingetorget 40,112 66 Stockholm, SE
skype:gustaf_rydevik





