[R] Input encoding problem when using sweave with xetex

Erich Studerus erich.studerus at bli.uzh.ch
Wed May 12 17:36:12 CEST 2010


Putting \usepackage[cp1252]{inputenc} into my preamble is not an option,
because XeTeX, unlike LaTeX, needs UTF-8 as its input encoding. My goal is also
to have a LyX document that can be compiled on both Mac and Windows.

I usually compile my LyX-Sweave documents with one click of a button from
within LyX. R code chunks are therefore executed by calling R from the
command line. If anybody knows how to run R with options(encoding="UTF-8")
from the command line under Windows, that would be helpful.
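
One possible way to do this, offered as an untested sketch rather than a
confirmed recipe: put the option into a small wrapper script and let the
command line call that script through Rscript. The file name sweave_utf8.R
and the function name sweave_utf8() are hypothetical, and whether this is
enough for the XeTeX setup described here is an assumption based on the
advice quoted below.

```r
# sweave_utf8.R: hypothetical wrapper script (file and function names are
# assumptions, not part of R or LyX).
# Possible invocation from a command prompt:
#   Rscript sweave_utf8.R mydocument.Rnw

sweave_utf8 <- function(rnw_file) {
  # Set the default connection encoding before Sweave opens any files,
  # and restore the previous setting when the function exits.
  old <- options(encoding = "UTF-8")
  on.exit(options(old), add = TRUE)
  utils::Sweave(rnw_file)
}

# Run only when a file name was passed on the command line.
args <- commandArgs(trailingOnly = TRUE)
if (length(args) >= 1L) {
  sweave_utf8(args[1L])
}
```

The option is set inside the wrapper, not in a code chunk of the document,
because it has to be in effect before Sweave starts reading and writing.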

The command that calls R during compilation is contained in this file:
http://cran.r-project.org/contrib/extra/lyx/preferences

Regards,
Erich


-----Original Message-----
From: Duncan Murdoch [mailto:murdoch.duncan at gmail.com]
Sent: Wednesday, 12 May 2010 16:56
To: Erich Studerus
Cc: r-help at r-project.org
Subject: Re: [R] Input encoding problem when using sweave with xetex

On 12/05/2010 9:48 AM, Erich Studerus wrote:
> Thanks. Since the encoding of x is unknown (Encoding(x) gives "unknown"), I
> tried
>
> iconv(x, "", "UTF-8") 
>
> Unfortunately, accented letters are still not printed in the final PDF
> output.
>   

I think I gave you incomplete advice.

The line above will convert the native encoding to UTF-8.  That's 
probably fine, but it's not actually helpful.
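
To make the difference between declaring and converting an encoding
concrete, here is a small sketch. The byte string is made up for
illustration, and "latin1" is used as a stand-in for the Windows native
encoding (the umlaut has the same byte value in CP1252).

```r
# "für" as single-byte (latin1 / CP1252) bytes; illustrative data only.
x <- rawToChar(as.raw(c(0x66, 0xfc, 0x72)))
print(Encoding(x))            # "unknown": no encoding is declared

# Encoding<- only changes the declared mark; the bytes stay the same.
marked <- x
Encoding(marked) <- "UTF-8"
print(identical(charToRaw(marked), charToRaw(x)))   # TRUE

# iconv() actually re-encodes the bytes.
converted <- iconv(x, from = "latin1", to = "UTF-8")
print(charToRaw(converted))   # 66 c3 bc 72
print(Encoding(converted))    # "UTF-8"
```

The conversion itself is correct; the remaining question is what happens
to the string when it is written out.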

The problem is that when R outputs a vector, it will convert it back to 
the native encoding, unless you take action to stop that.  If you don't 
mind changing your document for Windows, you can put

\usepackage[cp1252]{inputenc}

into the preamble, and use the Windows native CP1252 encoding 
throughout.  If you want something that will work in UTF-8 on Windows, 
you need to say

options(encoding="UTF-8")

*before* running Sweave.  (If you're running Sweave from the command 
line using "R CMD Sweave" then I don't know if you can specify the 
output encoding; it won't help to do it in the document code chunks).  
You also need to put the line

\usepackage[utf8]{inputenc}

into the document preamble, but it sounds as though LyX has already done 
that for you.
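
As a sketch of why the option matters: file() takes its default encoding
argument from getOption("encoding"), so a connection opened after the
option is set re-encodes what is written to it. The example below writes
to a temporary file instead of running Sweave; the string and file are
illustrative only, and the exact bytes written depend on the locale of
the R session.

```r
# Set the default encoding for connections opened from here on.
old <- options(encoding = "UTF-8")

# Illustrative string containing an umlaut, converted to UTF-8.
s <- iconv(rawToChar(as.raw(c(0x66, 0xfc, 0x72))), from = "latin1", to = "UTF-8")

out <- tempfile(fileext = ".tex")
con <- file(out, open = "w")   # default encoding comes from the option
writeLines(s, con)
close(con)

options(old)                   # restore the previous setting

# Inspect the bytes that were actually written to disk.
bytes <- readBin(out, what = "raw", n = file.info(out)$size)
print(bytes)
```

Setting the option inside a code chunk comes too late, because Sweave has
already opened its output file by the time the chunk runs.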

Duncan Murdoch


> Regards,
> Erich
>
>
>
> -----Original Message-----
> From: Duncan Murdoch [mailto:murdoch.duncan at gmail.com]
> Sent: Wednesday, 12 May 2010 15:27
> To: Erich Studerus
> Cc: r-help at r-project.org
> Subject: Re: [R] Input encoding problem when using sweave with xetex
>
> On 12/05/2010 8:37 AM, Erich Studerus wrote:
>   
>> Hello
>>
>>  
>>
>> Because I want to use different TrueType fonts with LaTeX, I'm using the
>> XeTeX typesetting engine for my Sweave documents. I'm using LyX with Sweave
>> on a Windows 7 PC and have set up LyX to work with XeTeX according to the
>> following instructions:
>>
>> http://wiki.lyx.org/LyX/XeTeX
>>
>>  
>>
>> Because the input file for XeTeX is assumed to be in UTF-8 encoding, I set
>> the encoding under LyX - Tools - Language Settings - Language to "Unicode
>> (XeTeX) (utf8)". Accented letters that I write into the LyX document are
>> correctly typeset in the final PDF document. However, character strings with
>> accented letters that are read from Excel files or other sources from within
>> R during the LyX-Sweave document compilation are not. For instance, the
>> German umlauts in the following example are not correctly typeset when
>> "Unicode (XeTeX) (utf8)" is used as the input encoding.
>>
>>  
>>
>> <<echo=FALSE>>=
>> library(gdata)  # provides read.xls()
>> # Read the workbook and pick the cell in row 2, column 2
>> x <- read.xls("http://www.schwerhoerigkeit.pop.ch/hoergeraete_test.xls",
>>               stringsAsFactors = FALSE)[2, 2]
>> x
>> @
>>
>>  
>>
>> I do not have this problem on a Mac. I guess this is because R
>> under Windows does not use UTF-8 encoding. I tried to change the encoding
>> within R by doing the following:
>>
>>  
>>
>> <<echo=FALSE>>=
>> Encoding(x) <- "UTF-8"
>> x
>> @
>>
>>  
>>
>> Unfortunately, this does not work. Does anybody have a solution for this
>> problem?
>>   
>>     
>
> You need to use iconv() to change an encoding.  What you did just 
> changes the declared encoding, but doesn't actually change any bits.  So 
> you'd probably get what you want with
>
> x <- iconv(x, "", "UTF-8")
> x
>
> (though you may need to declare the input encoding; it is likely CP1252 
> on Windows).
>   
> Duncan Murdoch
>>  
>>
>> Regards,
>>
>> Erich
>>
>>  
>>
>>
>> ______________________________________________
>> R-help at r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>>   
>>     


