[R] file reading /problems with encoding

Uwe Ligges ligges at statistik.tu-dortmund.de
Mon Mar 1 16:32:14 CET 2010



On 01.03.2010 15:45, T.Wunder at stud.uni-heidelberg.de wrote:
> Hello,
>
> I'm a little worried about a problem that occurred recently when I
> tried to read in an XML file (to replace some variables in the string
> with values from a data frame). The biggest problem is the
> encoding of the XML file. Since it is generated by Word 2007, its
> encoding is UTF-8 (as can be seen in the XML header).
> Now I'm establishing a file connection with
>> channel <- file(filename,open="r+", encoding="UTF-8")
>> ## filename = name of the file
>
> To read the whole file, I'm using the readLines() function as follows
>> t <- readLines(channel, n=-1, warn=FALSE, encoding="UTF-8")
>
> Finally, I'm merging the lines of this character vector with the following
>> xml <- ""
>> for(i in seq_along(t)) {
>>   xml <- paste(xml, t[i], sep="")
>> }

You can do the former without a loop:

xml <- paste(t, collapse="")
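As a minimal sketch of why the two forms are interchangeable (the vector `lines` below is made-up sample data standing in for the result of readLines(), not your actual file):

```r
# Hypothetical sample lines standing in for the result of readLines()
lines <- c("<a>", "<b>text</b>", "</a>")

# Loop version, as in the original post
xml_loop <- ""
for (i in seq_along(lines)) {
  xml_loop <- paste(xml_loop, lines[i], sep = "")
}

# Vectorised version: collapse joins all elements into a single string
xml_collapse <- paste(lines, collapse = "")

# Both approaches should give the same single string
identical(xml_loop, xml_collapse)
```

The collapse form also avoids growing the string one element at a time, which matters for files with many lines.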

For the other problem you are reporting: can you make (the relevant
part of) your file available (say, on some web site) so that we can test
what is going on?
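In the meantime, here is a rough sketch of one way to narrow the problem down. It is not a diagnosis of your file. It assumes the file really is UTF-8, writes a hypothetical two-line sample file so that it is self-contained, and uses validUTF8(), which only exists in R versions much newer than this thread (R >= 3.3.0):

```r
# Hypothetical sample file, written here only to keep the sketch self-contained
path <- tempfile(fileext = ".xml")
sample_lines <- c('<?xml version="1.0" encoding="UTF-8"?>',
                  "<w>\u00fc \u4e2d\u6587</w>")
writeLines(enc2utf8(sample_lines), path, useBytes = TRUE)

# Read the lines without re-encoding and mark them as UTF-8
lines <- readLines(path, warn = FALSE, encoding = "UTF-8")

# Indices of lines that are not valid UTF-8 (assumes R >= 3.3.0)
bad <- which(!validUTF8(lines))

# Collapse to one string and search for a special character
xml <- paste(lines, collapse = "")
has_umlaut <- grepl("\u00fc", xml, fixed = TRUE)
```

If `bad` is non-empty for the real file, those lines would be the ones to put online.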

Best,
Uwe Ligges



>
> (is there a better way of doing this?)
>
> However, when I execute those lines, I get a warning like:
> "incorrect input in the input-connection"
> When I look at the output variable xml, the cause is fairly clear: the
> string stops at a combination of Chinese or Japanese characters (which
> normally shouldn't be a problem for UTF-8 encoding).
>
> So that is the problem. How can I read in the whole XML file as a
> string in R? I need to have the correct encoding, because I want to grep
> for special characters like "ü".
>
> Thank you for your help!
>
> Kind regards, Tom
>
>
> P.S. I'd rather not use the XML package, since I don't want to parse
> the XML file :)