[R] SOLVED: importing huge XML-Files -- new problem: special characters

Alexander Heidrich alexander.heidrich at uni-jena.de
Tue Sep 4 18:17:14 CEST 2007


Hi all,

thanks to the people who replied to my question! I finally solved the
issue by writing my own handlers and using xmlEventParse - which leads
to the following problem, which is so odd that it's probably a bug.

I use several special characters in my XML file, e.g. umlauts or ° or
µ - but no matter how I encode my XML (UTF or ISO) or whether I escape
these characters, xmlEventParse always stops parsing after the first
umlaut and pretends to have more than one node even if there is really
just one!

Example:

<locations>abc	aböcd	abdec</locations>

causes two events for locations and produces output in the form of:

	[,1]	[,2]	[,3]
[1,]	abc
[2,]	aböcd	abdec


Should it be like that? If I remove the umlauts, then everything is
fine!

If I do the following:

<locations>öabc	aböcd	abdec</locations>

the output is

	[,1]	[,2]	[,3]
[1,]	öabc	aböcd	abdec
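
A guess on my side, not something I have confirmed for xmlEventParse: SAX-style parsers are generally allowed to hand the text of a single element to the text handler in several pieces, and a non-ASCII character might be where such a split happens. If that is what is going on, the handlers would have to collect the pieces and split on tabs only once the end tag arrives. A minimal sketch of that idea, assuming the XML package is installed; collect_locations and its variables are made-up names for illustration, and trim = FALSE is set so that a tab at the edge of a piece is not stripped:

```r
library(XML)  # assumes the XML package is installed

# Collect the text of every <locations> element, joining the pieces
# that the parser may deliver in separate calls to the text handler.
collect_locations <- function(xml_string) {
  buffer  <- character(0)  # pieces of the element currently being read
  inside  <- FALSE         # TRUE between <locations> and </locations>
  results <- list()        # one character vector of fields per element

  handlers <- list(
    startElement = function(name, attrs, ...) {
      if (name == "locations") {
        inside <<- TRUE
        buffer <<- character(0)
      }
    },
    text = function(content, ...) {
      # Only accumulate here; do not split on tabs yet.
      if (inside) buffer <<- c(buffer, content)
    },
    endElement = function(name, ...) {
      if (name == "locations") {
        # The element is complete: join the pieces, then split into fields.
        full <- paste(buffer, collapse = "")
        results[[length(results) + 1]] <<-
          strsplit(full, "\t", fixed = TRUE)[[1]]
        inside <<- FALSE
      }
    }
  )

  # trim = FALSE keeps leading/trailing whitespace of each piece intact.
  xmlEventParse(xml_string, handlers = handlers,
                asText = TRUE, trim = FALSE)
  results
}

doc    <- "<locations>abc\tab\u00f6cd\tabdec</locations>"
fields <- collect_locations(doc)[[1]]
print(fields)
```

With this approach it should not matter how many text events the parser fires for one element, since the row is built only in endElement.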

Any suggestions?

Thanks in advance and many greetings!

Alex


