[R] SAX Parser best practise

Jan Hummel Hummel at mpimp-golm.mpg.de
Mon Sep 26 10:13:53 CEST 2005


Hi Duncan,

thanks again for your comments.

> I dug around in the libxml code and the Web to verify that 
> validation is indeed only possible in libxml when one uses 
> DOM (i.e. xmlTreeParse()).
Using DOM is not an option for me, so I need to "validate" the XML parts
I'm interested in within my own object-creation code. That works, but it
is not the best solution in terms of design.

> BTW, there is a new version of the XML package on the 
> Omegahat web site.
I'll be using it extensively over the next few days, and unfortunately I
already have a question/problem pending:

Take the following R function:

library(XML)

test<-function(){
	sep=""
	xmlText <-""
	xmlText <-paste(xmlText,"<spectrum id=\"3257\">",sep=sep)
	xmlText <-paste(xmlText,"<mzArrayBinary>",sep=sep)
	xmlText <-paste(xmlText,"<data>Monday</data>",sep=sep)
	xmlText <-paste(xmlText,"</mzArrayBinary>",sep=sep)
	xmlText <-paste(xmlText,"<intenArrayBinary>",sep=sep)
	xmlText <-paste(xmlText,"<data>Tuesday</data>",sep=sep)
	xmlText <-paste(xmlText,"</intenArrayBinary>",sep=sep)
#	xmlText <-paste(xmlText,"</spectrum>",sep=sep)
#	xmlText <-paste(xmlText,"<spectrum id=\"3259\">",sep=sep)
	xmlText <-paste(xmlText,"<mzArrayBinary>",sep=sep)
	xmlText <-paste(xmlText,"<data>Wednesday</data>",sep=sep)
	xmlText <-paste(xmlText,"</mzArrayBinary>",sep=sep)
	xmlText <-paste(xmlText,"<intenArrayBinary>",sep=sep)
	xmlText <-paste(xmlText,"<data>Thursday</data>",sep=sep)
	xmlText <-paste(xmlText,"</intenArrayBinary>",sep=sep)
	xmlText <-paste(xmlText,"</spectrum>",sep=sep)

	xmlEventParse(xmlText, asText=TRUE,
		handlers = list(text = function(x, ...) {cat(nchar(x), x, "\n")}))
	return(invisible(NULL))
}

Using this function in the given form works fine: xmlEventParse(), with
the simplest handler I can imagine, finds all 4 text nodes within the
<spectrum> tag and prints them. But if one uncomments the two lines in
the middle, introducing 2 <spectrum> tags with different ids,
xmlEventParse() fails with an exception. (The weekdays within <data> are
of course just arbitrary values.) Furthermore, with another input file I
saw that for one and the same <data> node the handler for text nodes was
invoked twice: once for the first part of the content and once for the
rest. Both invocations together gave me exactly the content of the
<data> node.

So, am I on the wrong track? Or is this a bug?

I appreciate any help and assistance!

Jan



