[R] SAX Parser best practise

Duncan Temple Lang duncan at wald.ucdavis.edu
Mon Sep 26 16:13:28 CEST 2005



When you uncomment the two lines, your document
becomes two top-level nodes:
 <spectrum>
    ...
 </spectrum>
 <spectrum>
    ...
 </spectrum>

XML requires that there be a single top-level node.
And so the parser throws an error saying
  Extra content at the end of the document

And it is the second <spectrum> .. </spectrum>
node that it is complaining about.
You can wrap the entire thing in a top node, e.g.
<spectra> <spectrum>...</spectrum><spectrum>...</spectrum></spectra>
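
As a minimal sketch of that wrapping (the shortened document text and
the variable names spectra, wrapped and collected are just for
illustration, and it assumes the XML package is installed), something
along these lines should parse without the error:

```r
library(XML)  # assumes the XML package is installed

# Two sibling <spectrum> elements, as in the failing case (shortened).
spectra <- paste0(
  "<spectrum id=\"3257\"><mzArrayBinary><data>Monday</data></mzArrayBinary></spectrum>",
  "<spectrum id=\"3259\"><mzArrayBinary><data>Wednesday</data></mzArrayBinary></spectrum>"
)

# Wrap both in a single top-level element so the text is one well-formed document.
wrapped <- paste0("<spectra>", spectra, "</spectra>")

# Collect every chunk of text the SAX "text" handler receives.
collected <- character()
invisible(xmlEventParse(wrapped, asText = TRUE,
                        handlers = list(text = function(x, ...) {
                          collected <<- c(collected, x)
                        })))
```

The name of the wrapper element is arbitrary; all that matters is that
the document ends up with exactly one top-level node.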

How did I find this?  I looked at the error message from
libxml. Now that we have exceptions in R and we are using
libxml2, I can make these parser error messages available
at the R level. So I'll do that.


Jan Hummel wrote:
> Hi Duncan,
> 
> 
>>BTW, there is a new version of the XML package on the 
>>Omegahat web site.
> 
> I'll be using it extensively these days, and unfortunately I already
> have a question/problem pending:
> 
> Taking the following R function:
> 
> test<-function(){
> 	sep=""
> 	xmlText <-""
> 	xmlText <-paste(xmlText,"<spectrum id=\"3257\">",sep=sep)
> 	xmlText <-paste(xmlText,"<mzArrayBinary>",sep=sep)
> 	xmlText <-paste(xmlText,"<data>Monday</data>",sep=sep)
> 	xmlText <-paste(xmlText,"</mzArrayBinary>",sep=sep)
> 	xmlText <-paste(xmlText,"<intenArrayBinary>",sep=sep)
> 	xmlText <-paste(xmlText,"<data>Tuesday</data>",sep=sep)
> 	xmlText <-paste(xmlText,"</intenArrayBinary>",sep=sep)
> #	xmlText <-paste(xmlText,"</spectrum>",sep=sep)
> #	xmlText <-paste(xmlText,"<spectrum id=\"3259\">",sep=sep)
> 	xmlText <-paste(xmlText,"<mzArrayBinary>",sep=sep)
> 	xmlText <-paste(xmlText,"<data>Wednesday</data>",sep=sep)
> 	xmlText <-paste(xmlText,"</mzArrayBinary>",sep=sep)
> 	xmlText <-paste(xmlText,"<intenArrayBinary>",sep=sep)
> 	xmlText <-paste(xmlText,"<data>Thursday</data>",sep=sep)
> 	xmlText <-paste(xmlText,"</intenArrayBinary>",sep=sep)
> 	xmlText <-paste(xmlText,"</spectrum>",sep=sep)
> 
> 	xmlEventParse(xmlText, asText=TRUE, handlers = list(text =
> function(x, ...) {cat(nchar(x),x, "\n")}))
> 	return(invisible(NULL))
> }
> 
> Using this function in the given form works fine. xmlEventParse() with
> the simplest handler I can imagine finds all 4 text nodes within the
> <spectrum> tag and prints them out. But if one uncomments both lines in
> the middle, introducing 2 <spectrum> tags with different ids,
> xmlEventParse() returns with an exception. Of course, the weekdays within
> <data> are arbitrary values used here. Further, using another input
> file, I could see that for one and the same <data> node the handler for
> "text" nodes was invoked twice: once for the first part of the
> content and once for the rest. Both invocations
> together gave me exactly the content of the <data> node.
> 
> So, am I on the wrong track? Or is this buggy behaviour?
> 
> I appreciate any help and assistance!
> 
> Jan
> 

--
Duncan Temple Lang                duncan at wald.ucdavis.edu
Department of Statistics          work:  (530) 752-4782
371 Kerr Hall                     fax:   (530) 752-7099
One Shields Ave.
University of California at Davis
Davis, CA 95616, USA