[R] XML and str

Martin Maechler maechler at stat.math.ethz.ch
Sat Feb 10 16:54:23 CET 2007


>>>>> "DTL" == Duncan Temple Lang <duncan at wald.ucdavis.edu>
>>>>>     on Sat, 10 Feb 2007 07:18:30 -0800 writes:

    DTL> Martin Maechler wrote:
    >>>>>>> "Ashley" == Ashley Ford <ford at signal.QinetiQ.com>
    >>>>>>> on Wed, 07 Feb 2007 17:18:56 +0000 writes:
    >> 
    Ashley> If I read in an .xml file eg with 
    >> 
    >> >> xeg <- xmlTreeParse(system.file("exampleData", "test.xml",
    >> >>                                 package = "XML"))
    >> 
    Ashley> It appears to be OK however examining it with str() gives an apparent
    Ashley> error
    >> 
    >> >> str(xeg, 2)
    Ashley> List of 2
    Ashley> $ doc:List of 3
    Ashley> ..$ file    : list()
    Ashley> .. ..- attr(*, "class")= chr [1:2] "XMLComment" "XMLNode"
    Ashley> ..$ version :List of 4
    Ashley> .. ..- attr(*, "class")= chr "XMLNode"
    Ashley> ..$ children:Error in obj$children[[...]] : subscript out of bounds
    >> 
    Ashley> I am unsure whether this is a feature or a bug and, if the latter,
    Ashley> whether it is in XML or in str(). It is not causing a problem, but I
    Ashley> would like to understand what is happening. Any ideas?
    >> 
    >> Yes -  thank you for providing a well-reproducible example.
    >> After setting  
    >> options(error = recover)
    >> 
    >> I do
    >> 
    >> > obj <- xeg$doc
    >> > mode(obj)     # "list"
    >> [1] "list"
    >> > is.list(obj)  # TRUE
    >> [1] TRUE
    >> > length(obj)   # 3
    >> [1] 3
    >> > obj[[3]]      # ---> the error you see above.
    >> Error in obj$children[[...]] : subscript out of bounds
    >> 
    >> Enter a frame number, or 0 to exit   
    >> 
    >> 1: obj[[3]]
    >> 2: `[[.XMLDocumentContent`(obj, 3)
    >> 
    >> Selection: 0
    >> 
    >> > obj$children  # works, should be identical to obj[[3]]
    >> $comment
    >> <!--A comment-->
    >> 
    >> $foo
    >> <foo x="1">
    >> <element attrib1="my value"/>
    >> ......
    >> 
    >> This shows that the XML package implements the "[[" method
    >> wrongly IMHO and also inconsistently with the "$" method.
    >> 
    >> From a strict OOP view, the XML author could argue that
    >> this is not a bug in XML but rather str() which assumes that
    >> x[[length(x)]] works for objects of mode "list" even when they
    >> are not of *class* "list", but I hope he would still rather
    >> consider changing [[.XMLDocumentContent ...
    >> 


    DTL> More likely, the appropriate fix is to have
    DTL> length() return the relevant value.

Hmm. 

  > library(XML)
  > xeg <- xmlTreeParse(system.file("exampleData", "test.xml", package= "XML"))
  > obj <- xeg$doc
  > mode(obj)     # "list"
  [1] "list"
  > is.list(obj)  # TRUE
  [1] TRUE
  > length(obj)   # 3
  [1] 3
  > obj[[3]]      # ---> the error you see above.
  Error in obj$children[[...]] : subscript out of bounds
  > names(obj)
  [1] "file"     "version"  "children"
  > class(obj)
  [1] "XMLDocumentContent"
  > methods(class=class(obj))
  [1] xmlApply.XMLDocumentContent*  [[.XMLDocumentContent*       
  [3] xmlRoot.XMLDocumentContent*   xmlSApply.XMLDocumentContent*

  > XML:::`[[.XMLDocumentContent`
  function (obj, ...) 
  {
      obj$children[[...]]
  }
  <environment: namespace:XML>

so length(obj) is 3, and obj is a simple S3 object
which is just a list with 3 named components.
Do you really want to define length(.) to return the
length of obj$children instead of the length() of the list
itself?
With that, your XMLDocumentContent objects would ``look''
like lists with three named components on the one hand
(and help(xmlTreeParse) does mention these components),
but in other contexts behave as if they were just their own
component 'obj$children'.  Of course you would then also have to define
  print.XMLDocumentContent() and
  str.XMLDocumentContent()   accordingly,
so users would barely know about the "file" and "version"
components of 'obj'.
But is this really desirable?
With the above "[[.XMLDoc..." method you break the basic S-language
premise that "[[" and "$" behave consistently with each other.
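A self-contained sketch of the situation, without the XML package. The
class name "toyDoc" and its contents are hypothetical, chosen only to
mirror the file/version/children layout of XMLDocumentContent; the "[["
method is written the same way as XML's `[[.XMLDocumentContent` in the
session above. It shows "$" and "[[" disagreeing, positional access
failing exactly where str() needs it, str() on unclass(doc) as a
stopgap, and how a length() method alone would leave names() still
reporting three components:

```r
## Hypothetical S3 class "toyDoc" mimicking the layout of XMLDocumentContent:
## a plain list with components file, version and children.
doc <- structure(
  list(file     = "test.xml",
       version  = "1.0",
       children = list(comment = "A comment", foo = "<foo/>")),
  class = "toyDoc")

## A "[[" method written like XML's [[.XMLDocumentContent:
## every index is forwarded to the 'children' component.
`[[.toyDoc` <- function(x, ...) x$children[[...]]

length(doc)          # 3: length() still counts file, version, children
doc$children         # "$" is unaffected and returns the children list
doc[["children"]]    # NULL: "[[" looks for "children" inside x$children

## Positional access, as used by str(), fails on the last component:
err <- tryCatch(doc[[3]], error = function(e) conditionMessage(e))
err                  # "subscript out of bounds"

## Dropping the class sends "[[" (and hence str()) to the plain list:
str(unclass(doc), max.level = 1)

## The alternative: a length() method reporting length(x$children) ...
length.toyDoc <- function(x) length(x$children)
length(doc)          # now 2
length(names(doc))   # ... while names() still reports 3 components
```

Note that doc[["children"]] silently returns NULL instead of raising an
error, so the inconsistency does not always announce itself.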

You could solve "everything" elegantly if you used S4 instead of S3
classes, since there is no predefined correspondence between slot
access and "[[" (and yes, then, with S4, I'd agree that

setMethod("length", "XMLDocumentContent",
          function(x) length(x@children))

would be needed too, and that would be fine).
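A minimal S4 sketch of that suggestion, assuming a hypothetical class
"XMLDocS4" with slots file, version and children (the class name and
slot types are illustrative only, not the XML package's API). Because
slots are read with "@" rather than "$", both "[[" and length() can be
defined in terms of the children, and x[[length(x)]] then stays valid:

```r
library(methods)

## Hypothetical S4 counterpart of XMLDocumentContent; the class name
## and slot types are assumptions made for illustration only.
setClass("XMLDocS4",
         slots = c(file = "character",
                   version = "character",
                   children = "list"))

## Slots are accessed with "@", so these methods do not have to stay
## consistent with any "$" behaviour.
setMethod("length", "XMLDocS4", function(x) length(x@children))
setMethod("[[", "XMLDocS4", function(x, i, j, ...) x@children[[i]])

d <- new("XMLDocS4", file = "test.xml", version = "1.0",
         children = list(comment = "A comment", foo = "<foo/>"))

length(d)        # 2
d[["foo"]]       # "<foo/>"
d[[length(d)]]   # "<foo/>": indexing agrees with length()
d@file           # "test.xml": file and version remain reachable as slots
```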

Martin

    DTL> I even recall considering this at the time of writing
    DTL> the package initially.  But that was back in 1999/2000
    DTL> and S4 and R/S-Plus compatibility were not what they
    DTL> are now.  It could be changed.  Not certain when I will
    DTL> get a chance.


    Ashley> examining components eg 
    >> >> str(xeg$doc$children,2)
    >> 
    Ashley> List of 2
    Ashley> $ comment: list()
    Ashley> ..- attr(*, "class")= chr [1:2] "XMLComment" "XMLNode"
    Ashley> etc 
    >> 
    Ashley> is OK.
    >> 
    Ashley> XML Version 1.4-1, 
    Ashley> same behaviour on Windows and Linux, R version 2.4.1 (2006-12-18)
    >>



More information about the R-help mailing list