[R] question about XML (package)

Duncan Temple Lang duncan at research.bell-labs.com
Tue Mar 4 14:58:03 CET 2003


Apologies for the late reply; I was travelling and didn't see the message 
until Ott brought it to my attention today.

Indeed, Stephen's diagnosis and workaround are correct: excessive
trimming.  I have just put a new version of the package (XML_0.93-2),
which fixes this, on the Omegahat web site:

  http://www.omegahat.org/RSXML


So with the new version and the inputs

<?xml version="1.0"?>
<fields> 
<v1>a1  </v1>
<v1>1 </v1>
<v1>a b</v1>
<v1>a b c</v1>
<v1> a b c  </v1>
<v2> 2 </v2> 
<v3> 3</v3>
<v3> 3 </v3>
</fields>

we get

> v = xmlRoot(xmlTreeParse("oot.xml"))
> xmlSApply(v, xmlValue)
     v1      v1      v1      v1      v1      v2      v3      v3 
   "a1"     "1"   "a b" "a b c" "a b c"     "2"     "3"     "3" 

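For anyone who cannot upgrade right away, Stephen's workaround (parsing with trim=FALSE) can be combined with trimming the values by hand. The following is only a minimal sketch under some assumptions: the XML package is installed, the document is passed inline via asText=TRUE rather than read from a file, and strip_ends() is a hypothetical helper that is not part of the package.

```r
library(XML)  # assumes the XML package is installed

# Same content as the input file above, passed inline as text.
xml_text <- '<?xml version="1.0"?>
<fields>
<v1>a1  </v1>
<v1>1 </v1>
<v1>a b</v1>
<v1>a b c</v1>
<v1> a b c  </v1>
<v2> 2 </v2>
<v3> 3</v3>
<v3> 3 </v3>
</fields>'

# trim = FALSE keeps the text of each node exactly as written.
doc  <- xmlTreeParse(xml_text, asText = TRUE, trim = FALSE)
root <- xmlRoot(doc)

# Extract the raw values. Whitespace-only text nodes between the
# elements, if the parser kept any, are named "text" and are dropped.
raw <- xmlSApply(root, xmlValue)
raw <- raw[names(raw) != "text"]

# strip_ends() is a hypothetical helper: it removes leading and
# trailing whitespace only, leaving interior spaces untouched.
strip_ends <- function(x) gsub("^[[:space:]]+|[[:space:]]+$", "", x)

values <- strip_ends(raw)
print(values)
```

Doing the trimming after parsing keeps the parser's own trimming out of the picture entirely, so the result should not depend on whether a value starts with a digit.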

Thanks for bringing it to my attention.

 D.


Stephen C. Upton wrote:
> Ott,
> 
> I get the same thing on the Windows version. If you set "trim=FALSE" in the
> xmlTreeParse function call, it works. I suspect xmlTreeParse is trimming
> a little too much! But xmlTreeParse(with trim=TRUE) also works when the
> first character is a non-digit - see below. We'll probably need to look
> at the source code, unless someone else has better insight.
> 
> > a <- xmlTreeParse("test.xml",trim=FALSE)
> > a$doc
> $file
> [1] "test.xml"
> 
> $version
> [1] "1.0"
> 
> $children
> $children$fields
>  <fields>
> 
> 
>   <v1>
>   1
>   </v1>
> 
> 
>   <v2>
>    2
>   </v2>
> 
> 
>   <v3>
>    3
>   </v3>
> 
> 
>  </fields>
> 
> However, it also works when the first character is a non-digit - so far.
> Here's a revised test.xml file:
> <?xml version="1.0"?>
> <fields>
> <v1>a1 </v1>
> <v2>2 </v2>
> <v3> 3</v3>
> </fields>
> 
> > a <- xmlTreeParse("test.xml")
> > a
> $doc
> $file
> [1] "test.xml"
> 
> $version
> [1] "1.0"
> 
> $children
> $children$fields
>  <fields>
>   <v1>
>   a1
>   </v1>
>   <v2>
>   </v2>
>   <v3>
>   3
>   </v3>
>  </fields>
> 
> HTH
> steve
> 
> 
> -------------------------------
> > version
>          _
> platform i386-pc-mingw32
> arch     i386
> os       mingw32
> system   i386, mingw32
> status
> major    1
> minor    6.2
> year     2003
> month    01
> day      10
> language R  -
> 
> Ott Toomet wrote:
> 
> > Hi,
> >
> > I have a problem with spacing in XML files when reading them with
> > xmlTreeParse.  I don't know the exact specification of XML, but
> > according to what I have read before, it should work.
> >
> > consider a tiny test.xml file:
> >
> > <?xml version="1.0"?>
> > <fields>
> > <v1>1 </v1>
> > <v2> 2 </v2>
> > <v3> 3</v3>
> > </fields>
> >
> > i.e. I have three fields v1, v2 and v3 which differ only by spacing.
> > Now when reading it as
> >
> > > a <- xmlTreeParse("/home/otoomet/tyyq/Taani-piir/andmed/test.xml")
> > > a$doc$children$fields
> >  <fields>
> >   <v1>
> >   </v1>
> >   <v2>
> >   2
> >   </v2>
> >   <v3>
> >   3
> >   </v3>
> >  </fields>
> >
> > you can see that field v1 is empty.  Is it my misinterpretation, or a
> > problem with the library?
> >
> > Thanks in advance,
> >
> > Ott
> >
> > -----------------
> > > version
> >          _
> > platform i686-pc-linux-gnu
> > arch     i686
> > os       linux-gnu
> > system   i686, linux-gnu
> > status
> > major    1
> > minor    5.1
> > year     2002
> > month    06
> > day      17
> > language R
> > ------------
> > Package: XML
> > Version: 0.93-1
> > Date: 2002/11/06
> >
> > ______________________________________________
> > R-help at stat.math.ethz.ch mailing list
> > http://www.stat.math.ethz.ch/mailman/listinfo/r-help
> 

-- 
_______________________________________________________________

Duncan Temple Lang                duncan at research.bell-labs.com
Bell Labs, Lucent Technologies    office: (908)582-3217
700 Mountain Avenue, Room 2C-259  fax:    (908)582-3340
Murray Hill, NJ  07974-2070       
         http://cm.bell-labs.com/stat/duncan



