[R] Need help extracting info from XML file using XML package

Don MacQueen macq at llnl.gov
Mon Mar 2 04:42:18 CET 2009


I have an XML file that has within it the coordinates of some 
polygons that I would like to extract and use in R. The polygons are 
nested rather deeply. For example, I found by trial and error that I 
can extract the coordinates of one of them using functions from the 
XML package:

   library(XML)
   doc <- xmlInternalTreeParse('doc.kml')
   docroot <- xmlRoot(doc)
   # child indices below were found by trial and error for this one file
   pgon <- xmlValue(docroot[[52]][[3]][[7]][[3]][[3]][[1]][[1]])

but this is hardly general!

I'm hoping there is some relatively straightforward way to use 
functions from the XML package to recursively descend the structure 
and return the text strings representing the polygons into, say, a 
list with as many elements as there are polygons. I've been looking 
at several XML documentation files downloaded from 
http://www.omegahat.org/RSXML/ , but since my understanding of XML is 
weak at best, I'm having trouble.  I can deal with converting the 
text strings to an R object suitable for plotting etc.
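
For reference, the conversion step could look roughly like the sketch 
below. It assumes each string is a whitespace-separated run of 
"lon,lat,alt" triples, as in the sample further down; the function name 
coords_to_matrix is made up for illustration and is not part of any 
package.

```r
# Hypothetical helper: turn one <coordinates> string into a numeric matrix.
# Assumes every vertex has exactly three comma-separated values.
coords_to_matrix <- function(txt) {
  # Drop leading/trailing whitespace, then split into one token per vertex
  tokens <- strsplit(trimws(txt), "[[:space:]]+")[[1]]
  tokens <- tokens[nzchar(tokens)]
  # Split each "lon,lat,alt" token on commas and stack the rows
  m <- do.call(rbind, lapply(strsplit(tokens, ","), as.numeric))
  colnames(m) <- c("lon", "lat", "alt")
  m
}

# Example string modelled on the values in this post
txt <- paste("",
             "-23.679835352296,30.263840290388,5.000000000000001",
             "-23.68138782285701,30.264740875186,5.000000000000001",
             "-23.679835352296,30.263840290388,5.000000000000001 ",
             sep = "\n")
m <- coords_to_matrix(txt)
```

The first two columns of m can then be passed to polygon() or lines() 
for plotting.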


Here's a look at the structure of this file:

graphics[5]% grep Polygon doc.kml
         <Polygon id="15342">
         </Polygon>
         <Polygon id="1073">
         </Polygon>
         <Polygon id="16508">
         </Polygon>
         <Polygon id="18665">
         </Polygon>
         <Polygon id="32903">
         </Polygon>
         <Polygon id="5232">
         </Polygon>

Each of the <Polygon> </Polygon> pairs contains a <coordinates> 
element, as in this example:


	<Polygon id="15342">
		<outerBoundaryIs>
			<LinearRing id="11467">
				<coordinates>
-23.679835352296,30.263840290388,5.000000000000001
-23.68138782285701,30.264740875186,5.000000000000001
    [snip]
-23.679835352296,30.263840290388,5.000000000000001
-23.679835352296,30.263840290388,5.000000000000001 </coordinates>
			</LinearRing>
		</outerBoundaryIs>
	</Polygon>
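
Given that nesting, one possibility that might avoid the hard-coded 
indices is an XPath query through xpathApply() from the XML package. The 
sketch below is untested against the real file. It runs on a small 
made-up stand-in for doc.kml, assumes one <coordinates> element per 
<Polygon>, and matches on local-name() on the assumption that the file 
declares a default namespace (a plain "//Polygon" would not match in 
that case).

```r
library(XML)

# Made-up stand-in for doc.kml; the namespace URI and the second polygon's
# values are assumptions for illustration only.
kml_text <- paste(
  '<kml xmlns="http://www.opengis.net/kml/2.2"><Document>',
  '<Placemark><Polygon id="15342"><outerBoundaryIs><LinearRing id="11467">',
  '<coordinates>-23.67,30.26,5 -23.68,30.27,5 -23.67,30.26,5</coordinates>',
  '</LinearRing></outerBoundaryIs></Polygon></Placemark>',
  '<Placemark><Polygon id="1073"><outerBoundaryIs><LinearRing>',
  '<coordinates>-23.60,30.20,5 -23.61,30.21,5 -23.60,30.20,5</coordinates>',
  '</LinearRing></outerBoundaryIs></Polygon></Placemark>',
  '</Document></kml>', sep = "\n")

# For the real file this would be xmlInternalTreeParse('doc.kml')
doc <- xmlInternalTreeParse(kml_text, asText = TRUE)

# Match elements by local name so the query ignores the namespace prefix
poly_path  <- "//*[local-name()='Polygon']"
coord_path <- paste0(poly_path, "//*[local-name()='coordinates']")

# One list element per <coordinates> node found under any <Polygon>
pgons <- xpathApply(doc, coord_path, xmlValue)

# Label each element with its polygon id (valid only under the
# one-coordinates-per-polygon assumption stated above)
names(pgons) <- xpathSApply(doc, poly_path, xmlGetAttr, "id")
```

If a polygon also carried inner boundaries, the names would no longer 
line up and the query would need to be run per polygon instead.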


Thanks!
-Don


p.s.
There is a lot of other stuff in this file, e.g., some points, and 
attributes of the points such as color, as well as a legend 
describing what the polygons mean, but I can get by without all that 
stuff, at least for now.

Note also that readOGR() would in principle work, but the underlying 
OGR libraries have some limitations that this file exceeds, per the 
information at http://www.gdal.org/ogr/drv_kml.html.
-- 
---------------------------------
Don MacQueen
Lawrence Livermore National Laboratory
Livermore, CA, USA
925-423-1062
macq at llnl.gov



