[R] newbie xml parsing question

Upton, Stephen (Steve) (CIV) scupton at nps.edu
Tue May 31 16:47:47 CEST 2011


?getNodeSet may help
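A minimal sketch of how getNodeSet() from the XML package could be applied here. It assumes the map-bubble markup from the original message has already been isolated and un-escaped; the fragment below is a simplified, hypothetical copy of it (the <a> tag around the address is restored, since the archive stripped it). Note that getNodeSet() needs internally parsed nodes, i.e. htmlParse(), or htmlTreeParse() with useInternalNodes = TRUE.

```r
library(XML)

# Simplified, hypothetical copy of the map-bubble markup from the question:
# JavaScript \" escaping removed, <a> tag around the address restored.
bubble <- '<div class="map-bubble property-bubble">
  <div class="adr"><a href="/homedetails/236-Arundel-Ave-Horsham-PA-19044/9933810_zpid/">236 Arundel Ave, Horsham, PA</a></div>
  <ul class="value-info"><li class="type-allHomes">Zestimate: $239,000</li></ul>
  <ul class="attributes">
    <li class="prop-cola">Beds: 3<br /> Baths: 1.0</li>
    <li class="prop-colb">Sqft: 1,630<br /> Lot: 21,745</li>
  </ul>
</div>'

# htmlParse() returns internal nodes, which getNodeSet() requires
doc <- htmlParse(bubble, asText = TRUE)

# getNodeSet() returns the list of nodes matching an XPath expression
links <- getNodeSet(doc, "//div[@class='adr']/a")
href  <- xmlGetAttr(links[[1]], "href")

# xpathSApply() applies xmlValue() to each matching node, giving text such
# as "Zestimate: $239,000" and "Beds: 3 Baths: 1.0"
items <- xpathSApply(doc,
  "//li[@class='type-allHomes'] | //ul[@class='attributes']/li", xmlValue)
```

The numbers can then be pulled out of the text with ordinary string functions; for example, as.numeric(gsub("[^0-9]", "", "Sqft: 1,630")) gives 1630.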

steve

-----Original Message-----
From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org] On Behalf Of eric
Sent: Saturday, May 28, 2011 5:03 PM
To: r-help at r-project.org
Subject: [R] newbie xml parsing question

I am trying to read some data off the Zillow site. I'm new to XML, HTML,
parsing and the XML package. I've been able to load the web page I'm
interested in with the following code, but I'm not sure of the next step to
get the information I want into R:

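library(XML)
url <- "http://www.zillow.com/homes/511 W Lafayette St, Norristown, PA_rb"
# useInternalNodes = TRUE keeps the page as internal nodes,
# which getNodeSet() and the other XPath functions require
doc <- htmlTreeParse(url, isURL = TRUE, useInternalNodes = TRUE)
doc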
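Since the address will change, the URL in the code above could be built from an address string instead of being typed in. A small sketch, assuming the ".../homes/<address>_rb" pattern seen in this message still applies; zillow_url() is a hypothetical helper, not part of any package.

```r
# Hypothetical helper: builds the search URL from a free-text address,
# following the ".../homes/<address>_rb" pattern in the message above.
zillow_url <- function(address) {
  # URLencode() turns spaces into %20; commas are left unencoded
  paste0("http://www.zillow.com/homes/", utils::URLencode(address), "_rb")
}

page_url <- zillow_url("511 W Lafayette St, Norristown, PA")
```

page_url can then be passed to htmlTreeParse() in place of the hard-coded string. Whether Zillow still serves pages at this URL pattern is not guaranteed.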

I'd like to be able to pull the following information into R:

The home-details href string:

/homedetails/236-Arundel-Ave-Horsham-PA-19044/9933810_zpid/#{scid=hdp-site-map-bubble-address}

Zestimate (price): $239,000
Beds: 3
Baths: 1.0
Sqft: 1630

I noticed that all of this information is in "doc"; the section of doc that
contains it is shown below. How do I go about extracting this information
and getting it into R in the general case, where the address in url will
change?

LatLong.createFromDegrees(40.187567, -75.125861),
"<div class=\"map-bubble property-bubble\">	<div
class=\"search-result\">
<div class=\"plisting\"> <div id=\"bubble-photoex-up\" class=\"photoex
hide\"> <div class=\"photoex-photos\"> </div>	<div class=\"mapsViews
hide\">
</div> </div> <div id=\"property-zpid\" class=\"hide\">9933810</div> <div
id=\"property-home-info\"> <div id=\"pinfo-block\" class=\"property-info\">
<div class=\"adr\">
\"/homedetails/236-Arundel-Ave-Horsham-PA-19044/9933810_zpid/#{scid=hdp-site-map-bubble-address}\"
236 Arundel Ave, Horsham, PA   </div> <ul class=\"value-info\"> <li
class=\"type-allHomes\">   Zestimate<sup>®</sup>: $239,000  \"#\"  
<div id=\"zest-tip-bubble_toggleArea\" class=\"tooltip hide\">	 Close  <dl>
<dt>Zestimate</dt> <dd> A <strong>Zestimate®</strong> home valuation is
Zillow's estimated market value. It is not an appraisal. Use it as a
starting point to determine a home's value. <a
href=\"/wikipages/What-is-a-Zestimate/\"
href=\"#\">Learn more  </dd> </dl> </div> </li> <li
class=\"secondary monthly-payment\"> Mortgage payment: $963/mo <ul
class=\"carrot view-rates-aftertext\"> <li> 
\"/mortgage-rates/#{scid=mor-site-mapbubrates}\" See rates 	</li></ul>
</li>
</ul> <ul class=\"attributes\"> <li class=\"prop-cola\">Beds: 3<br /> Baths:
1.0</li> <li class=\"prop-colb\">Sqft: 1,630<br /> Lot: 21,745</li> </ul>
</div> <ul class=\"has-photo actions clearfix\"> <li class=\"hinfo ztsa\">
\"/homedetails/236-Arundel-Ave-Horsham-PA-19044/9933810_zpid/#{scid=hdp-site-map-bubble-details}\"
Details  </li> <li class=\"mapHome ztsa\" zpid=\"9933810\">  \"#\" Views   
</li> <li class=\"faves ztsa\"> <a onclick=\"trackLink(this, 'Save',
{ 'events': 'event18', 'eVar4': 'Map Bubble' }); return
favoriteManager.addFavorite(9933810, favoriteManager.doneSaving(this),
event, true);\" class=\"not-saved\"
rel=\"nofollow\">Save  </li> </ul> </div>  Close  <div
id=\"bubble-photoex-down\" class=\"photoex hide\"> <div
class=\"photoex-photos\"> </div>	<div class=\"mapsViews hide\">
</div>
</div> </div>	</div>	<div class=\"bubble-beak\"> </div></div>"
)
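In the snippet above the bubble markup sits inside a JavaScript string literal (hence the escaped \" quotes), so to the HTML parser it is plain text rather than nodes that XPath can reach. One way round that, sketched below in base R only, is to treat it as text: undo the escaping and pull the fields out with regular expressions. parse_bubble() is a hypothetical helper; how the raw text is obtained (for example readLines() on the URL, or xmlValue() on the page's script nodes) is left open, and only the first property bubble in the text is read.

```r
# Hypothetical helper (base R only). 'js' is assumed to be raw page text
# holding the map-bubble HTML as an escaped JavaScript string.
parse_bubble <- function(js) {
  html <- gsub('\\"', '"', js, fixed = TRUE)    # undo the \" escaping
  text <- gsub("<[^>]+>", " ", html)             # replace tags with spaces
  text <- gsub("[[:space:]]+", " ", text)        # collapse runs of whitespace

  # First number (digits, commas, dots) following 'label', as a numeric
  num_after <- function(label) {
    pat <- paste0(label, "[^0-9]*[0-9][0-9,.]*")
    m <- regmatches(text, regexpr(pat, text))
    if (length(m) == 0) return(NA_real_)
    as.numeric(gsub("[^0-9.]", "", sub(label, "", m, fixed = TRUE)))
  }

  # Home-details link: from /homedetails/ up to the closing quote
  href <- regmatches(html, regexpr('/homedetails/[^"]+', html))

  data.frame(
    href      = if (length(href) > 0) href else NA_character_,
    zestimate = num_after("Zestimate"),
    beds      = num_after("Beds:"),
    baths     = num_after("Baths:"),
    sqft      = num_after("Sqft:"),
    stringsAsFactors = FALSE
  )
}
```

The result is a one-row data frame with columns href, zestimate, beds, baths and sqft. Regular expressions are brittle against changing HTML; an alternative is to pass the un-escaped html to htmlParse(asText = TRUE) and query it with getNodeSet().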


--
View this message in context:
http://r.789695.n4.nabble.com/newbie-xml-parsing-question-tp3558067p3558067.html
Sent from the R help mailing list archive at Nabble.com.

______________________________________________
R-help at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

