[R] newbie xml parsing question

eric ericstrom at aol.com
Sat May 28 23:02:30 CEST 2011


I am trying to read some data off the Zillow site. I am new to XML, HTML,
parsing and the XML package. I've been able to load the web page I'm
interested in with the following code, but I'm not sure of the next step to
get the information I want into R:

library(XML)
url <- "http://www.zillow.com/homes/511 W Lafayette St, Norristown, PA_rb"
doc <- htmlTreeParse(url, isURL=TRUE)
doc
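
Since the address in the URL will vary, a small helper along these lines might build it. This is only a sketch: zillow_url is a hypothetical name, the "_rb" suffix is simply copied from the URL above, and whether the site accepts percent-encoded spaces is an assumption.

```r
# Hypothetical helper: build the search URL for an arbitrary address,
# following the "<address>_rb" pattern of the URL above.
zillow_url <- function(address) {
  # URLencode() (from utils) percent-encodes characters such as spaces.
  paste0("http://www.zillow.com/homes/", URLencode(address), "_rb")
}

url <- zillow_url("511 W Lafayette St, Norristown, PA")
```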

I'd like to be able to pull the following information into R:

href home details string:

/homedetails/236-Arundel-Ave-Horsham-PA-19044/9933810_zpid/#{scid=hdp-site-map-bubble-address}

Value for Zestimate / Price: $239,000
Beds: 3
Baths: 1.0
Sqft: 1630

I noticed that all of that information is in "doc". The section of doc that
contains it is shown below. How do I go about extracting this information
and getting it into R in the general case, where the address in url will
change?

LatLong.createFromDegrees(40.187567, -75.125861),
"<div class=\"map-bubble property-bubble\">	<div class=\"search-result\">
<div class=\"plisting\"> <div id=\"bubble-photoex-up\" class=\"photoex
hide\"> <div class=\"photoex-photos\"> </div>	<div class=\"mapsViews hide\">
</div> </div> <div id=\"property-zpid\" class=\"hide\">9933810</div> <div
id=\"property-home-info\"> <div id=\"pinfo-block\" class=\"property-info\">
<div class=\"adr\"> 
\"/homedetails/236-Arundel-Ave-Horsham-PA-19044/9933810_zpid/#{scid=hdp-site-map-bubble-address}\" 
236 Arundel Ave, Horsham, PA   </div> <ul class=\"value-info\"> <li
class=\"type-allHomes\">   Zestimate<sup>®</sup>: $239,000  \"#\"  
<div id=\"zest-tip-bubble_toggleArea\" class=\"tooltip hide\">	 Close  <dl>
<dt>Zestimate</dt> <dd> A <strong>Zestimate®</strong> home valuation is
Zillow's estimated market value. It is not an appraisal. Use it as a
starting point to determine a home's value. <a
href=\"/wikipages/What-is-a-Zestimate/\"
href=\"#\">Learn more  </dd> </dl> </div> </li> <li
class=\"secondary monthly-payment\"> Mortgage payment: $963/mo <ul
class=\"carrot view-rates-aftertext\"> <li> 
\"/mortgage-rates/#{scid=mor-site-mapbubrates}\" See rates 	</li></ul> </li>
</ul> <ul class=\"attributes\"> <li class=\"prop-cola\">Beds: 3<br /> Baths:
1.0</li> <li class=\"prop-colb\">Sqft: 1,630<br /> Lot: 21,745</li> </ul>
</div> <ul class=\"has-photo actions clearfix\"> <li class=\"hinfo ztsa\"> 
\"/homedetails/236-Arundel-Ave-Horsham-PA-19044/9933810_zpid/#{scid=hdp-site-map-bubble-details}\"
Details  </li> <li class=\"mapHome ztsa\" zpid=\"9933810\">  \"#\" Views   
</li> <li class=\"faves ztsa\"> <a onclick=\"trackLink(this, 'Save',
{ 'events': 'event18', 'eVar4': 'Map Bubble' }); return
favoriteManager.addFavorite(9933810, favoriteManager.doneSaving(this),
event, true);\" class=\"not-saved\"
rel=\"nofollow\">Save  </li> </ul> </div>  Close  <div
id=\"bubble-photoex-down\" class=\"photoex hide\"> <div
class=\"photoex-photos\"> </div>	<div class=\"mapsViews hide\"> </div>
</div> </div>	</div>	<div class=\"bubble-beak\"> </div></div>"
)
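
A minimal sketch of one way the fields might be pulled out of markup like the above, assuming the escaped bubble string has already been isolated from the page as an R character string. The variable bubble below is a trimmed, hand-copied sample of that markup, and grab is a hypothetical helper, not part of the XML package. Locating the string inside the fetched page is not covered here.

```r
library(XML)

# Assumption: 'bubble' holds the escaped markup as it appears in the page's
# JavaScript (quotes written as \"). Trimmed sample copied from the post.
bubble <- paste(
  '<div id=\\"pinfo-block\\" class=\\"property-info\\">',
  '<div class=\\"adr\\">',
  '\\"/homedetails/236-Arundel-Ave-Horsham-PA-19044/9933810_zpid/#{scid=hdp-site-map-bubble-address}\\"',
  '236 Arundel Ave, Horsham, PA </div>',
  '<ul class=\\"value-info\\"><li class=\\"type-allHomes\\">',
  'Zestimate<sup>&reg;</sup>: $239,000</li></ul>',
  '<ul class=\\"attributes\\">',
  '<li class=\\"prop-cola\\">Beds: 3<br /> Baths: 1.0</li>',
  '<li class=\\"prop-colb\\">Sqft: 1,630<br /> Lot: 21,745</li></ul>',
  '</div>')

# The home details path: everything from "/homedetails/" up to the next
# quote or backslash in the raw string.
href <- regmatches(bubble, regexpr('/homedetails/[^"\\\\]+', bubble))

# Undo the JavaScript escaping so the string is ordinary HTML, then parse
# it and flatten the property-info block to plain text.
html <- gsub('\\"', '"', bubble, fixed = TRUE)
frag <- htmlParse(html, asText = TRUE)
txt <- xpathSApply(frag, "//div[@class='property-info']", xmlValue)
txt <- paste(txt, collapse = " ")

# Hypothetical helper: return the number that follows a label such as
# "Beds:". "[^:]*" allows for extra characters (e.g. the registered mark)
# between the label and the colon.
grab <- function(label, text) {
  m <- regmatches(text, regexpr(paste0(label, "[^:]*:\\s*\\$?[0-9.,]+"), text))
  if (length(m) == 0) return(NA_real_)
  as.numeric(gsub("[^0-9.]", "", sub(".*:", "", m)))
}

result <- data.frame(href      = href,
                     zestimate = grab("Zestimate", txt),
                     beds      = grab("Beds", txt),
                     baths     = grab("Baths", txt),
                     sqft      = grab("Sqft", txt),
                     stringsAsFactors = FALSE)
```

Matching on the visible labels rather than on the exact tag nesting is a deliberate choice here: the markup is likely to change more often than the "Beds:" and "Sqft:" labels, though either can break without notice.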


--
View this message in context: http://r.789695.n4.nabble.com/newbie-xml-parsing-question-tp3558067p3558067.html
Sent from the R help mailing list archive at Nabble.com.


