[R] Best way to get the prices from these strings?

Keith S Weintraub kw1958 at gmail.com
Wed Jan 29 15:29:39 CET 2014


Folks,

I got the following prices by scraping a web page just for my own edification:

thePrices<-
c("id=\"p0\">$69.95</div>", "id=\"p1\">$44.95</div>", "id=\"p2\">$69.95</div>", 
"id=\"p3\">$59.95</div>", "id=\"p4\">$69.95</div>", "id=\"p5\">$79.95</div>", 
"id=\"p6\">$89.95</div>", "id=\"p7\">$59.95</div>", "id=\"p8\">$59.95</div>", 
"id=\"p9\">$79.95</div>", "id=\"p10\">$79.95</div>", "id=\"p11\">$89.95</div>", 
"id=\"p12\">$89.95</div>", "id=\"p13\">$79.95</div>", "id=\"p14\">$89.95</div>", 
"id=\"p15\">$79.95</div>", "id=\"p16\">$39.95</div>", "id=\"p17\">$59.95</div>", 
"id=\"p18\">$69.95</div>", "id=\"p19\">$83.95</div>", "id=\"p20\">$73.95</div>", 
"id=\"p21\">$83.95</div>", "id=\"p22\">$93.95</div>", "id=\"p23\">$87.95</div>", 
"id=\"p24\">$91.95</div>", "id=\"p25\">$99.95</div>", "id=\"p26\">$61.95</div>\""
)

Using lapply, strsplit (twice), unlist, etc., I was able to get the result I wanted (the prices as numbers, e.g. 59.95), but I am sure there is a much better way that someone might be able to point out for me.
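
Roughly, the strsplit approach looks like the sketch below. This is a reconstruction rather than my exact code: it uses a three-element subset of thePrices so that it runs on its own, and the names x, afterDollar and priceText are made up for illustration.

```r
# Three of the scraped strings, copied from thePrices (including the last
# one, which carries a trailing quote)
x <- c("id=\"p0\">$69.95</div>",
       "id=\"p1\">$44.95</div>",
       "id=\"p26\">$61.95</div>\"")

# First split: on the literal "$" (fixed = TRUE so it is not treated as a
# regex anchor), keeping the piece after it
afterDollar <- lapply(strsplit(x, "$", fixed = TRUE), function(p) p[2])

# Second split: on "<", keeping the piece before it
priceText <- lapply(strsplit(unlist(afterDollar), "<", fixed = TRUE),
                    function(p) p[1])

# Flatten the list and convert the remaining text to numbers
prices <- as.numeric(unlist(priceText))
prices
```

It works, but two passes of strsplit plus the lapply/unlist bookkeeping feel clumsy for what is really "take the number between $ and <".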

Note that I tried various regexes which didn't work.

Is part of the difficulty that the strings in thePrices have multiple \"'s in them?
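
For what it's worth, here is a quick check of what one of these strings actually contains, assuming I am reading the escapes correctly. The names s and dollarPos are just for illustration. As far as I can tell the backslashes are only part of how R prints the string, so it may be that the "$" matters more than the quotes, since an unescaped "$" is an end-of-string anchor in a regex.

```r
s <- "id=\"p0\">$69.95</div>"

# writeLines shows the raw characters; the backslashes do not appear
writeLines(s)

# An escaped quote is a single character in the stored string
nchar("\"")
nchar(s)

# With fixed = TRUE the "$" is matched literally; without it, "$" is an
# end-of-string anchor and does not match the dollar sign itself
dollarPos <- regexpr("$", s, fixed = TRUE)
dollarPos
```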

Thanks for your time,
Best,
KW
