[R] Best way to get the prices from these strings?

arun smartpink111 at yahoo.com
Wed Jan 29 16:48:31 CET 2014


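You could use gsub() (see ?gsub), or the qdap package:
library(qdap)
# genXtract() returns the text between the left (">$") and right ("<") markers
as.numeric(unlist(genXtract(thePrices,">$","<")))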
# [1] 69.95 44.95 69.95 59.95 69.95 79.95 89.95 59.95 59.95 79.95 79.95 89.95
#[13] 89.95 79.95 89.95 79.95 39.95 59.95 69.95 83.95 73.95 83.95 93.95 87.95
#[25] 91.95 99.95 61.95
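For the gsub() route, here is a minimal base-R sketch (not code from the original reply). To keep it short it uses only the first two and the last of the posted strings. The full thePrices vector works the same way.

```r
# First two and last element of thePrices from the question
thePrices <- c("id=\"p0\">$69.95</div>", "id=\"p1\">$44.95</div>",
               "id=\"p26\">$61.95</div>\"")

# Delete everything up to and including the literal "$" (written \\$,
# because a bare $ is the end-of-string anchor), and everything from
# "<" to the end of the string. What is left is the price.
prices <- as.numeric(gsub(".*\\$|<.*", "", thePrices))
prices
# [1] 69.95 44.95 61.95
```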
A.K.

On Wednesday, January 29, 2014 9:33 AM, Keith S Weintraub <kw1958 at gmail.com> wrote:
Folks,

I got the following prices by scraping a web page just for my own edification:

thePrices<-
c("id=\"p0\">$69.95</div>", "id=\"p1\">$44.95</div>", "id=\"p2\">$69.95</div>", 
"id=\"p3\">$59.95</div>", "id=\"p4\">$69.95</div>", "id=\"p5\">$79.95</div>", 
"id=\"p6\">$89.95</div>", "id=\"p7\">$59.95</div>", "id=\"p8\">$59.95</div>", 
"id=\"p9\">$79.95</div>", "id=\"p10\">$79.95</div>", "id=\"p11\">$89.95</div>", 
"id=\"p12\">$89.95</div>", "id=\"p13\">$79.95</div>", "id=\"p14\">$89.95</div>", 
"id=\"p15\">$79.95</div>", "id=\"p16\">$39.95</div>", "id=\"p17\">$59.95</div>", 
"id=\"p18\">$69.95</div>", "id=\"p19\">$83.95</div>", "id=\"p20\">$73.95</div>", 
"id=\"p21\">$83.95</div>", "id=\"p22\">$93.95</div>", "id=\"p23\">$87.95</div>", 
"id=\"p24\">$91.95</div>", "id=\"p25\">$99.95</div>", "id=\"p26\">$61.95</div>\""
)

Using lapply, strsplit (twice), unlist, etc., I was able to get the result I wanted (the prices as numbers, e.g. 59.95), but I am sure there is a much better way that someone could point out to me.
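The poster's strsplit code isn't shown, so the following is only an assumed illustration: a single, vectorized strsplit() call can do the job without lapply. It again uses just three of the posted strings.

```r
# First two and last element of thePrices from the question
thePrices <- c("id=\"p0\">$69.95</div>", "id=\"p1\">$44.95</div>",
               "id=\"p26\">$61.95</div>\"")

# Split each string on either "$" or "<". Inside a bracket expression
# "[...]" the "$" is literal, so it needs no escaping here.
# e.g. 'id="p0">$69.95</div>' -> 'id="p0">', '69.95', '/div>'
pieces <- strsplit(thePrices, "[$<]")

# The price is always the second piece of each split string
prices <- as.numeric(vapply(pieces, `[`, character(1), 2))
prices
# [1] 69.95 44.95 61.95
```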

Note that I tried various regexes, none of which worked.
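One regex that does work, sketched here as an illustration rather than as the thread's answer, extracts the number instead of deleting the surrounding text. It relies on a Perl-style lookbehind, so it needs perl = TRUE.

```r
# First two and last element of thePrices from the question
thePrices <- c("id=\"p0\">$69.95</div>", "id=\"p1\">$44.95</div>",
               "id=\"p26\">$61.95</div>\"")

# Match a run of digits and dots immediately preceded by a literal "$".
# The (?<=...) lookbehind is not itself part of the match, so
# regmatches() returns just the price text.
m <- regmatches(thePrices, regexpr("(?<=\\$)[0-9.]+", thePrices, perl = TRUE))
as.numeric(m)
# [1] 69.95 44.95 61.95
```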

Is part of the difficulty that the strings in thePrices have multiple \"'s in them?
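A short check suggests the embedded quotes are not the problem. Each \" is only how a double quote is written inside an R string literal. The more likely trap is "$", which a regex treats as the end-of-string anchor. This sketch uses the first posted string.

```r
x <- "id=\"p0\">$69.95</div>"

# \" is a single character; regex functions just see a plain "
nchar("\"")                    # 1
cat(x, "\n")                   # id="p0">$69.95</div>

# Unescaped, ">$" means "a > at the very end of the string", so it
# finds the ">" that closes </div>, not the one before the price:
regexpr(">$", x)[1]            # 20, the last character of x
# Hence a pattern such as ">$6" can never match:
grepl(">$6", x)                # FALSE
# Escape the dollar sign as \\$, or use fixed = TRUE, to match it literally:
grepl(">\\$6", x)              # TRUE
grepl(">$6", x, fixed = TRUE)  # TRUE
```

The same applies to "." in a price pattern: unescaped, it matches any character, not just a decimal point.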

Thanks for your time,
Best,
KW

--

______________________________________________
R-help at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.