[R] Scraping a web page.
    Gabor Grothendieck 
    ggrothendieck at gmail.com
       
    Tue May 15 13:55:33 CEST 2012
    
    
  
On Tue, May 15, 2012 at 7:06 AM, Keith Weintraub <kw1958 at gmail.com> wrote:
> Thanks,
>  That was very helpful.
>
> I am using readLines and grep. If grep isn't powerful enough I might end up using the XML package but I hope that won't be necessary.
>
This uses only readLines and strapplyc (from the gsubfn package).  It
scrapes the relevant strings from your post on Nabble; by modifying
URL and pat you can likely get it to work with whatever format your
original files use:
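library(gsubfn)  # provides strapplyc
URL <- "http://r.789695.n4.nabble.com/Scraping-a-web-page-tp4630005.html"
L <- readLines(URL)  # one character string per line of the page
# capture the 7-digit number that precedes ".html" in each ship link
pat <- '<br/>"/en/Ships.*-(\\d{7}).html"'
# simplify = c combines the per-line captures into one character vector
strapplyc(L, pat, simplify = c)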
The result from the last line is:
[1] "8605507" "8122830"
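
If you would rather not load an extra package, roughly the same idea
can be sketched in base R with regexec and regmatches.  Treat this as
an illustration under assumptions, not a drop-in replacement: the
sample lines are made up to resemble the links the pattern targets
(they are not copied from the page), the digit class is written as
[0-9] and the dot is escaped, and unlike strapplyc it picks up at most
one match per line.

```r
# Hypothetical stand-in for readLines(URL); real page lines may differ.
L <- c(
  '<br/>"/en/Ships/Example-One-8605507.html"',
  'some unrelated line',
  '<br/>"/en/Ships/Example-Two-8122830.html"'
)

# Same idea as pat above: capture the 7 digits just before ".html".
pat <- '<br/>"/en/Ships.*-([0-9]{7})\\.html"'

# regexec finds the first match per line along with its capture group;
# regmatches returns character(0) for lines that do not match.
m <- regmatches(L, regexec(pat, L))

# Keep only the matching lines and take element 2 (the capture group).
hits <- m[sapply(m, length) == 2]
ids <- vapply(hits, function(x) x[2], character(1), USE.NAMES = FALSE)
ids
```

With these sample lines, ids holds "8605507" and "8122830".  If a
single line can contain several links, gregexpr (or strapplyc as
above) is the better fit.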
-- 
Statistics & Software Consulting
GKX Group, GKX Associates Inc.
tel: 1-877-GKX-GROUP
email: ggrothendieck at gmail.com
    
    