[R] Scraping info from a web site?

Spencer Graves spencer.graves at effectivedefense.org
Wed Jan 31 11:36:04 CET 2018

Hi, All:

       What would you suggest for reading the data on members of the 
US Congress and their positions on net neutrality from 
"https://www.battleforthenet.com/scoreboard" into R?

       I found recommendations for the "rvest" package to "Easily 
Harvest (Scrape) Web Pages".  I tried the following:

library(rvest)
URL <- 'https://www.battleforthenet.com/scoreboard/'
Bftn <- read_html(URL)
str(Bftn)

List of 2
  $ node:<externalptr>
  $ doc :<externalptr>
  - attr(*, "class")= chr [1:2] "xml_document" "xml_node"

        However, I don't know what to do with <externalptr>.
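
       A minimal sketch of how such an object is usually inspected, 
assuming a small inline HTML string as a stand-in for the real page (the 
markup and the names in it are hypothetical, not taken from the 
scoreboard).  The <externalptr> fields are handles to the parsed 
document and are not meant to be read directly; rvest accessor 
functions such as html_name(), html_text() and html_attr() read 
through them:

```r
library(rvest)

# Hypothetical stand-in for a downloaded page; the real markup may differ.
page <- read_html('<html><body>
  <div id="senate">
    <p class="psb-unknown">Senator A</p>
    <p class="psb-unknown">Senator B</p>
  </div>
</body></html>')

# Pick the first <p> element, then query it with accessor functions
# instead of touching the external pointers.
first_p    <- html_node(page, "p")
node_name  <- html_name(first_p)           # tag name of the node
node_text  <- html_text(first_p)           # text contained in the node
node_class <- html_attr(first_p, "class")  # value of its class attribute

print(c(node_name, node_text, node_class))
```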

       The "Selectorgadget" vignette with rvest suggested selecting what 
I wanted on the web page and pasting the resulting CSS selector as an 
argument into "html_nodes".  This led me to try the following:

Bftn_nodes <- html_nodes(Bftn,
     '.psb-unknown , #house, #senate, #senate p')
str(Bftn_nodes)

List of 4
  $ :List of 2
   ..$ node:<externalptr>
   ..$ doc :<externalptr>
   ..- attr(*, "class")= chr "xml_node"
  $ :List of 2
   ..$ node:<externalptr>
   ..$ doc :<externalptr>
   ..- attr(*, "class")= chr "xml_node"
  $ :List of 2
   ..$ node:<externalptr>
   ..$ doc :<externalptr>
   ..- attr(*, "class")= chr "xml_node"
  $ :List of 2
   ..$ node:<externalptr>
   ..$ doc :<externalptr>
   ..- attr(*, "class")= chr "xml_node"
  - attr(*, "class")= chr "xml_nodeset"
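
       A possible way to turn an xml_nodeset like this into a data 
frame is sketched here.  It has not been checked against the real page: 
the inline HTML, the member names and the "psb-yes" class are 
hypothetical, and only the "#house", "#senate" and ".psb-unknown" 
selectors come from the call above.

```r
library(rvest)

# Hypothetical markup mimicking the selectors used above.
page <- read_html('<html><body>
  <div id="house"><p class="psb-unknown">Rep. X</p></div>
  <div id="senate">
    <p class="psb-yes">Sen. Y</p>
    <p class="psb-unknown">Sen. Z</p>
  </div>
</body></html>')

# One node per member paragraph in either chamber.
members <- html_nodes(page, "#house p, #senate p")

# html_text() and html_attr() are vectorised over a nodeset, so each
# call returns one value per node.
scoreboard <- data.frame(
  name     = html_text(members, trim = TRUE),
  position = html_attr(members, "class"),
  stringsAsFactors = FALSE
)

print(scoreboard)
```

If the real page fills in its content after loading, the nodes that 
read_html() sees may be empty, and this approach would need a different 
data source.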

       This seems like it may be progress, but I'm still unsure what 
to do next.  Should I be using a different package?  Or should I post 
this question somewhere else, such as StackOverflow.com?

       Spencer Graves
