[R] Scraping info from a web site?

Spencer Graves spencer.graves at effectivedefense.org
Wed Jan 31 11:36:04 CET 2018


Hi, All:


       What would you suggest one use to read the data on members of the 
US Congress and their positions on net neutrality from 
"https://www.battleforthenet.com/scoreboard" into R?


       I found recommendations for the "rvest" package to "Easily 
Harvest (Scrape) Web Pages".  I tried the following:


URL <- 'https://www.battleforthenet.com/scoreboard/'
library(rvest)
Bftn <- read_html(URL)
str(Bftn)


List of 2
  $ node:<externalptr>
  $ doc :<externalptr>
  - attr(*, "class")= chr [1:2] "xml_document" "xml_node"


        However, I don't know what to do with <externalptr>.
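
A minimal sketch of how such an object might be inspected, assuming the <externalptr> entries are handles to the parsed document held in compiled code, so that str() cannot show their contents and the rvest accessor functions have to be used instead. The HTML string below is a hypothetical stand-in, not the real scoreboard markup:

```r
library(rvest)

# Hypothetical stand-in for the page; the real markup may differ.
demo_html <- paste0(
  '<html><body><div id="senate">',
  '<p class="psb-unknown">Jane Doe</p>',
  '<p class="psb-yes">John Roe</p>',
  '</div></body></html>'
)

# read_html() also accepts a literal HTML string, which keeps this offline.
doc <- read_html(demo_html)

# Printing the object shows the parsed markup, unlike str().
print(doc)

# Accessor functions pull information out of the opaque pointers.
senate <- html_node(doc, "#senate")   # first node matching the CSS selector
html_name(senate)                     # tag name of that node
html_attr(senate, "id")               # value of its id attribute
html_children(senate)                 # the nodes nested inside it
as.character(senate)                  # the node's markup as a string
```

The same calls should work on the object returned for the real URL, provided the content of interest is present in the downloaded HTML.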


       The "Selectorgadget" vignette with rvest suggested selecting what 
I wanted on the web page and pasting the resulting CSS selector as an 
argument into "html_nodes".  This led me to try the following:


Bftn_nodes <- html_nodes(Bftn,
     '.psb-unknown , #house, #senate, #senate p')


str(Bftn_nodes)
List of 4
  $ :List of 2
   ..$ node:<externalptr>
   ..$ doc :<externalptr>
   ..- attr(*, "class")= chr "xml_node"
  $ :List of 2
   ..$ node:<externalptr>
   ..$ doc :<externalptr>
   ..- attr(*, "class")= chr "xml_node"
  $ :List of 2
   ..$ node:<externalptr>
   ..$ doc :<externalptr>
   ..- attr(*, "class")= chr "xml_node"
  $ :List of 2
   ..$ node:<externalptr>
   ..$ doc :<externalptr>
   ..- attr(*, "class")= chr "xml_node"
  - attr(*, "class")= chr "xml_nodeset"


       This seems like it may be progress, but I'm still confused about 
what to do next.  Or maybe I should be using a different package? Or 
posting this question somewhere else, such as StackOverflow.com?
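
One possible next step, sketched under assumptions: html_text() and html_attr() turn a node set into ordinary character vectors, which can then go into a data frame. The HTML string, the "psb-yes" class, and the column names are hypothetical and only illustrate the pattern; the real page may be structured differently:

```r
library(rvest)

# Hypothetical stand-in for the page; the real markup may differ.
demo_html <- paste0(
  '<html><body><div id="senate">',
  '<p class="psb-unknown">Jane Doe</p>',
  '<p class="psb-yes">John Roe</p>',
  '</div></body></html>'
)
doc <- read_html(demo_html)

# One node per member: every <p> inside the element with id "senate".
rows <- html_nodes(doc, "#senate p")

# Convert the node set to character vectors and combine them.
scores <- data.frame(
  member = html_text(rows, trim = TRUE),  # visible text of each node
  status = html_attr(rows, "class"),      # class attribute of each node
  stringsAsFactors = FALSE
)
print(scores)
```

read_html() does not run JavaScript, so if the page fills in its data in the browser after loading, the selected nodes may come back empty and the data would need to be fetched from wherever the page's scripts load it.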


       Thanks,
       Spencer Graves


