[R] help with web scraping

Spencer Graves
Thu Jul 23 23:49:11 CEST 2020


Hello, All:


       I've failed with multiple attempts to scrape the table of 
candidates from the website of the Missouri Secretary of State:


https://s1.sos.mo.gov/CandidatesOnWeb/DisplayCandidatesPlacement.aspx?ElectionCode=750004975


       I've tried base::url, base::readLines, xml2::read_html, and 
XML::readHTMLTable; see summary below.


       Suggestions?
       Thanks,
       Spencer Graves


sosURL <- 
"https://s1.sos.mo.gov/CandidatesOnWeb/DisplayCandidatesPlacement.aspx?ElectionCode=750004975"

str(baseURL <- base::url(sosURL))
# this returns an unopened connection object; nothing is fetched until it
# is read, so by itself it does not show whether the request succeeds

sosRead <- base::readLines(sosURL) # 404 Not Found
sosRb <- base::readLines(baseURL) # 404 Not Found

sosXml2 <- xml2::read_html(sosURL) # HTTP error 404.

sosXML <- XML::readHTMLTable(sosURL)
# List of 0;  does not seem to be XML
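
One further diagnostic that might narrow this down, offered only as a sketch: fetch the page with the curl package (already loaded via a namespace, per the session info below) and look at the status code, the final URL and the start of the body instead of stopping at the error. The helper names fetchDiagnostics and summarizeResponse and the "Mozilla/5.0" user-agent string are assumptions made up for illustration. Whether this server treats non-browser clients differently has not been confirmed.

```r
sosURL <- "https://s1.sos.mo.gov/CandidatesOnWeb/DisplayCandidatesPlacement.aspx?ElectionCode=750004975"

# Condense a response list of the shape returned by
# curl::curl_fetch_memory() into the fields worth inspecting.
# (hypothetical helper, not part of any package)
summarizeResponse <- function(res) {
  list(status    = res$status_code,
       finalURL  = res$url,
       nBytes    = length(res$content),
       bodyStart = substr(rawToChar(res$content), 1, 200))
}

# Fetch a URL without raising an error on HTTP 4xx/5xx, so that the
# error page itself can be examined.
# 'userAgent' is an assumption: it only tests whether a browser-like
# User-Agent changes the server's answer.
fetchDiagnostics <- function(u, userAgent = "Mozilla/5.0") {
  h <- curl::new_handle()
  curl::handle_setopt(h, useragent = userAgent)
  res <- curl::curl_fetch_memory(u, handle = h)
  out <- summarizeResponse(res)
  out$headers <- curl::parse_headers(res$headers)
  out
}

# Only contact the server when run by hand.
if (interactive()) {
  str(fetchDiagnostics(sosURL))
}
```

If the status is still 404 with a browser-like user agent, the headers and bodyStart fields may at least show whether the server is redirecting or expecting a session cookie.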

sessionInfo()

R version 4.0.2 (2020-06-22)
Platform: x86_64-apple-darwin17.0 (64-bit)
Running under: macOS Catalina 10.15.5

Matrix products: default
BLAS: 
/System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib
LAPACK: 
/Library/Frameworks/R.framework/Versions/4.0/Resources/lib/libRlapack.dylib

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats     graphics  grDevices utils     datasets
[6] methods   base

loaded via a namespace (and not attached):
[1] compiler_4.0.2 tools_4.0.2    curl_4.3
[4] xml2_1.3.2     XML_3.99-0.3


