[R] Grab Element from Web Page

Sparks, John James jspark4 at uic.edu
Thu Aug 15 06:48:36 CEST 2013


Thanks so much for looking into this for me.

Unfortunately, I get an error when I execute your code.  Is there a
library that you loaded that I haven't?

require(scrapeR)
require(XML)
require(RCurl)
doc<-htmlTreeParse("http://www.sec.gov/cgi-bin/browse-edgar?CIK=MSFT&Find=Search&owner=exclude&action=getcompany")
node <- getNodeSet(doc[[1]], "//link[@rel='alternate']" )
Error in UseMethod("xpathApply") :
  no applicable method for 'xpathApply' applied to an object of class
"character"


Guidance would be much appreciated.

--JJS
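
A likely cause of the error above, offered as a sketch rather than a confirmed diagnosis: by default htmlTreeParse() builds an R-level tree, and the first element of that result is a character string, not a node. That matches the 'applied to an object of class "character"' message. Asking for internal nodes and passing the document itself to getNodeSet() should avoid it. The inline HTML and its href value below are made up for illustration so the example runs without network access; they are not taken from the EDGAR page.

```r
library(XML)

# Small stand-in for the EDGAR page; the href value is an assumption,
# used only so the example runs offline.
html <- '<html><head>
<link rel="alternate" type="application/atom+xml"
      href="/cgi-bin/browse-edgar?action=getcompany&amp;CIK=0000789019&amp;type=&amp;output=atom"/>
</head><body></body></html>'

# Default parse (useInternalNodes = FALSE): an R-level tree.
# Inspect what doc[[1]] actually is in this form.
r_tree <- htmlTreeParse(html, asText = TRUE)
class(r_tree[[1]])

# Parse to internal nodes so that XPath queries apply to the document.
doc <- htmlTreeParse(html, asText = TRUE, useInternalNodes = TRUE)

# Pass the document itself, not doc[[1]].
node <- getNodeSet(doc, "//link[@rel='alternate']")

# Read the href attribute of the first matching node.
href <- xmlGetAttr(node[[1]], "href")
href
```

For the live page, the same two calls would take the URL in place of the inline text (without asText = TRUE).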



On Wed, August 14, 2013 4:19 am, Jeffrey Dick wrote:
> Hi,
>
> There are many occurrences of the CIK number in the page source. This
> pulls out the first node containing it:
>
> node <- getNodeSet(doc[[1]], "//link[@rel='alternate']" )
>
> From there you can extract the number. Here's one way to do it.
>
> strsplit(strsplit(unlist(node)[[5]], "CIK=")[[1]][2], "&type")[[1]][1]
>
> Jeff
>
>
> On Wed, Aug 14, 2013 at 1:34 PM, Sparks, John James <jspark4 at uic.edu>
> wrote:
>
>> Dear R Helpers,
>>
>> I would like to pull the CIK number from the web page
>>
>>
>> http://www.sec.gov/cgi-bin/browse-edgar?CIK=MSFT&Find=Search&owner=exclude&action=getcompany
>>
>> If you put this web page into your browser you will see the CIK number
>> in red on the left side of the page near the top.
>>
>> When I try the basic
>> require(scrapeR)
>> require(XML)
>> require(RCurl)
>> doc <- htmlTreeParse("http://www.sec.gov/cgi-bin/browse-edgar?CIK=MSFT&Find=Search&owner=exclude&action=getcompany")
>> str(doc)
>>
>> I get a large number of items in the data frame that I don't know how to
>> interpret.  Both
>> tables <- readHTMLTable(doc)
>>
>> and
>>
>> list<-xmlToList(doc)
>>
>> result in errors.
>>
>> Any (positive) guidance would be much appreciated.
>>
>> --John J. Sparks, Ph.D.
>>
>
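
The nested strsplit() call in the quoted reply can be tried on its own with a plain string. The sketch below assumes an href of the general shape that call expects ("CIK=" followed by digits, then "&type"); the string itself is hypothetical and the CIK value is illustrative. It also shows a one-step sub() alternative, which is a substitution of mine and not part of the original reply.

```r
# Hypothetical href, shaped like the one the nested strsplit() call expects.
href <- "/cgi-bin/browse-edgar?action=getcompany&CIK=0000789019&type=&dateb="

# Two-step split, as in the reply: keep what follows "CIK=",
# then keep what precedes "&type".
cik_split <- strsplit(strsplit(href, "CIK=")[[1]][2], "&type")[[1]][1]

# One-step alternative: capture the run of digits right after "CIK=".
cik_regex <- sub(".*CIK=([0-9]+).*", "\\1", href)

# Both approaches should agree on this input.
identical(cik_split, cik_regex)
```

The sub() form does not depend on which parameter follows the CIK in the query string, which may make it a little more robust if the link layout changes.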


