[R] Parsing Google Finance page data?

Fri Nov 21 03:45:14 CET 2014

FWIW, this is the kludge I came up with.  The catch is that I only know 
the name of the company, not its ticker or exchange, so the code below 
admittedly doesn't work in all cases (e.g. "Time Warner").  If anyone 
knows how to return a list of tickers/exchanges for companies matching 
a name, that would be helpful (though that question should probably go 
to the finance list).  In any case, thanks in advance for any thoughts 
on this.
Matt

library(RCurl)
library(xts)
library(XML)

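# Goal: scrape the page returned by a query such as
# http://www.google.com/finance?q=ibm

coname <- "ibm"

# URL-encode the name so that queries containing spaces form a valid URL
baseurl <- paste("http://www.google.com/finance?q=",
                 URLencode(coname, reserved = TRUE), sep = "")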

# Read and parse the HTML page
doc.html <- htmlTreeParse(baseurl, useInternalNodes = TRUE)

# Second table on the page; the market cap is read from row 4, column 2
tables <- readHTMLTable(doc.html, which = 2, as.data.frame = TRUE,
                        stringsAsFactors = FALSE)
mktcap <- tables[4, 2]

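# Text of every <script> node; the exchange/ticker is line 11 of block 11
script.text <- unlist(xpathApply(doc.html, '//script', xmlValue))
block <- script.text[11]
exchangeticker <- unlist(strsplit(block, '\n'))[11]

# Text of every <div> node; the currency is read from the 60th one
div.text <- unlist(xpathApply(doc.html, '//div', xmlValue))
currency <- div.text[60]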

print(mktcap)
print(exchangeticker)
print(currency)
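
The hard-coded positions above (row 4, block 11, div 60) break as soon as the page layout shifts.  A possibly less brittle variant is to look values up by their label instead.  The following is only a sketch: the HTML string is a made-up stand-in for the fetched page, the labels "Mkt cap" and "Currency in" are assumptions about its wording, and value_for_label is a hypothetical helper, not part of the XML package.

```r
library(XML)

# Made-up stand-in for the fetched page; the real markup may well differ.
html <- '
<html><body>
<table><tr><td>Range</td><td>160.10 - 161.95</td></tr></table>
<table>
  <tr><td>Open</td><td>160.95</td></tr>
  <tr><td>Mkt cap</td><td>159.40B</td></tr>
</table>
<div>Currency in USD</div>
</body></html>'

doc <- htmlParse(html, asText = TRUE)

# Hypothetical helper: return the text of the cell that follows the first
# cell whose text equals `label`, or NA if there is no such cell.
value_for_label <- function(doc, label) {
  xp <- sprintf("//td[normalize-space(.)='%s']/following-sibling::td[1]",
                label)
  hit <- xpathSApply(doc, xp, xmlValue)
  if (length(hit) == 0) NA_character_ else trimws(hit[1])
}

mktcap <- value_for_label(doc, "Mkt cap")

# Pick the first <div> whose text mentions the currency, rather than
# relying on it being the 60th <div> on the page.
divs <- xpathSApply(doc, "//div", xmlValue)
currency <- grep("Currency in", divs, value = TRUE)[1]

print(mktcap)
print(currency)
```

A lookup that finds nothing returns NA rather than silently picking up an unrelated cell, which makes a layout change easier to notice.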
