[R] postForm() in RCurl and library RHTMLForms

Duncan Temple Lang duncan at wald.ucdavis.edu
Fri Nov 5 02:13:17 CET 2010



On 11/4/10 2:39 AM, sayan dasgupta wrote:
> Hi RUsers,
> 
> Suppose I want to see the data on the website
> url <- "http://www.nseindia.com/content/indices/ind_histvalues.htm"
> 
> for the index "S&P CNX NIFTY" for
> dates "FromDate"="01-11-2010","ToDate"="02-11-2010"
> 
> then read the html table from the page using readHTMLtable()
> 
> I am using this code
> webpage <- postForm(url,.params=list(
>                        "FromDate"="01-11-2010",
>                        "ToDate"="02-11-2010",
>                        "IndexType"="S&P CNX NIFTY",
>                        "Indicesdata"="Get Details"),
>                  .opts=list(useragent = getOption("HTTPUserAgent")))
> 
> But it doesn't give me desired result

You need to be more specific about how it fails to give the desired result.

You are in fact posting to the wrong URL. The form is submitted to a different
URL - http://www.nseindia.com/marketinfo/indices/histdata/historicalindices.jsp
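
One way to find where a form is really submitted is to parse the page and look at the form's action attribute. The following is only a minimal sketch: the HTML string is a hypothetical stand-in for the page source (the real markup may well differ), and only the field names and the target path come from this thread.

```r
library(XML)

# Hypothetical stand-in for the page source; the real page's markup may differ.
html <- '<html><body>
<form action="/marketinfo/indices/histdata/historicalindices.jsp">
  <input type="text" name="FromDate"/>
  <input type="text" name="ToDate"/>
</form>
</body></html>'

doc <- htmlParse(html, asText = TRUE)

# The action attribute says where the form is submitted, which need not
# be the URL of the page that displays the form.
action <- xpathSApply(doc, "//form", xmlGetAttr, "action")

# The names of the input fields are the parameter names the server expects.
fields <- xpathSApply(doc, "//form//input", xmlGetAttr, "name")

action
fields
```

A relative action such as this one has to be resolved against the URL of the page before it can be used in a request.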



> 
> Also I was trying to use the function getHTMLFormDescription from the
> package RHTMLForms but there we can't use the argument
> .opts=list(useragent = getOption("HTTPUserAgent")) which is needed for this
> particular website

That's not the case. The function that RHTMLForms generates for you does support
the .opts parameter.
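
As a rough sketch of how the options could be passed along: the helper call_form below is hypothetical (it is not part of RHTMLForms), and fun is assumed to be the function returned by createFunction() in the code further down. The values "R" and "*/*" are just illustrative choices.

```r
# Options to send with the request: a user agent and a permissive
# Accept header. Both values are illustrative.
opts <- list(useragent = "R",
             httpheader = c(Accept = "*/*"))

# Hypothetical helper: forwards the form inputs to the generated
# function and adds the curl options via its .opts parameter.
call_form <- function(fun, ...) fun(..., .opts = opts)
```

With the generated function in hand, the call would look something like call_form(fun, FromDate = "01-11-2010", ToDate = "04-11-2010").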

What you want is something along these lines:


 # Set default options for RCurl
 # requests
options(RCurlOptions = list(useragent = "R"))
library(RCurl)

 # Read the HTML page with getURLContent(), since we cannot use
 # htmlParse() directly: it does not send a user agent or an
 # Accept: */* header.

url <- "http://www.nseindia.com/content/indices/ind_histvalues.htm"
wp = getURLContent(url)

 # Now that we have the page, parse it and use the RHTMLForms
 # package to create an R function that will act as an interface
 # to the form.
library(RHTMLForms)
library(XML)
doc = htmlParse(wp, asText = TRUE)
  # need to set the URL for this document since we read it from
  # text, rather than from the URL directly

docName(doc) = url

  # Create the form description and generate the R
  # function that will "call" the form.

form = getHTMLFormDescription(doc)[[1]]
fun = createFunction(form)


  # now we can invoke the form from R. We only need 2
  # inputs  - FromDate and ToDate

o = fun(FromDate = "01-11-2010", ToDate = "04-11-2010")

  # Having looked at the tables, I think we want the 3rd
  # one.
table = readHTMLTable(htmlParse(o, asText = TRUE),
                        which = 3,
                        header = TRUE,
                        stringsAsFactors = FALSE)
table




Yes, it is marginally involved. But that is because we cannot simply read
the HTML document directly with htmlParse(), since that request lacks the
Accept (and User-Agent) HTTP headers.
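
To work out which table index to pass as which, it can help to peek at the first row of every table in the returned page. Here is a small sketch of that idea; the HTML string and its values are made up for illustration and are not the site's actual output.

```r
library(XML)

# Hypothetical stand-in for the returned page: a layout table followed
# by the data table. All values are made up.
html <- '<html><body>
<table>
  <tr><td>Index</td><td>Period</td></tr>
  <tr><td>Example</td><td>Nov 2010</td></tr>
</table>
<table>
  <tr><th>Date</th><th>Close</th></tr>
  <tr><td>01-Nov-2010</td><td>100.5</td></tr>
  <tr><td>02-Nov-2010</td><td>101.2</td></tr>
</table>
</body></html>'

doc <- htmlParse(html, asText = TRUE)

# Collect the cell values of the first row of each <table>, in document
# order, to see which table holds the data of interest.
first_rows <- lapply(getNodeSet(doc, "//table"),
                     function(tb) xpathSApply(tb, ".//tr[1]/*", xmlValue))
first_rows

# Once the index is known, read just that table.
prices <- readHTMLTable(doc, which = 2, header = TRUE,
                        stringsAsFactors = FALSE)
prices
```

The position in first_rows is the value to use for which in readHTMLTable().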

> 
> 
> Thanks and Regards
> Sayan Dasgupta
> 
> 	[[alternative HTML version deleted]]
> 
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.


