[R] postForm() in RCurl and library RHTMLForms

Duncan Temple Lang duncan at wald.ucdavis.edu
Fri Nov 5 13:32:48 CET 2010



On 11/4/10 11:31 PM, sayan dasgupta wrote:
> Thanks a lot, that's exactly what I was looking for.
> 
> Just a quick question. I agree the form gets submitted to the URL
> "http://www.nseindia.com/marketinfo/indices/histdata/historicalindices.jsp"
> 
> and I am filling in the form on the page
> "http://www.nseindia.com/content/indices/ind_histvalues.htm"
> 
> How do I pass arguments such as FromDate, ToDate and Symbol to postForm()
> and submit the query to get a similar table?
> 

Well, that is what the function that RHTMLForms creates does.
You can look at that code and see that it calls formQuery(),
which ends in a call to postForm(). You could use

   debug(postForm)

and examine the arguments passed to it.

The answer is:

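library(RCurl)
  # the site rejects requests that lack a user agent (see below)
options(RCurlOptions = list(useragent = "R"))
  # style = "POST" sends the fields URL-encoded, as a browser does
o = postForm("http://www.nseindia.com/marketinfo/indices/histdata/historicalindices.jsp",
             FromDate = "01-11-2010", ToDate = "04-11-2010",
             IndexType = "S&P CNX NIFTY", check = "new",
             style = "POST")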
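The target URL and the argument names both come from the form itself: its action attribute and the names of its controls. A minimal sketch of reading them by hand with the XML package, assuming an inline stand-in for the form. The markup below is hypothetical, built from the postForm() call above rather than copied from the real page, so the real form's names and layout may differ.

```r
library(XML)

# HYPOTHETICAL stand-in for the form on ind_histvalues.htm; the control
# names and values are taken from the postForm() call above, not from
# the real page, whose markup may differ.
page = '<html><body>
<form name="histForm" method="post"
      action="/marketinfo/indices/histdata/historicalindices.jsp">
  <input type="text" name="FromDate">
  <input type="text" name="ToDate">
  <select name="IndexType"><option>S&amp;P CNX NIFTY</option></select>
  <input type="hidden" name="check" value="new">
</form>
</body></html>'

doc = htmlParse(page, asText = TRUE)

  # The form's action attribute is the URL the browser posts to.
action = xpathSApply(doc, "//form", xmlGetAttr, "action")

  # The action is a relative path starting with "/", so prefix the
  # host by hand.
target = paste0("http://www.nseindia.com", action)

  # The names of the controls become the argument names for postForm().
fields = xpathSApply(doc, "//form//input | //form//select",
                     xmlGetAttr, "name")

  # An option without a value attribute submits its text, so this is
  # the value to pass as IndexType.
index = xpathSApply(doc, "//select[@name = 'IndexType']/option", xmlValue)

target
fields
index
```

With the real page you would parse the text fetched by getURLContent() instead of `page`; the printed `target` and `fields` should then line up with the URL and argument names used in the postForm() call above.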


> 
> On Fri, Nov 5, 2010 at 6:43 AM, Duncan Temple Lang
> <duncan at wald.ucdavis.edu> wrote:
> 
>>
>>
>> On 11/4/10 2:39 AM, sayan dasgupta wrote:
>>> Hi RUsers,
>>>
>>> Suppose I want to see the data on the website
>>> url <- "http://www.nseindia.com/content/indices/ind_histvalues.htm"
>>>
>>> for the index "S&P CNX NIFTY" for
>>> dates "FromDate"="01-11-2010","ToDate"="02-11-2010"
>>>
>>> and then read the HTML table from the page using readHTMLTable().
>>>
>>> I am using this code
>>> webpage <- postForm(url,.params=list(
>>>                        "FromDate"="01-11-2010",
>>>                        "ToDate"="02-11-2010",
>>>                        "IndexType"="S&P CNX NIFTY",
>>>                        "Indicesdata"="Get Details"),
>>>                  .opts=list(useragent = getOption("HTTPUserAgent")))
>>>
>>> But it doesn't give me the desired result.
>>
>> You need to be more specific about how it fails to give the desired result.
>>
>> You are in fact posting to the wrong URL. The form is submitted to a
>> different URL:
>> http://www.nseindia.com/marketinfo/indices/histdata/historicalindices.jsp
>>
>>
>>
>>>
>>> Also I was trying to use the function getHTMLFormDescription() from the
>>> package RHTMLForms, but there we can't use the argument
>>> .opts=list(useragent = getOption("HTTPUserAgent")), which is needed for
>>> this particular website.
>>
>> That's not the case. The function that RHTMLForms generates for you does
>> support the .opts parameter.
>>
>> What you want is something along these lines:
>>
>>
>>  # Set default options for RCurl requests
>> options(RCurlOptions = list(useragent = "R"))
>> library(RCurl)
>>
>>  # Read the HTML page with RCurl, since we cannot use htmlParse()
>>  # directly: it does not send a User-Agent or an
>>  # Accept: */* header.
>>
>> url <- "http://www.nseindia.com/content/indices/ind_histvalues.htm"
>> wp = getURLContent(url)
>>
>>  # Now that we have the page, parse it and use the RHTMLForms
>>  # package to create an R function that will act as an interface
>>  # to the form.
>> library(RHTMLForms)
>> library(XML)
>> doc = htmlParse(wp, asText = TRUE)
>>  # need to set the URL for this document since we read it from
>>  # text, rather than from the URL directly
>>
>> docName(doc) = url
>>
>>  # Create the form description and generate the R
>>  # function to "call" the form.
>>
>> form = getHTMLFormDescription(doc)[[1]]
>> fun = createFunction(form)
>>
>>
>>  # Now we can invoke the form from R. We only need two
>>  # inputs: FromDate and ToDate.
>>
>> o = fun(FromDate = "01-11-2010", ToDate = "04-11-2010")
>>
>>  # Having looked at the tables, I think we want the 3rd
>>  # one.
>> table = readHTMLTable(htmlParse(o, asText = TRUE),
>>                        which = 3,
>>                        header = TRUE,
>>                        stringsAsFactors = FALSE)
>> table
>>
>>
>>
>>
>> Yes, it is marginally involved. But that is because we cannot simply read
>> the HTML document directly with htmlParse(), because of the lack of the
>> Accept (and User-Agent) HTTP headers.
>>
>>>
>>>
>>> Thanks and Regards
>>> Sayan Dasgupta
>>>
>>> ______________________________________________
>>> R-help at r-project.org mailing list
>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>> PLEASE do read the posting guide
>> http://www.R-project.org/posting-guide.html
>>> and provide commented, minimal, self-contained, reproducible code.


