[R] Finding the right url for RCurl

Brian Diggs diggsb at ohsu.edu
Thu Aug 5 19:32:06 CEST 2010


On 8/4/2010 2:07 PM, AndrewPage wrote:
>
> Hi all,
>
> I am using RCurl to try and download data from a website, but I'm having
> trouble finding out what URL to use.  Here is the site:
>
> http://www.invescopowershares.com/products/holdings.aspx?ticker=PGX
>
> See how in the upper right, above the displayed sheet, there's a link to
> download the data as a .csv file?  When I hit "copy url" and paste into
> getURL in R, it doesn't work.  That's no surprise because there isn't a URL
> in what gets pasted.  I was just wondering if there's any way around this.
>
> Thanks in advance,
>
> Andrew

I looked at the page.  The link you mentioned runs some JavaScript that
alters some values in a form and then posts that form; the response to
that POST is the CSV file.  There is no simple URL that points to the
file.  RCurl can submit forms with postForm(), so you may be able to
mimic the form.  The structure of the form starts on line 191 of the
page source (or search for "aspnetForm"), and appropriate values for
__EVENTTARGET are given in the doPostBack call on line 258.  Some
understanding of HTML and HTTP may be necessary to know what is going on.
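
As a rough, untested sketch of what mimicking the form could look like:
RCurl provides getURL() and postForm(), so one approach is to fetch the
page, copy every hidden input out of the form, set __EVENTTARGET, and
post the result back.  Several things here are my assumptions, not
something I verified against the site: that the form posts back to the
page's own URL, that an empty __EVENTARGUMENT is acceptable, and that
the hidden inputs can be picked out with simple regular expressions.
The helper names are made up for illustration, and the __EVENTTARGET
value is a placeholder you would have to copy from the doPostBack call
in the page source.

```r
# Extract name/value pairs of all hidden <input> fields from an HTML string.
# Base-R regular expressions keep the sketch dependency-free, but they are
# fragile; a real HTML parser would be more robust.
hidden_fields <- function(html) {
  tags <- regmatches(html, gregexpr("<input[^>]*>", html, ignore.case = TRUE))[[1]]
  tags <- tags[grepl("type=[\"']hidden[\"']", tags, ignore.case = TRUE)]

  # Return the quoted value of one attribute, or "" if it is absent.
  get_attr <- function(tag, attr) {
    pattern <- paste0("[[:space:]]", attr, "=[\"']([^\"']*)[\"']")
    m <- regmatches(tag, regexec(pattern, tag, ignore.case = TRUE))[[1]]
    if (length(m) == 2) m[2] else ""
  }

  field_names  <- vapply(tags, get_attr, character(1), attr = "name",
                         USE.NAMES = FALSE)
  field_values <- vapply(tags, get_attr, character(1), attr = "value",
                         USE.NAMES = FALSE)
  setNames(as.list(field_values), field_names)
}

# Build the parameter list for the POST: the page's hidden fields, with
# __EVENTTARGET (and __EVENTARGUMENT, assumed empty) set the way the
# page's JavaScript would set them before submitting.
postback_fields <- function(html, event_target, event_argument = "") {
  fields <- hidden_fields(html)
  fields[["__EVENTTARGET"]]   <- event_target
  fields[["__EVENTARGUMENT"]] <- event_argument
  fields
}

# Fetch the page, then post the form back to the same URL (an assumption).
# One curl handle is reused so any session cookies carry over to the POST.
# Requires the RCurl package; nothing is contacted until this is called.
download_csv <- function(page_url, event_target) {
  curl <- RCurl::getCurlHandle(cookiefile = "", followlocation = TRUE)
  html <- RCurl::getURL(page_url, curl = curl)
  RCurl::postForm(page_url,
                  .params = postback_fields(html, event_target),
                  curl = curl, style = "POST")
}

# Usage (placeholder target; take the real value from the page source):
# csv_text <- download_csv(
#   "http://www.invescopowershares.com/products/holdings.aspx?ticker=PGX",
#   "REPLACE_WITH_EVENTTARGET")
# holdings <- read.csv(text = csv_text)
```

The form-building step is kept separate from the network call so it can
be checked against a saved copy of the page source before anything is
sent to the site.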

I don't know whether this would work.  Also, the site has not made it
easy to download the CSV file directly, and that may be intentional.
The site's Terms of Service may have something to say about doing this
as well.

--
Brian Diggs
Senior Research Associate, Department of Surgery, Oregon Health & 
Science University


