[R] Opening or activating a URL to access data, alternative to browseURL

Ryan Utz utz.ryan at gmail.com
Tue Oct 11 13:59:50 CEST 2016


Bob/Duncan,

Thanks for writing. I think some of the things Bob mentioned might work,
but I'm still not quite getting there. Below is the example I'm working
with:

#1
browseURL('http://pick18.discoverlife.org/mp/20m?plot=2&kind=Hypoprepia+fucosa&site=33.9+-83.3&date1=2011,2012,2013&flags=build_txt:')
# This opens the URL and creates a link to machine-readable data on the
# page, which I can then download by simply doing this:

#2
read.delim('http://pick18.discoverlife.org/tmp/Hypoprepia_fucosa_33.9_-83.3_2011,2012,2013.txt')
# This is the data I need to read, but this URL only exists if
# the URL in #1 above has been activated first

So, for example, try running #2 without running #1 first: it won't work.
Then run #1 followed by #2, and it works fine.

See what I mean?
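
For the batch case, here is a minimal sketch of Duncan's "read the first
page, wait a bit, then read the second one" idea. It assumes the server only
needs the first URL to be requested (no cookies), and that the file name
under /tmp/ always follows the pattern seen in the one example above. The
helper names build_urls and fetch_species and the 2-second pause are my own
placeholders, not anything documented by the site:

```r
# Build the "activation" URL and the resulting data URL for one species.
# The /tmp/ file-name pattern is inferred from a single example (assumption).
build_urls <- function(species, lat = 33.9, lon = -83.3,
                       years = c(2011, 2012, 2013)) {
  yrs <- paste(years, collapse = ",")
  activate <- paste0(
    "http://pick18.discoverlife.org/mp/20m?plot=2",
    "&kind=", gsub(" ", "+", species, fixed = TRUE),
    "&site=", lat, "+", lon,
    "&date1=", yrs,
    "&flags=build_txt:"
  )
  data <- paste0(
    "http://pick18.discoverlife.org/tmp/",
    gsub(" ", "_", species, fixed = TRUE),
    "_", lat, "_", lon, "_", yrs, ".txt"
  )
  list(activate = activate, data = data)
}

# Request the activation page without a browser, pause, then read the data.
fetch_species <- function(species, wait = 2, ...) {
  u <- build_urls(species, ...)
  invisible(readLines(u$activate, warn = FALSE))  # HTML is discarded
  Sys.sleep(wait)                                 # pause length is a guess
  read.delim(u$data)
}
```

If that works for one species, something like
lapply(c("Hypoprepia fucosa"), fetch_species) should extend to the full
species list. If it fails, the site probably tracks a cookie or similar state.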


On Thu, Sep 29, 2016 at 5:09 PM, Bob Rudis <bob at rud.is> wrote:

> The rvest/httr/curl trio can do the cookie management pretty well. Make
> the initial connection via rvest::html_session() and then hopefully be able
> to use other rvest function calls, but curl and httr calls will use the
> cached in-memory handle info seamlessly. You'd need to store and retrieve
> cookies if you need them preserved between R sessions.
>
> Failing the above and assuming this would not need to be lightning fast,
> use the phantomjs or firefox web driver (either with RSelenium or some new
> stuff rOpenSci is cooking up) which will then do what browsers do best and
> maintain all this state for you. You can still slurp the page contents up
> with xml2::read_html() and use the super handy processing idioms in the
> scraping tidyverse (it needs its own name).
>
> A concrete example (assuming the URLs aren't sensitive) would enable me or
> someone else to mock up something for you.
>
>
> On Thu, Sep 29, 2016 at 4:59 PM, Duncan Murdoch <murdoch.duncan at gmail.com>
> wrote:
>
>> On 29/09/2016 3:29 PM, Ryan Utz wrote:
>>
>>> Hi all,
>>>
>>> I've got a situation that involves activating a URL so that a link to some
>>> data becomes available for download. I can easily use 'browseURL' to do so,
>>> but I'm hoping to make this batch-process-able, and I would prefer to not
>>> have 100s of browser windows open when I go to download multiple data sets.
>>>
>>> Here's the example:
>>>
>>> #1
>>> browseURL('http://pick18.discoverlife.org/mp/20m?plot=2&kind=Hypoprepia+fucosa&site=33.9+-83.3&date1=2011,2012,2013&flags=build_txt:')
>>> # This opens the URL and creates a link to machine-readable data on the
>>> # page, which I can then download by simply doing this:
>>>
>>> #2
>>> read.delim('http://pick18.discoverlife.org/tmp/Hypoprepia_fucosa_33.9_-83.3_2011,2012,2013.txt')
>>>
>>> However, I can only get the second line above to work if the thing in line
>>> #1 has been opened in a browser already. Is there any way to allow me to
>>> either 1) close the browser after it's been opened or 2) execute the line
>>> #2 above without having to open a browser? We have hundreds of species that
>>> you can see after the '&kind=' bit of the URL, so I'm trying to keep the
>>> browsing situation sane.
>>>
>>> Thanks!
>>> R
>>>
>>>
>> You'll need to figure out what happens when you open the first page. Does
>> it set a cookie?  Does it record your IP address?  Does it just build the
>> file but record nothing about you?
>>
>> If it's one of the simpler versions, you can just read the first page,
>> wait a bit, then read the second one.
>>
>> If you need to manage cookies, you'll need something more complicated. I
>> don't know the easiest way to do that.
>>
>> Duncan Murdoch
>>
>>
>> ______________________________________________
>> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>>
>
>
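
If it turns out that a cookie is involved, Bob's point about reusing one
in-memory handle could be sketched roughly as below with the curl package.
This is untested against the site; the function name fetch_with_cookies and
the 1-second pause are placeholders of mine. The idea is only that both
requests share one handle, so any cookie set by the first response is sent
with the second request:

```r
# Request activate_url, then data_url, on the same curl handle so that
# cookies set by the first response are re-sent with the second request.
fetch_with_cookies <- function(activate_url, data_url, wait = 1) {
  h <- curl::new_handle()

  first <- curl::curl_fetch_memory(activate_url, handle = h)
  if (first$status_code >= 400) {
    stop("activation request failed with status ", first$status_code)
  }

  Sys.sleep(wait)  # give the server time to build the file (a guess)

  second <- curl::curl_fetch_memory(data_url, handle = h)
  if (second$status_code >= 400) {
    stop("data request failed with status ", second$status_code)
  }

  # The response body is a raw vector; convert it to text and parse it.
  read.delim(text = rawToChar(second$content))
}
```

Calling curl::handle_cookies(h) on the handle after the first request would
show whether the site sets any cookie at all, which answers Duncan's
question directly.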


-- 

Ryan Utz, Ph.D.
Assistant professor of water resources
Chatham University
Home/Cell: (724) 272-7769
