[R] R and HTTP get 'has file changed'

Seth Falcon sfalcon at fhcrc.org
Fri Jul 13 04:46:59 CEST 2007


Hi Dirk,

Dirk Eddelbuettel <edd at debian.org> writes:
> Is there a way, maybe using Duncan TL's RCurl, to efficiently test whether
> a URL such as
>
> 	http://$CRAN/src/contrib/ 
>
> has changed?  I.e. one way is via a test of a page in that directory as per
> (sorry about the long line, and this would be on Linux with links and awk
> installed)
>
>    > strptime(system("links -width 160 -dump http://cran.r-project.org/src/contrib/ | awk '/PACKAGES.html/ {print $3,$4}\'", intern=TRUE), "%d-%b-%Y %H:%M")
>    [1] "2007-07-12 18:16:00"
>    > 
>
> and one can then compare the POSIXt with a cached value --- but requesting
> the header would presumably be more efficient.
>
> Is there a way to request the 'has changed' part of the http 1.1 spec
> directly in R?

Here's a way to use RCurl to obtain HTTP headers:

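        library(RCurl)
        h <- basicTextGatherer()
        # nobody=TRUE asks for the headers only (a HEAD-style request)
        junk <- getURI(url, writeheader=h$update, header=TRUE, nobody=TRUE)
        h <- h$value()   # the raw header text as a single string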
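
Once the header text is in hand, the 'has changed' test could be sketched
roughly as below. This is only an illustration: parseLastModified() and
hasChanged() are hypothetical helper names, the sample header text is made
up, and the parsing assumes an English locale for month names and a server
that sends a standard Last-Modified line ending in GMT.

```r
# Hypothetical helper: find the Last-Modified line in raw HTTP header text
# and parse it into a POSIXct time (NA if the header is absent).
parseLastModified <- function(headerText) {
    lines <- unlist(strsplit(headerText, "\r?\n"))
    hit <- grep("^Last-Modified:", lines, ignore.case = TRUE, value = TRUE)
    if (length(hit) == 0L) return(as.POSIXct(NA))
    # Drop the field name, then the leading weekday ("Thu, ")
    stamp <- sub("^[^:]*:[[:space:]]*", "", hit[1L])
    stamp <- sub("^[A-Za-z]+,[[:space:]]*", "", stamp)
    # Assumes English month abbreviations (%b is locale dependent)
    as.POSIXct(strptime(stamp, "%d %b %Y %H:%M:%S GMT", tz = "GMT"))
}

# Hypothetical helper: TRUE if the resource looks newer than the cached
# time. A missing or unparseable header is treated as "changed".
hasChanged <- function(headerText, cached) {
    current <- parseLastModified(headerText)
    is.na(current) || current > cached
}

# Made-up header text standing in for the string returned by h$value()
hdr <- paste0("HTTP/1.1 200 OK\r\n",
              "Date: Fri, 13 Jul 2007 02:40:00 GMT\r\n",
              "Last-Modified: Thu, 12 Jul 2007 18:16:00 GMT\r\n\r\n")
cached <- as.POSIXct("2007-07-11 00:00:00", tz = "GMT")
changed <- hasChanged(hdr, cached)
```

Treating a missing Last-Modified header as "changed" is a deliberately
conservative choice here; one could just as well return NA and let the
caller decide.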

If you want to check many URLs, I think you will find the following
much faster than looping over the above:

        h <- multiTextGatherer(urls)
        junk <- getURIAsynchronous(urls, write=h, header=TRUE, nobody=TRUE)
        yourInfo <- sapply(h, function(x) something(x$value()))

I've used this in the pkgDepTools package to retrieve package download
sizes.
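
For illustration, a stand-in for something() that pulls a download size out
of each header block might look like the sketch below. This is not the
pkgDepTools code; contentLength() is a hypothetical helper, and the header
strings are made up so the example runs without a network connection.

```r
# Hypothetical stand-in for something(): extract Content-Length (in bytes)
# from one block of raw HTTP header text, or NA if the header is absent.
contentLength <- function(headerText) {
    lines <- unlist(strsplit(headerText, "\r?\n"))
    hit <- grep("^Content-Length:", lines, ignore.case = TRUE, value = TRUE)
    if (length(hit) == 0L) return(NA_real_)
    as.numeric(sub("^[^:]*:[[:space:]]*", "", hit[1L]))
}

# Made-up header text standing in for what each gatherer's value() returns
headers <- c(
    a = "HTTP/1.1 200 OK\r\nContent-Length: 12345\r\n\r\n",
    b = "HTTP/1.1 200 OK\r\nContent-Type: text/html\r\n\r\n"
)

# One size per URL, named after the elements of the input vector
sizes <- sapply(headers, contentLength)
```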

Cheers,

+ seth

-- 
Seth Falcon | Computational Biology | Fred Hutchinson Cancer Research Center
http://bioconductor.org


