[R] RCurl unable to download a particular web page -- what is so special about this web page?

clair.crossupton at googlemail.com
Mon Jan 26 14:58:19 CET 2009


Dear R-help,

There is one web page I cannot download using RCurl: getURL() returns
an empty string, and I don't understand why:

> library(RCurl)
> my.url <- "http://www.nytimes.com/2009/01/07/technology/business-computing/07program.html?_r=2"
> getURL(my.url)
[1] ""


Other web pages download fine, and this is the first time I have been
unable to download a web page using the very nice RCurl package. I can
download this page using RDCOMClient (below), but I would like to
understand why the approach above does not work.




> library(RDCOMClient)
> my.url <- "http://www.nytimes.com/2009/01/07/technology/business-computing/07program.html?_r=2"
> ie <- COMCreate("InternetExplorer.Application")
> txt <- list()
> ie$Navigate(my.url)
NULL
> while(ie[["Busy"]]) Sys.sleep(1)
> txt[[my.url]] <- ie[["document"]][["body"]][["innerText"]]
> txt
$`http://www.nytimes.com/2009/01/07/technology/business-computing/07program.html?_r=2`
[1] "Skip to article Try Electronic Edition Log ...


Many thanks for your time,
C.C

Windows Vista, running with administrator privileges.
> sessionInfo()
R version 2.8.1 (2008-12-22)
i386-pc-mingw32

locale:
LC_COLLATE=English_United Kingdom.1252;LC_CTYPE=English_United Kingdom.1252;LC_MONETARY=English_United Kingdom.1252;LC_NUMERIC=C;LC_TIME=English_United Kingdom.1252

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] RDCOMClient_0.92-0 RCurl_0.94-0

loaded via a namespace (and not attached):
[1] tools_2.8.1
