[Rd] read.table() fails with https in R 3.6 but not in R 3.5

Ralf Stubner ralf.stubner using daqana.com
Mon May 6 11:12:25 CEST 2019


On 04.05.19 19:04, Stephen Berman wrote:
> In versions of R prior to 3.6.0 the following invocation succeeds,
> returning the data frame shown:
> 
>> read.table("https://www.dwds.de/r/stat?corpus=kern&cnt=tokens&date=decade&format=text", header=TRUE)
>    Dekade   Anzahl
> 1    1900 11467254
> 2    1910 13023370
> 3    1920 13434601
> 4    1930 13296355
> 5    1940 12121250
> 6    1950 13191131
> 7    1960 10587420
> 8    1970 10944129
> 9    1980 11279439
> 10   1990 12052652
> 
> But in version 3.6.0 it fails:
> 
>> read.table("https://www.dwds.de/r/stat?corpus=kern&cnt=tokens&date=decade&format=text", header=TRUE)
> Error in file(file, "rt") :
>   cannot open the connection to 'https://www.dwds.de/r/stat?corpus=kern&cnt=tokens&date=decade&format=text'
> In addition: Warning message:
> In file(file, "rt") :
>   cannot open URL 'https://www.dwds.de/r/stat?corpus=kern&cnt=tokens&date=decade&format=text': HTTP status was '403 Forbidden'

I can reproduce the behavior on Debian using the CRAN-supplied package
for R 3.6.0. Trying to read the page with 'curl' also produces a 403
error, plus some HTML text (in German) explaining that I am being treated
as a 'robot' because of the supplied User-Agent (here: curl/7.52.1). One
suggested solution is to adjust that value, which does solve the issue:

> options(HTTPUserAgent='mozilla')
> read.table("https://www.dwds.de/r/stat?corpus=kern&cnt=tokens&date=decade&format=text", header=TRUE)
   Dekade   Anzahl
1    1900 11467254
2    1910 13023370
3    1920 13434601
4    1930 13296355
5    1940 12121250
6    1950 13191131
7    1960 10587420
8    1970 10944129
9    1980 11279439
10   1990 12052652
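
Since options() changes the User-Agent for the rest of the session, it may
be preferable to limit the change to a single call. The following is only a
minimal sketch: with_user_agent() is a hypothetical helper written for this
mail, not a function from base R, and the value 'mozilla' is simply the one
used above.

```r
# Hypothetical helper (not part of base R): evaluate `expr` while the
# HTTPUserAgent option is temporarily set to `agent`, then restore the
# previous value, even if `expr` signals an error.
with_user_agent <- function(agent, expr) {
  old <- options(HTTPUserAgent = agent)  # options() returns the old value(s)
  on.exit(options(old), add = TRUE)      # restore when the function exits
  force(expr)                            # `expr` is lazy, so it runs here
}

url <- "https://www.dwds.de/r/stat?corpus=kern&cnt=tokens&date=decade&format=text"

# Needs network access, so it is left commented out here:
# dekaden <- with_user_agent("mozilla", read.table(url, header = TRUE))
```

Because R evaluates function arguments lazily, the read.table() call is only
run inside the helper, after the option has been set.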

Other solutions are to simulate a login or to get in touch with DWDS
directly.

Greetings
Ralf
