[R] download.file() problems with binary files containing EOF byte in Windows

Scott Sherrill-Mix shescott at pennmedicine.upenn.edu
Mon Aug 20 21:42:22 CEST 2018


Hello,
I'm trying to get a package to pass win-builder and have been having a
bit of trouble with Windows R and binary files (in my case a small
.tar.gz used in testing). After a little debugging, I think I've
narrowed it down to download.file() truncating files at the first 0x1a
byte (often used as an EOF marker, but I think a valid byte inside gzip
files) on downloads from local "file://xxx" URLs. I'm trying to figure
out whether this is a known "feature" of Windows that I should just
avoid, or whether it seems like a bug.

For example:

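# write a 75-byte file whose first byte is 0x1a (decimal 26)
writeBin(26:100,'tmp.bin',size=1)
# copy it through download.file() using a local file:// URL
download.file('file://tmp.bin','download.bin')
# compare the size of the original with the size of the downloaded copy
file.size('tmp.bin')
file.size('download.bin')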

On Windows (session info below), I get file sizes of 75 and 0 and on
Linux I get 75 and 75.

As a more real-world example, if I use download.file() on a .gz file,
a remote download and a local download of the same file return files
of different sizes. For example, for a .gz file from a Google hit about
gzip (http://commandlinefanatic.com/cgi-bin/showarticle.cgi?article=art053):

download.file('http://commandlinefanatic.com/gunzip.c.gz','gunzip.c.gz')
download.file('file://gunzip.c.gz','dl.gz')
file.size('gunzip.c.gz')
file.size('dl.gz')

I get a 4704-byte file for the remote download and a 360-byte file for
the local download on Windows (versus 4704 and 4704 on Linux). Note that
the 361st byte is 0x1a:

readBin('gunzip.c.gz','raw',361)
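
To check where the first 0x1a byte sits without reading the raw dump by
eye, something like the following might help. This is only a sketch:
first_sub_byte() is a hypothetical helper name, not a base R function,
and the demo uses a temporary file with known contents rather than
gunzip.c.gz.

```r
# Hypothetical helper (not part of base R): 1-based position of the first
# 0x1a byte in a file, or NA if the file contains no such byte.
first_sub_byte <- function(path) {
  bytes <- readBin(path, what = "raw", n = file.size(path))
  which(bytes == as.raw(0x1a))[1]
}

# Demo on a temporary file holding bytes 1..100, so 0x1a (26) is the 26th byte
tmp <- tempfile(fileext = ".bin")
writeBin(as.raw(1:100), tmp)
first_sub_byte(tmp)   # 26
```

If the truncation really does happen at the first 0x1a byte, then
first_sub_byte('gunzip.c.gz') should come out one larger than the size
of the truncated local download.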

The various download.file() options don't seem to fix this. I get the
same 360 bytes with:

download.file('file://gunzip.c.gz','dl.gz',mode='wb')
file.size('dl.gz')
download.file('file://gunzip.c.gz','dl.gz',mode='wb',method='internal')
file.size('dl.gz')

It looks like the 'auto' and 'internal' methods both resolve to the
'wininet' method on Windows, and mode is automatically set to 'wb' for
gz files, so it's maybe not surprising that those don't change things.
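
If this turns out to be something to avoid rather than a bug, one
possible workaround is to bypass download.file() for local files. The
sketch below is untested on Windows and rests on assumptions:
fetch_binary() is a hypothetical helper name, it assumes file.copy()
copies the bytes unchanged, and its handling of file:// URLs is
deliberately simplistic (it only strips the prefix, plus the leading
slash of the file:///C:/... form).

```r
# Hypothetical wrapper (not part of base R): copy local file:// sources with
# file.copy() and fall back to download.file() for everything else.
fetch_binary <- function(url, destfile) {
  if (grepl("^file://", url)) {
    src <- sub("^file://", "", url)
    # Assumption: a Windows URL like file:///C:/dir/x.gz should map to C:/dir/x.gz
    src <- sub("^/([A-Za-z]:)", "\\1", src)
    if (!file.copy(src, destfile, overwrite = TRUE)) {
      stop("could not copy ", src)
    }
  } else {
    status <- download.file(url, destfile, mode = "wb")
    if (status != 0) stop("download failed for ", url)
  }
  invisible(destfile)
}

# Demo with the same bytes as the first example (first byte is 0x1a)
src <- tempfile(fileext = ".bin")
writeBin(as.raw(26:100), src)
dst <- tempfile(fileext = ".bin")
fetch_binary(paste0("file://", src), dst)
file.size(dst)   # 75
```

This sidesteps the question of what download.file() does with file://
URLs rather than answering it, so it would not help code that has to
call download.file() directly.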

Thanks,
Scott

## Windows sessionInfo():
R version 3.5.1 (2018-07-02)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 8.1 x64 (build 9600)

Matrix products: default

locale:
[1] LC_COLLATE=English_United States.1252
[2] LC_CTYPE=English_United States.1252
[3] LC_MONETARY=English_United States.1252
[4] LC_NUMERIC=C
[5] LC_TIME=English_United States.1252

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

loaded via a namespace (and not attached):
[1] compiler_3.5.1


## Linux sessionInfo():
R version 3.4.4 (2018-03-15)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 16.04.5 LTS

Matrix products: default
BLAS: /usr/lib/libblas/libblas.so.3.6.0
LAPACK: /usr/lib/lapack/liblapack.so.3.6.0

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C
 [9] LC_ADDRESS=C               LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

loaded via a namespace (and not attached):
[1] compiler_3.4.4


