[Rd] download.file does not process gz files correctly (truncates them?)

Martin Maechler maechler at stat.math.ethz.ch
Fri May 4 09:06:15 CEST 2018


>>>>> Tomas Kalibera <tomas.kalibera at gmail.com>
>>>>>     on Fri, 4 May 2018 08:34:03 +0200 writes:

    > On 05/03/2018 11:14 PM, Henrik Bengtsson wrote:
    >> Also, as mentioned in my
    >> https://stat.ethz.ch/pipermail/r-devel/2012-August/064739.html,
    >> when not specifying the mode argument, the default on
    >> Windows is mode = "w" *except* for certain,
    >> case-sensitive, filename extensions:
    >> 
    >> if(missing(mode) && length(grep("\\.(gz|bz2|xz|tgz|zip|rda|RData)$", url)))
    >>      mode <- "wb"
    >> 
    >> Just like the need for mode = "wb" on Windows, the above
    >> special-file-extension hack applies only on Windows,
    >> and is documented in ?download.file only if you're on
    >> Windows; so someone on Linux/macOS trying to help
    >> someone on Windows may not be aware of it. This adds
    >> even more confusion, e.g. "works for me".

    > If we were designing the API today, it would probably make
    > more sense not to convert any line endings by
    > default. Today's editors _usually_ can cope with different
    > line endings and it is probably easier to detect that a
    > text file has incorrect line endings rather than detecting
    > that a binary file has been corrupted by an attempt to
    > convert line endings.  But whether to change existing,
    > documented behavior is a different question. In order to
    > help users and programmers who do not read the
    > documentation carefully we would create problems for users
    > and programmers who do. 

    > The current heuristic/hack is in
    > line with the compatibility approach: it detects files
    > that are obviously binary, so it changes the default
    > behavior only for cases when it would obviously cause
    > damage.

    > Tomas


Thank you, Tomas;  I was about to say something similar but
probably less convincingly. 
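
To make the corruption risk Tomas describes concrete, here is a
minimal sketch.  It only simulates an LF -> CR LF conversion on a
made-up byte vector; the payload and the helper to_crlf() are
assumptions for illustration, not what download.file() runs
internally.

```r
# Made-up binary payload; the byte 0x0a occurs as data, not as a newline.
payload <- as.raw(c(0x1f, 0x8b, 0x08, 0x0a, 0x00, 0x0a, 0xff))

# Hypothetical helper: simulate a text-mode write on Windows by turning
# every LF (0x0a) into CR LF (0x0d 0x0a).
to_crlf <- function(bytes) {
  as.raw(unlist(lapply(as.integer(bytes), function(b) {
    if (b == 0x0a) c(0x0d, 0x0a) else b
  })))
}

converted <- to_crlf(payload)

length(payload)                  # 7
length(converted)                # 9: two extra CR bytes were inserted
identical(payload, converted)    # FALSE: the "file" no longer matches
```

A text file treated this way is still readable; a compressed or
serialized file generally is not, and nothing signals the change
until a checksum or a decompressor complains.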

There is one point on which I strongly agree with Henrik:  the
Windows-specific behavior, currently documented only on Windows,
should be documented on all platforms.

I'll update the help page,

and will also add the .rds extension to the above list
[ --- yes, we all should use saveRDS() and readRDS() whenever
      sensible in favor of save() and load() ]
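
For readers who want to see what the extension check does and does
not catch, a small sketch follows.  The pattern is the one quoted
above with "rds" added; guess_mode() and the example.org URLs are
hypothetical and only mimic the default-mode choice on Windows, they
are not the actual download.file() source.

```r
# Pattern quoted above from the Windows code path, with "rds" added.
binary_ext <- "\\.(gz|bz2|xz|tgz|zip|rda|rds|RData)$"

# Hypothetical helper: "wb" if the URL ends in a known binary extension,
# otherwise the text-mode default "w".
guess_mode <- function(url) if (grepl(binary_ext, url)) "wb" else "w"

urls <- c("https://example.org/data.csv.gz",
          "https://example.org/model.rds",
          "https://example.org/archive.ZIP",    # upper case: no match
          "https://example.org/data.zip?dl=1",  # query string: no match
          "https://example.org/table.csv")

modes <- vapply(urls, guess_mode, character(1), USE.NAMES = FALSE)
print(data.frame(url = urls, mode = modes))
```

Because the match is case-sensitive and anchored at the end of the
URL, the third and fourth URLs fall back to "w".  Passing
mode = "wb" explicitly avoids relying on the guess at all.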

Martin


    >> /Henrik
    >> 
    >> On Thu, May 3, 2018 at 7:27 AM, Joris Meys
    >> <jorismeys at gmail.com> wrote:
    >>> Thank you Henrik and Martin for explaining what was
    >>> going on. Very insightful!
    >>> 
    >>> On Thu, May 3, 2018 at 4:21 PM, Jeroen Ooms
    >>> <jeroenooms at gmail.com> wrote:
    >>>> On Thu, May 3, 2018 at 2:42 PM, Henrik Bengtsson
    >>>> <henrik.bengtsson at gmail.com> wrote:
    >>>>> Use mode="wb" when you download the file. See
    >>>>> https://github.com/HenrikBengtsson/Wishlist-for-R/issues/30.
    >>>>> 
    >>>>> R core, and others, is there a good argument for why
    >>>>> we are not making this the default download mode? It
    >>>>> seems like such a simple fix to such a common
    >>>>> "mistake".
    >>>> I'd like to second this feature request. This default
    >>>> behaviour is unexpected and often causes R scripts
    >>>> that were written on macOS/Linux to produce corrupted
    >>>> files on Windows, checksum mismatches, etc.
    >>>> 
    >>>> Even for text files, the default should be to download
    >>>> the file as-is.  Trying to "fix" line-endings should be
    >>>> opt-in, never the default.  Downloading a file via a
    >>>> browser or ftp client on windows also doesn't change
    >>>> the file, why should R?
    >>> 
    >>> I third the feature request.
    >>> 
    >>>> 
    >>>> 
    >>>> On Thu, May 3, 2018 at 3:02 PM, Duncan Murdoch
    >>>> <murdoch.duncan at gmail.com> wrote:
    >>>>> Many downloads are text files (HTML, CSV, etc.), and
    >>>>> if those are downloaded in binary, a Windows user
    >>>>> might end up with a file that Notepad can't handle,
    >>>>> because it would have Unix-style line endings.
    >>>> True but I don't think this is relevant. The same holds
    >>>> e.g. for the R files in source packages, which also
    >>>> have unix line endings. Most Windows users will use an
    >>>> actual editor that understands both types of line
    >>>> endings, or can convert between the two.
    >>>> 
    >>>> Downloading a file should do just that.
    >>> 
    >>> Again, I agree. In my (limited) experience the only
    >>> program that fails to properly display \n as a line
    >>> ending, is Notepad. But it can still open the file
    >>> regardless. If line ending conflicts cause bugs, it's
    >>> almost always a unix-like OS struggling with
    >>> Windows-style endings. I have yet to meet the first one
    >>> the other way around.
    >>> 
    >>> Cheers Joris
    >>> 
    >>> 
    >>> --
    >>> Joris Meys Statistical consultant
    >>> 
    >>> Department of Data Analysis and Mathematical Modelling
    >>> Ghent University Coupure Links 653, B-9000 Gent
    >>> (Belgium)
    >>> 
    >>> -----------
    >>> Biowiskundedagen 2017-2018
    >>> http://www.biowiskundedagen.ugent.be/
    >>> 
    >>> -------------------------------
    >>> Disclaimer :
    >>> http://helpdesk.ugent.be/e-maildisclaimer.php
    >> ______________________________________________
    >> R-devel at r-project.org mailing list
    >> https://stat.ethz.ch/mailman/listinfo/r-devel




