[Rd] readBin differences on Windows and Linux/mac

Uwe Ligges ligges at statistik.uni-dortmund.de
Tue Jan 1 18:36:36 CET 2008


Thank you, Henrik! This saves us a lot of time!

Uwe


Henrik Bengtsson wrote:
> On 01/01/2008, Henrik Bengtsson <hb at stat.berkeley.edu> wrote:
>> Also make sure the problem is not due to downloading a gzip file in
>> text mode, because to the best of my understanding that is platform
>> dependent.  That is, use download.file(..., mode="wb") instead of the
>> default, which is mode="w".  (This is such a common error that I would
>> like to suggest mode="wb" to become the default.)
> 
> Ok, that solves the problem with your example file.   On WinXP/R v2.6.1:
> 
>> library(R.utils)
>> uri <- "ftp://ftp.ncbi.nih.gov/pub/geo/DATA/SeriesMatrix/GSE1/GSE1_series_matrix.txt.gz"
> 
>> download.file(uri, "test.txt.gz")  # mode="w"
> trying URL 'ftp://ftp.ncbi.nih.gov/pub/geo/DATA/SeriesMatrix/GSE1/GSE1_series_matrix.txt.gz'
> ftp data connection made, file length 918804 bytes
> opened URL
> downloaded 897 Kb
>> file.info("test.txt.gz")$size
> [1] 922243
> 
>> download.file(uri, "test2.txt.gz", mode="wb")
> ftp data connection made, file length 918804 bytes
> opened URL
> downloaded 897 Kb
>> file.info("test2.txt.gz")$size
> [1] 918804
> 
>> gunzip("test.txt.gz")
> Error in readBin(inn, what = raw(0), size = 1, n = BFR.SIZE) :
>   negative length vectors are not allowed
>> gunzip("test2.txt.gz")
>> file.info("test2.txt")$size
> [1] 3338362
> 
> /H
> 
>> /Henrik
>>
>> On 01/01/2008, Uwe Ligges <ligges at statistik.uni-dortmund.de> wrote:
>>> I see. It is either a bug or something related to the following
>>> paragraph from ?seek:
>>>
>>>       We have found so many errors in the Windows implementation of file
>>>       positioning that users are advised to use it only at their own
>>>       risk, and asked not to waste the R developers' time with bug
>>>       reports on Windows' deficiencies.
>>>
>>> I will investigate more closely when I am back in the office at the end of this week.
>>>
>>> Best,
>>> Uwe
>>>
>>>
>>>
>>>
>>> Sean Davis wrote:
>>>> Sorry, Uwe.  Of course:
>>>>
>>>> Both in relatively recent R-devel (one mac, one windows):
>>>>
>>>> ### gunzip pulled from R.utils to be a simple function
>>>> ### In R.utils, implemented as a method
>>>> gunzip <- function(filename, destname=gsub("[.]gz$", "", filename),
>>>> overwrite=FALSE, remove=TRUE, BFR.SIZE=1e7) {
>>>>   if (filename == destname)
>>>>     stop(sprintf("Argument 'filename' and 'destname' are identical: %s",
>>>> filename));
>>>>   if (!overwrite && file.exists(destname))
>>>>     stop(sprintf("File already exists: %s", destname));
>>>>
>>>>   inn <- gzfile(filename, "rb");
>>>>   on.exit(if (!is.null(inn)) close(inn));
>>>>
>>>>   out <- file(destname, "wb");
>>>>   on.exit(close(out), add=TRUE);
>>>>
>>>>   nbytes <- 0;
>>>>   repeat {
>>>>     bfr <- readBin(inn, what=raw(0), size=1, n=BFR.SIZE);
>>>>     n <- length(bfr);
>>>>     if (n == 0)
>>>>       break;
>>>>     nbytes <- nbytes + n;
>>>>     writeBin(bfr, con=out, size=1);
>>>>   };
>>>>
>>>>   if (remove) {
>>>>     close(inn);
>>>>     inn <- NULL;
>>>>     file.remove(filename);
>>>>   }
>>>>
>>>>   invisible(nbytes);
>>>> }
>>>> download.file('ftp://ftp.ncbi.nih.gov/pub/geo/DATA/SeriesMatrix/GSE1/GSE1_series_matrix.txt.gz',
>>>>               'test.txt.gz')
>>>> gunzip('test.txt.gz')
>>>>
>>>> Under windows, this results in the error reported below.  Under mac and
>>>> linux, it results in test.txt being created in the current working
>>>> directory.  The actual gunzip function is pretty bare bones, so I don't
>>>> think it complicates matters much to use it in this example.
>>>>
>>>> Sean
>>>>
>>>>
>>>> On Dec 31, 2007 1:24 PM, Uwe Ligges <ligges at statistik.uni-dortmund.de> wrote:
>>>>
>>>>     Can you give a reproducible example, please?
>>>>
>>>>     Uwe Ligges
>>>>
>>>>
>>>>     Sean Davis wrote:
>>>>      > I have been trying to use the gunzip function in the R.utils package.  It
>>>>      > opens a connection to a gzfile, uses readBin to read from that connection,
>>>>      > and then uses writeBin to write out the raw data to a new file.  This works
>>>>      > as expected under linux/mac, but under Windows, I get:
>>>>      >
>>>>      > Error in readBin(inn, what= raw(0), size = 1, n=BFR.SIZE)  :
>>>>      >   negative length vectors are not allowed
>>>>      >
>>>>      > A simple traceback shows the error in readBin.  I wouldn't be surprised if
>>>>      > this is a programming issue not located in readBin, but I am confused about
>>>>      > the difference in behaviors on Windows versus mac/linux.  Any insight into
>>>>      > what I can do to remedy the issue and have a cross-platform gunzip()?
>>>>      >
>>>>      > Thanks,
>>>>      > Sean
>>>>      >
>>>>      > ______________________________________________
>>>>      > R-devel at r-project.org mailing list
>>>>      > https://stat.ethz.ch/mailman/listinfo/r-devel
>>>>
>>>>
>>>
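A note on why the text-mode download breaks the file: on Windows, writing in text mode expands each LF byte (0x0a) to CR LF (0x0d 0x0a), so a binary file grows by one byte for every LF it happens to contain. The two sizes reported above differ by 922243 - 918804 = 3439 bytes, which is consistent with that kind of translation, although nothing in the thread confirms the exact mechanism. The sketch below only simulates the effect on a made-up byte vector; simulate_text_mode() and payload are hypothetical names used for illustration and are not part of R or R.utils.

```r
# Simulate a Windows text-mode write: every LF byte (0x0a) in the input
# is expanded to CR LF (0x0d 0x0a). Illustration only, not R internals.
simulate_text_mode <- function(bytes) {
  is_lf <- bytes == as.raw(0x0a)
  # Each LF contributes one extra byte to the output.
  out <- raw(length(bytes) + sum(is_lf))
  # Final position of each original byte after the CR insertions.
  pos <- seq_along(bytes) + cumsum(is_lf)
  out[pos] <- bytes
  # Put a CR immediately before every LF.
  out[pos[is_lf] - 1L] <- as.raw(0x0d)
  out
}

# Made-up binary payload (assumption): gzip-like header bytes plus two LFs.
payload <- as.raw(c(0x1f, 0x8b, 0x08, 0x0a, 0x00, 0x0a, 0xff))
damaged <- simulate_text_mode(payload)

# The damaged copy is longer by exactly the number of LF bytes.
extra <- length(damaged) - length(payload)
print(extra)
```

Once bytes have been inserted like this, the compressed stream no longer decodes correctly, so the reliable fix is the one suggested above: pass mode="wb" to download.file() whenever the target is a binary file.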


