[Rd] Issue with seek() on gzipped connections in R-devel

Jon Clayden jon.clayden at gmail.com
Fri Sep 23 18:13:36 CEST 2011


Thanks for the replies. I take the point, although it does seem like a
substantial regression (on non-Windows platforms).

I like to keep the external dependencies of my packages minimal, but I
will look into the mmap package - thanks, Jeff, for the tip.

Aside from that, though, what is the alternative to using seek? If I
want to read something at (original, uncompressed) byte offset 352, as
here, do I have to read and discard everything that comes before it
first? That seems inelegant at best...
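
For concreteness, the read-and-discard approach could be sketched roughly as
below. This is only an illustration under stated assumptions: skipBytes() and
its chunkSize argument are hypothetical names of my own, not part of R, and
the temporary file stands in for a real image file.

```r
# Hypothetical helper (not part of R): advance a read connection by
# `offset` uncompressed bytes by reading them in chunks and discarding them.
skipBytes <- function(con, offset, chunkSize = 65536L) {
    remaining <- offset
    while (remaining > 0) {
        n <- min(remaining, chunkSize)
        discarded <- readBin(con, "raw", n = n)
        if (length(discarded) == 0L)
            stop("Reached end of file before the requested offset")
        remaining <- remaining - length(discarded)
    }
    invisible(offset)
}

# Self-contained demonstration: write 512 known bytes to a temporary
# gzipped file, then read the byte at uncompressed offset 352.
path <- tempfile(fileext = ".gz")
out <- gzfile(path, "wb")
writeBin(as.raw(0:255), out)
writeBin(as.raw(0:255), out)
close(out)

con <- gzfile(path, "rb")
skipBytes(con, 352)
value <- readBin(con, "raw", n = 1)
close(con)
```

Reading in chunks keeps memory use bounded when the offset is large, but the
cost still grows with the offset, which is the inelegance I mean.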

Regards,
Jon


On 23 September 2011 16:54, Jeffrey Ryan <jeffrey.ryan at lemnica.com> wrote:
> seek() in general is a bad idea IMO if you are writing cross-platform code.
>
> ?seek
>
> Warning:
>
>     Use of ‘seek’ on Windows is discouraged.  We have found so many
>     errors in the Windows implementation of file positioning that
>     users are advised to use it only at their own risk, and asked not
>     to waste the R developers' time with bug reports on Windows'
>     deficiencies.
>
> Aside from making me laugh, the above highlights the core reason not to use it, IMO.
>
> For files that are not compressed, you can try the mmap package; ?mmap and
> ?types are good starting points.  It allows access to binary data on disk
> with very simple R-like semantics, and is very fast: not as fast as a
> sequential read, but fast.  At present it is little-endian only,
> though that describes most of the world today.
>
> Best,
> Jeff
>
> On Fri, Sep 23, 2011 at 8:58 AM, Jon Clayden <jon.clayden at gmail.com> wrote:
>> Dear all,
>>
>> In R-devel (2011-09-23 r57050), I'm running into a serious problem
>> with seek()ing on connections opened with gzfile(). A warning is
>> generated and the file position does not seek to the requested
>> location. It doesn't seem to occur all the time - I tried to create a
>> small example file to illustrate it, but the problem didn't occur.
>> However, it can be seen with a file I use for testing my packages,
>> which is available through the URL
>> <https://github.com/jonclayden/tractor/blob/master/tests/data/nifti/maskedb0_lia.nii.gz?raw=true>:
>>
>> > con <- gzfile("~/Downloads/maskedb0_lia.nii.gz","rb")
>> > seek(con, 352)
>> [1] 0
>> Warning message:
>> In seek.connection(con, 352) :
>>  seek on a gzfile connection returned an internal error
>> > seek(con, NA)
>> [1] 190
>>
>> The same commands with the same file work as expected in R 2.13.1, and
>> have worked over many previous versions of R.
>>
>> > con <- gzfile("~/Downloads/maskedb0_lia.nii.gz","rb")
>> > seek(con, 352)
>> [1] 0
>> > seek(con, NA)
>> [1] 352
>>
>> My sessionInfo() output is:
>>
>> R Under development (unstable) (2011-09-23 r57050)
>> Platform: x86_64-apple-darwin11.1.0 (64-bit)
>>
>> locale:
>> [1] en_GB.UTF-8/en_US.UTF-8/en_GB.UTF-8/C/en_GB.UTF-8/en_GB.UTF-8
>>
>> attached base packages:
>> [1] splines   stats     graphics  grDevices utils     datasets  methods
>> [8] base
>>
>> other attached packages:
>> [1] tractor.nt_2.0.1      tractor.session_2.0.3 tractor.utils_2.0.0
>> [4] tractor.base_2.0.3    reportr_0.2.0
>>
>> This seems to occur whether or not R is compiled with
>> "--with-system-zlib". I see some zlib-related changes mentioned in the
>> NEWS, but I don't see any indication that this is expected. Could
>> anyone shed any light on it, please?
>>
>> Thanks and all the best,
>> Jon
>>
>>
>
>
>
> --
> Jeffrey Ryan
> jeffrey.ryan at lemnica.com
>
> www.lemnica.com
> www.esotericR.com
>


