[Rd] problems with truncate() with files > 2Gb under Windows (possibly (PR#7879)

tplate at blackmesacapital.com tplate at blackmesacapital.com
Thu May 19 17:48:35 CEST 2005


This message relates to handling files > 2Gb under Windows.  (I use 2Gb
as shorthand for 2^31-1 -- the largest value representable in a signed
32-bit integer.)

First issue: truncate() cannot truncate a file at a position > 2Gb.
This appears to be due to the use of the Windows function chsize() in
file_truncate() in main/connections.c (chsize() takes the file size as
a long int, so we would not expect it to work for positions > 2Gb).

The Windows API has the function SetEndOfFile(handle), which is
supposed to truncate the file at the current position.  However, it
does not seem to work correctly when the current position is beyond
2Gb, so it is not an improvement on chsize() (at least under
Windows 2000).  My explorations with Windows 2000 SP2 and XP Prof SP1
indicate that SetEndOfFile() DOES successfully truncate files > 2Gb to
sizes < 2Gb, but cannot truncate the same file to a position beyond 2Gb.
So I have no suggestions on how to get this to work.  Probably the
best thing to do would be to stop with an error in the appropriate
situations.

Second issue: although the R function seek() can take a seek position
specified as a double, which allows it to seek to a position beyond 2Gb,
the return value from seek() appears to be a 32-bit signed integer, 
resulting in strange (incorrect) return values from seek(), though 
otherwise not affecting correct operation.

Inspecting the code, I wonder whether the lines

#if defined(HAVE_OFF_T) && defined(__USE_LARGEFILE)
     off_t pos = f_tell(fp);
#else
     long pos = f_tell(fp);
#endif

in the definition of file_seek() in main/connections.c should be more
along the lines of the code defining struct fileconn in
include/Rconnections.h:

#if defined(HAVE_OFF_T) && defined(__USE_LARGEFILE)
     off_t rpos, wpos;
#else
#ifdef Win32
     off64_t rpos, wpos;
#else
     long rpos, wpos;
#endif
#endif

I compiled and tested a version of R devel 2.2.0 with the appropriate
simple change to file_seek() in main/connections.c, and with it, seek()
correctly returned file positions beyond 2Gb.  However, I don't know
the purpose of the #define __USE_LARGEFILE (and I couldn't find any
information about it by googling r-project.org), so I'm hesitant to
offer a patch.  Here's the new block of code I used in
main/connections.c that worked OK under Windows:

#if defined(HAVE_OFF_T) && defined(__USE_LARGEFILE)
     off_t pos = f_tell(fp);
#else
#ifdef Win32
     off64_t pos = f_tell(fp);
#else
     long pos = f_tell(fp);
#endif
#endif

I'll be happy to submit a patch that addresses these issues, if someone 
will explain the usage and purpose of __USE_LARGEFILE.

The following transcript, which illustrates both issues (without my
mods), was created from an installation based on the precompiled version
of R for Windows (rw2010.exe).

-- Tony Plate

> options(digits=15)
>
> # can truncate a short file from 8 bytes to 4 bytes
> # first create a file with 8 bytes
> f <- file("tmp1.txt", "wb")
> writeLines(c("abc", "def"), f)
> close(f)
> # check length then truncate to 4 bytes
> f <- file("tmp1.txt", "r+b")
> seek(f, 0, "end")
[1] 0
> seek(f, NA)
[1] 8
> seek(f, 4)
[1] 8
> truncate(f)
NULL
> seek(f, 0, "end")
[1] 4
> seek(f, NA)
[1] 4
> close(f)
> # can truncate a long file from 2000000008 bytes to 2000000004 bytes
> # first create a file with 2000000008 bytes (slightly < 2^31)
> f <- file("tmp1.txt", "wb")
> seek(f, 2000000000)
[1] 0
> writeLines(c("abc", "def"), f)
> close(f)
> f <- file("tmp1.txt", "r+b")
> seek(f, 0, "end")
[1] 0
> seek(f, NA)
[1] 2000000008
> seek(f, 2000000004)
[1] 2000000008
> truncate(f)
NULL
> seek(f, 0, "end")
[1] 2000000004
> seek(f, NA)
[1] 2000000004
> close(f)
> # cannot truncate a long file from 2200000008 bytes to 2200000004 bytes
> # first create a file with 2200000008 bytes (slightly > 2^31)
> f <- file("tmp1.txt", "wb")
> seek(f, 2200000000)
[1] 0
> writeLines(c("abc", "def"), f)
> close(f)
> f <- file("tmp1.txt", "r+b")
> seek(f, 0, "end")
[1] 0
> seek(f, NA) # bad reported value of the current position of "2200000008"
[1] -2094967288
> 2200000008 - 2^32
[1] -2094967288
> seek(f, 2200000004)
[1] -2094967288
> truncate(f) # doesn't work!
NULL
> seek(f, 0, "end")
[1] -2094967288
> # see if we successfully truncated... (no -- same length as before;
> # can also verify this by watching file size with 'ls -l')
> seek(f, NA) # file is same size as before the attempted truncation
[1] -2094967288
> close(f)
> version
          _
platform i386-pc-mingw32
arch     i386
os       mingw32
system   i386, mingw32
status
major    2
minor    1.0
year     2005
month    04
day      18
language R
>


