[Rd] Operations with long altrep vectors cause segfaults on Windows

Martin Maechler
Tue Sep 8 10:52:37 CEST 2020


>>>>> Martin Maechler 
>>>>>     on Tue, 8 Sep 2020 10:40:24 +0200 writes:

>>>>> Hugh Parsonage 
>>>>>     on Tue, 8 Sep 2020 18:08:11 +1000 writes:

    >> I can only reproduce on Windows, but reliably (both 4.0.0 and 4.0.2):

    >> $> R --vanilla
    >> x <- c(0L, -2e9:2e9)

    >> # > Segmentation fault

    >> Tried to reproduce on Linux but the above worked as expected. Not an
    >> issue merely with the length of the vector; for example, x <-
    >> rep_len(1:10, 1e10) works, though the altrep vector must be long to
    >> reproduce:

    >> x <- c(0L, -1e9:1e9)  #ok

    >> Segmentation faults occur with the following too:

    >> x <- (-2e9:2e9) + 1L

    > Your operation would "need" (not in theory, but in practice)
    > to go from altrep to regular vectors.
    > I guess the segfault occurs because of something like this :

    > R asks Windows to hand it a huge amount of memory and Windows replies
    > "ok, here is the memory pointer"
    > and then R tries to write to there, but illegally (because
    > Windows should have told R that it does not really have enough
    > memory for that ..). 
 
    > I cannot reproduce the segmentation fault .. but I can confirm
    > there is a bug there that shows for me on Windows but not on
    > Linux:

    > "My" Windows runs on a terminal server without many GB of memory
    > (but in a version of Windows that recognizes that it cannot
    > get so much memory):

    > ------------------------- Here some transcript (thanks to
    > using Emacs w/ ESS also on Windows) ------------------

    > R Under development (unstable) (2020-08-24 r79074) -- "Unsuffered Consequences"
    > Copyright (C) 2020 The R Foundation for Statistical Computing
    > Platform: x86_64-w64-mingw32/x64 (64-bit)

    > R ist freie Software und kommt OHNE JEGLICHE GARANTIE.
    > Sie sind eingeladen, es unter bestimmten Bedingungen weiter zu verbreiten.
    > Tippen Sie 'license()' or 'licence()' für Details dazu.

    > R ist ein Gemeinschaftsprojekt mit vielen Beitragenden.
    > Tippen Sie 'contributors()' für mehr Information und 'citation()',
    > um zu erfahren, wie R oder R packages in Publikationen zitiert werden können.

    > Tippen Sie 'demo()' für einige Demos, 'help()' für on-line Hilfe, oder
    > 'help.start()' für eine HTML Browserschnittstelle zur Hilfe.
    > Tippen Sie 'q()', um R zu verlassen.

    >> x <- (-2e9:2e9) + 1L
    > Fehler: kann Vektor der Größe 14.9 GB nicht allozieren
    >> y <- c(0L, -2e9:2e9)
    > Fehler: kann Vektor der Größe 14.9 GB nicht allozieren
    >> Sys.setenv(LANGUAGE="en")
    >> y <- c(0L, -2e9:2e9)
    > Error: cannot allocate vector of size 14.9 Gb
    >> y <- -1e9:4e9
    >> .Internal(inspect(y))
    > @0x00000000195a6808 14 REALSXP g0c0 [REF(65535)]  -1000000000 : -294967296 (compact)
    >> .Machine$integer.max / 1e9
    > [1] 2.147484
    >> y <- -1e6:2.2e9
    >> .Internal(inspect(y))
    > @0x000000000a11a5d8 14 REALSXP g0c0 [REF(65535)]  -1000000 : -2094967296 (compact)
    >> y <- -1e6:2e9
    >> .Internal(inspect(y))
    > @0x000000000a13adf0 13 INTSXP g0c0 [REF(65535)]  -1000000 : 2000000000 (compact)
    >> 
    > ------------------------- end of transcript -----------------------------------

    > So indeed, no seg.fault, R notices that it can't get 15 GB of
    > memory.

    > But the bug is bad news:  We have *silent* integer overflow happening
    > according to what  .Internal(inspect(y)) shows...

    > .... less bad news: Probably the bug is only in the 'internal inspect' code
    > where a format specifier is used in C's printf() that does not work
    > correctly on Windows, at least the way it is currently compiled ..


    > On (64-bit) Linux, I get

    >> y <- -1e9:4e9 ; .Internal(inspect(y))
    > @7d86388 14 REALSXP g0c0 [REF(65535)]  -1000000000 : 4000000000 (compact)

    >> y <- c(0L, y)
    > Error: cannot allocate vector of size 37.3 Gb

    > which seems much better ... until I do find a bug, maybe again
    > only in the C code underlying .Internal(inspect(.)) :

    >> y <- -1e9:2e9 ; .Internal(inspect(y))
    > @7d86ac0 13 INTSXP g0c0 [REF(65535)] Error: long vectors not supported yet: ../../../R/src/main/altclasses.c:139
    >> 

Indeed, the purported "integer overflow" (above) does not
happen.
It is "only" a  'printf' related bug inside .Internal(inspect(.)) on Windows.

*Interestingly*, the bug I noticed above on (64-bit) Linux
does *not* show on Windows (64-bit), at least not for that case:

On Windows, things are fine as long as they remain (compacted
aka 'ALTREP') INTSXP:

  > y <- -1e3:2e9 ;.Internal(inspect(y))
  @0x000000000a285648 13 INTSXP g0c0 [REF(65535)]  -1000 : 2000000000 (compact)
  > y <- -1e3:2.1e9 ;.Internal(inspect(y))
  @0x0000000019925930 13 INTSXP g0c0 [REF(65535)]  -1000 : 2100000000 (compact)

and here, y is correct, just the printing from
.Internal(inspect(y)) is buggy (probably prints the double as an integer):

  > y <- -1e3:2.2e9 ; .Internal(inspect(y))
  @0x00000000195c0178 14 REALSXP g0c0 [REF(65535)]  -1000 : -2094967296 (compact)
  > length(y)
  [1] 2200001001
  > tail(y)
  [1] 2.2e+09 2.2e+09 2.2e+09 2.2e+09 2.2e+09 2.2e+09
  > tail(y) - 2.2e9
  [1] -5 -4 -3 -2 -1  0
  >
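
The wrong endpoints shown above (-2094967296 for 2.2e9, and -294967296 for 4e9
in the earlier transcript) are exactly what results when only the low 32 bits
of the value are kept and read as a signed integer. That would fit a narrowing
to a 32-bit type before printing; on 64-bit Windows a C 'long' is 32 bits wide,
but whether inspect() really goes through 'long' is an assumption here, not
something checked in the sources. A minimal sketch of the arithmetic, where
low32_as_signed() is a hypothetical helper and not part of R:

```c
#include <stdint.h>

/* Hypothetical helper (not R code): keep only the low 32 bits of a
 * whole-number double and read them as a signed 32-bit quantity.
 * Uses only well-defined conversions; assumes |x| < 2^63. */
static int64_t low32_as_signed(double x)
{
    int64_t v = (int64_t) x;      /* exact for whole numbers below 2^53 */
    uint32_t low = (uint32_t) v;  /* conversion to unsigned reduces modulo 2^32 */

    if (low > (uint32_t) INT32_MAX)
        return (int64_t) low - 4294967296LL;  /* upper half maps to negatives */
    return (int64_t) low;
}
```

With this helper, 2.2e9 maps to -2094967296 and 4e9 maps to -294967296, while
2.1e9 and -1e9 come back unchanged, matching what the Windows transcripts show.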


