[Rd] [External] Possible ALTREP bug

Gabriel Becker g@bembecker @end|ng |rom gm@||@com
Thu Jun 17 22:23:28 CEST 2021


Hi Toby,

Just to be more concrete about why to avoid REAL and about the iteration
macros Luke mentioned.

The ITERATE_BY_REGION* macros in include/R_ext/Itermacros.h are built on
top of *_GET_REGION rather than REAL/INTEGER. The key difference here is
that REAL/INTEGER  go down to the Dataptr method of the altclass, will
generally "explode" the ALTREP, which returns a writable data pointer, thus
invalidating the metadata, effectively turning it into a non-altrep form
while Get_region method provides a way to return a contiguous region of the
data in a way that (can and generally does) leave the ALTREPness intact.

We can see the difference by looking at the two respective methods for the
compact sequences ALTREP classes that luke wrote that ship with R:

static void *compact_intseq_Dataptr(SEXP x, Rboolean writeable)
{

    if (COMPACT_SEQ_EXPANDED(x) == R_NilValue) {

/* no need to re-run if expanded data exists */

PROTECT(x);

SEXP info = COMPACT_SEQ_INFO(x);

R_xlen_t n = COMPACT_INTSEQ_INFO_LENGTH(info);

int n1 = COMPACT_INTSEQ_INFO_FIRST(info);

int inc = COMPACT_INTSEQ_INFO_INCR(info);

* SEXP val = allocVector(INTSXP, n);*

int *data = INTEGER(val);


if (inc == 1) {

   /* compact sequences n1 : n2 with n1 <= n2 */

*    for (R_xlen_t i = 0; i < n; i++)*

*       data[i] = (int) (n1 + i);*

}

else if (inc == -1) {

   /* compact sequences n1 : n2 with n1 > n2 */

   for (R_xlen_t i = 0; i < n; i++)

        data[i] = (int) (n1 - i);

}

else

   error("compact sequences with increment %d not supported yet", inc);


* SET_COMPACT_SEQ_EXPANDED(x, val);*
UNPROTECT(1);

    }
    *return DATAPTR(COMPACT_SEQ_EXPANDED(x));*
}

So the above sets the "Expanded" altrep data to a SEXP that holds a normal
SEXP with all the data if it isn't there already, and then returns DATAPTR
of that.

Get_region, on the other hand, looks like this:

static R_xlen_t
compact_intseq_Get_region(SEXP sx, R_xlen_t i, R_xlen_t n,* int *buf*)
{
    /* should not get here if x is already expanded */
    CHECK_NOT_EXPANDED(sx);

    SEXP info = COMPACT_SEQ_INFO(sx);
    R_xlen_t size = COMPACT_INTSEQ_INFO_LENGTH(info);
    R_xlen_t n1 = COMPACT_INTSEQ_INFO_FIRST(info);
    int inc = COMPACT_INTSEQ_INFO_INCR(info);

    R_xlen_t ncopy = size - i > n ? n : size - i;
    if (inc == 1) {


* for (R_xlen_t k = 0; k < ncopy; k++)**    buf[k] = (int) (n1 + k + i);*
return ncopy;

    }
    else if (inc == -1) {

for (R_xlen_t k = 0; k < ncopy; k++)
   buf[k] = (int) (n1 - k - i);
return ncopy;

    }
    else

error("compact sequences with increment %d not supported yet", inc);

}

Here we are passing a buffer to the method, and that buffer gets populated
with the data. No new SEXP, no expanding the vector.

ITERATE_BY_REGION wraps that nicely so you can loop over the whole thing,
but the data is grabbed chunkwise, no huge allocation, no expanding of the
altrep.

One of the examples from summary.c is rsum, which looks like:

static Rboolean rsum(SEXP sx, double *value, Rboolean narm)
{
    LDOUBLE s = 0.0;
    Rboolean updated = FALSE;


*   ITERATE_BY_REGION(sx, x, i, nbatch, double, REAL, {*

*    for (R_xlen_t k = 0; k < nbatch; k++) {*

* if (!narm || !ISNAN(x[k])) {*

*    if(!updated) updated = TRUE;*

*    s += x[k];*

* }*

*    }*
* });*

    if(s > DBL_MAX) *value = R_PosInf;
    else if (s < -DBL_MAX) *value = R_NegInf;
    else *value = (double) s;

    return updated;
}

sx is the SEXP input, x is the chosen name of a variable, declared in the
scope of the macro, that holds the current batch of data that can be used
within the bracketted expression that is the macro's last argument. nbatch
is the name of a variable which will contain how many elements of data the
current batch has in it,  double is the raw data type (used in declaration
of x, in fact), REAL is the R-macro .."type" I guess, used internally. i is
the name chosen for another declared-within-the-macro variable which will
always contain the position in the overall vector corresponding to the
start of the current buffer.

Hope that helps.

Best,
~G

On Thu, Jun 17, 2021 at 11:32 AM <luke-tierney using uiowa.edu> wrote:

> On Thu, 17 Jun 2021, Toby Hocking wrote:
>
> > Oliver, for clarification that section in writing R extensions mentions
> > VECTOR_ELT and REAL but not REAL_ELT nor any other *_ELT functions. I was
> > looking for an explanation of all the *_ELT functions (which are
> apparently
> > new), not just VECTOR_ELT.
> > Thanks Simon that response was very helpful.
> > One more question: are there any circumstances in which one should use
> > REAL_ELT(x,i) rather than REAL(x)[i] or vice versa? Or can they be used
> > interchangeably?
>
> For a single call it is better to use REAL_ELT(x, i) since it doesn't
> force allocating a possibly large object in order to get a pointer to
> its data with REAL(x).  If you are iterating over a whole object you
> may want to get data in chunks. There are iteration macros that
> help. Some examples are in src/main/summary.c.
>
> Best,
>
> luke
>
> >
> > On Wed, Jun 16, 2021 at 4:29 PM Simon Urbanek <
> simon.urbanek using r-project.org>
> > wrote:
> >       The usual quote applies: "use the source, Luke":
> >
> >       $ grep _ELT *.h | sort
> >       Rdefines.h:#define SET_ELEMENT(x, i, val)
> >        SET_VECTOR_ELT(x, i, val)
> >       Rinternals.h:   The function STRING_ELT is used as an argument
> >       to arrayAssign even
> >       Rinternals.h:#define VECTOR_ELT(x,i)    ((SEXP *) DATAPTR(x))[i]
> >       Rinternals.h://SEXP (STRING_ELT)(SEXP x, R_xlen_t i);
> >       Rinternals.h:Rbyte (RAW_ELT)(SEXP x, R_xlen_t i);
> >       Rinternals.h:Rbyte ALTRAW_ELT(SEXP x, R_xlen_t i);
> >       Rinternals.h:Rcomplex (COMPLEX_ELT)(SEXP x, R_xlen_t i);
> >       Rinternals.h:Rcomplex ALTCOMPLEX_ELT(SEXP x, R_xlen_t i);
> >       Rinternals.h:SEXP (STRING_ELT)(SEXP x, R_xlen_t i);
> >       Rinternals.h:SEXP (VECTOR_ELT)(SEXP x, R_xlen_t i);
> >       Rinternals.h:SEXP ALTSTRING_ELT(SEXP, R_xlen_t);
> >       Rinternals.h:SEXP SET_VECTOR_ELT(SEXP x, R_xlen_t i, SEXP v);
> >       Rinternals.h:double (REAL_ELT)(SEXP x, R_xlen_t i);
> >       Rinternals.h:double ALTREAL_ELT(SEXP x, R_xlen_t i);
> >       Rinternals.h:int (INTEGER_ELT)(SEXP x, R_xlen_t i);
> >       Rinternals.h:int (LOGICAL_ELT)(SEXP x, R_xlen_t i);
> >       Rinternals.h:int ALTINTEGER_ELT(SEXP x, R_xlen_t i);
> >       Rinternals.h:int ALTLOGICAL_ELT(SEXP x, R_xlen_t i);
> >       Rinternals.h:void ALTCOMPLEX_SET_ELT(SEXP x, R_xlen_t i,
> >       Rcomplex v);
> >       Rinternals.h:void ALTINTEGER_SET_ELT(SEXP x, R_xlen_t i, int v);
> >       Rinternals.h:void ALTLOGICAL_SET_ELT(SEXP x, R_xlen_t i, int v);
> >       Rinternals.h:void ALTRAW_SET_ELT(SEXP x, R_xlen_t i, Rbyte v);
> >       Rinternals.h:void ALTREAL_SET_ELT(SEXP x, R_xlen_t i, double v);
> >       Rinternals.h:void ALTSTRING_SET_ELT(SEXP, R_xlen_t, SEXP);
> >       Rinternals.h:void SET_INTEGER_ELT(SEXP x, R_xlen_t i, int v);
> >       Rinternals.h:void SET_LOGICAL_ELT(SEXP x, R_xlen_t i, int v);
> >       Rinternals.h:void SET_REAL_ELT(SEXP x, R_xlen_t i, double v);
> >       Rinternals.h:void SET_STRING_ELT(SEXP x, R_xlen_t i, SEXP v);
> >
> >       So the indexing is with R_xlen_t and they return the value
> >       itself as one would expect.
> >
> >       Cheers,
> >       Simon
> >
> >
> >       > On Jun 17, 2021, at 2:22 AM, Toby Hocking <tdhock5 using gmail.com>
> >       wrote:
> >       >
> >       > By the way, where is the documentation for INTEGER_ELT,
> >       REAL_ELT, etc? I
> >       > looked in Writing R Extensions and R Internals but I did not
> >       see any
> >       > mention.
> >       > REAL_ELT is briefly mentioned on
> >       > https://svn.r-project.org/R/branches/ALTREP/ALTREP.html
> >       > Would it be possible to please add some mention of them to
> >       Writing R
> >       > Extensions?
> >       > - how many of these _ELT functions are there? INTEGER, REAL,
> >       ... ?
> >       > - in what version of R were they introduced?
> >       > - I guess input types are always SEXP and int?
> >       > - What are the output types for each?
> >       >
> >       > On Fri, May 28, 2021 at 5:16 PM <luke-tierney using uiowa.edu>
> >       wrote:
> >       >
> >       >> Since the INTEGER_ELT, REAL_ELT, etc, functions are fairly
> >       new it may
> >       >> be possible to check that places where they are used allow
> >       for them to
> >       >> allocate. I have fixed the one that got caught by Gabor's
> >       example, and
> >       >> a rchk run might be able to pick up others if rchk knows
> >       these could
> >       >> allocate. (I may also be forgetting other places where the
> >       _ELt
> >       >> methods are used.)  Fixing all call sites for REAL, INTEGER,
> >       etc, was
> >       >> never realistic so there GC has to be suspended during the
> >       method
> >       >> call, and that is done in the dispatch mechanism.
> >       >>
> >       >> The bigger problem is jumps from inside things that existing
> >       code
> >       >> assumes will not do that. Catching those jumps is possible
> >       but
> >       >> expensive; doing anything sensible if one is caught is really
> >       not
> >       >> possible.
> >       >>
> >       >> Best,
> >       >>
> >       >> luke
> >       >>
> >       >> On Fri, 28 May 2021, Gabriel Becker wrote:
> >       >>
> >       >>> Hi Jim et al,
> >       >>> Just to hopefully add a bit to what Luke already answered,
> >       from what I am
> >       >>> recalling looking back at that bioconductor thread Elt
> >       methods are used
> >       >> in
> >       >>> places where there are hard implicit assumptions that no
> >       garbage
> >       >> collection
> >       >>> will occur (ie they are called on things that aren't
> >       PROTECTed), and
> >       >> beyond
> >       >>> that, in places where there are hard assumptions that no
> >       error (longjmp)
> >       >>> will occur. I could be wrong, but I don't know that
> >       suspending garbage
> >       >>> collection would protect from the second one. Ie it is
> >       possible that an
> >       >>> error *ever* being raised from R code that implements an elt
> >       method could
> >       >>> cause all hell to break loose.
> >       >>>
> >       >>> Luke or Tomas Kalibera would know more.
> >       >>>
> >       >>> I was disappointed that implementing ALTREPs in R code was
> >       not in the
> >       >> cards
> >       >>> (it was in my original proposal back in 2016 to the DSC) but
> >       I trust Luke
> >       >>> that there are important reasons we can't safely allow that.
> >       >>>
> >       >>> Best,
> >       >>> ~G
> >       >>>
> >       >>> On Fri, May 28, 2021 at 8:31 AM Jim Hester
> >       <james.f.hester using gmail.com>
> >       >> wrote:
> >       >>>      From reading the discussion on the Bioconductor issue
> >       tracker it
> >       >>>      seems like
> >       >>>      the reason the GC is not suspended for the non-string
> >       ALTREP Elt
> >       >>>      methods is
> >       >>>      primarily due to performance concerns.
> >       >>>
> >       >>>      If this is the case perhaps an additional flag could be
> >       added to
> >       >>>      the
> >       >>>      `R_set_altrep_*()` functions so ALTREP authors could
> >       indicate if
> >       >>>      GC should
> >       >>>      be halted when that particular method is called for
> >       that
> >       >>>      particular ALTREP
> >       >>>      class.
> >       >>>
> >       >>>      This would avoid the performance hit (other than a
> >       boolean
> >       >>>      check) for the
> >       >>>      standard case when no allocations are expected, but
> >       allow
> >       >>>      authors to
> >       >>>      indicate that R should pause GC if needed for methods
> >       in their
> >       >>>      class.
> >       >>>
> >       >>>      On Fri, May 28, 2021 at 9:42 AM
> >       <luke-tierney using uiowa.edu> wrote:
> >       >>>
> >       >>>> integer and real Elt methods are not expected to allocate.
> >       You
> >       >>>      would
> >       >>>> have to suspend GC to be able to do that. This currently
> >       can't
> >       >>>      be done
> >       >>>> from package code.
> >       >>>>
> >       >>>> Best,
> >       >>>>
> >       >>>> luke
> >       >>>>
> >       >>>> On Fri, 28 May 2021, Gábor Csárdi wrote:
> >       >>>>
> >       >>>>> I have found some weird SEXP corruption behavior with
> >       >>>      ALTREP, which
> >       >>>>> could be a bug. (Or I could be doing something wrong.)
> >       >>>>>
> >       >>>>> I have an integer ALTREP vector that calls back to R from
> >       >>>      the Elt
> >       >>>>> method. When this vector is indexed in a lapply(), its
> >       first
> >       >>>      element
> >       >>>>> gets corrupted. Sometimes it's just a type change to
> >       >>>      logical, but
> >       >>>>> sometimes the corruption causes a crash.
> >       >>>>>
> >       >>>>> I saw this on macOS from R 3.5.3 to 4.2.0. I created a
> >       small
> >       >>>      package
> >       >>>>> that demonstrates this:
> >       >>>      https://github.com/gaborcsardi/redfish
> >       >>>>>
> >       >>>>> The R callback in this package calls
> >       >>>      `loadNamespace("Matrix")`, but
> >       >>>>> the same crash happens for other packages as well, and
> >       >>>      sometimes it
> >       >>>>> also happens if I don't load any packages at all. (But
> >       that
> >       >>>      example
> >       >>>>> was much more complicated, so I went with the package
> >       >>>      loading.)
> >       >>>>>
> >       >>>>> It is somewhat random, and sometimes turning off the JIT
> >       >>>      avoids the
> >       >>>>> crash, but not always.
> >       >>>>>
> >       >>>>> Hopefully I am just doing something wrong in the ALTREP
> >       code
> >       >>>      (see
> >       >>>>>
> >       >>>
> >       https://github.com/gaborcsardi/redfish/blob/main/src/test.c),
> >       >>>      and it
> >       >>>>> is not actually a bug.
> >       >>>>>
> >       >>>>> Thanks,
> >       >>>>> Gabor
> >       >>>>>
> >       >>>>> ______________________________________________
> >       >>>>> R-devel using r-project.org mailing list
> >       >>>>> https://stat.ethz.ch/mailman/listinfo/r-devel
> >       >>>>>
> >       >>>>
> >       >>>> --
> >       >>>> Luke Tierney
> >       >>>> Ralph E. Wareham Professor of Mathematical Sciences
> >       >>>> University of Iowa                  Phone:
> >       >>>       319-335-3386
> >       >>>> Department of Statistics and        Fax:
> >       >>>       319-335-3017
> >       >>>>    Actuarial Science
> >       >>>> 241 Schaeffer Hall                  email:
> >       >>>       luke-tierney using uiowa.edu
> >       >>>> Iowa City, IA 52242                 WWW:
> >       >>>      http://www.stat.uiowa.edu
> >       >>>> ______________________________________________
> >       >>>> R-devel using r-project.org mailing list
> >       >>>> https://stat.ethz.ch/mailman/listinfo/r-devel
> >       >>>>
> >       >>>
> >       >>>              [[alternative HTML version deleted]]
> >       >>>
> >       >>>      ______________________________________________
> >       >>>      R-devel using r-project.org mailing list
> >       >>>      https://stat.ethz.ch/mailman/listinfo/r-devel
> >       >>>
> >       >>>
> >       >>>
> >       >>
> >       >> --
> >       >> Luke Tierney
> >       >> Ralph E. Wareham Professor of Mathematical Sciences
> >       >> University of Iowa                  Phone:
> >        319-335-3386
> >       >> Department of Statistics and        Fax:
> >        319-335-3017
> >       >>    Actuarial Science
> >       >> 241 Schaeffer Hall                  email:
> >        luke-tierney using uiowa.edu
> >       >> Iowa City, IA 52242                 WWW:
> >       http://www.stat.uiowa.edu
> >       >> ______________________________________________
> >       >> R-devel using r-project.org mailing list
> >       >> https://stat.ethz.ch/mailman/listinfo/r-devel
> >       >>
> >       >
> >       >       [[alternative HTML version deleted]]
> >       >
> >       > ______________________________________________
> >       > R-devel using r-project.org mailing list
> >       > https://stat.ethz.ch/mailman/listinfo/r-devel
> >       >
> >
> >
> >
>
> --
> Luke Tierney
> Ralph E. Wareham Professor of Mathematical Sciences
> University of Iowa                  Phone:             319-335-3386
> Department of Statistics and        Fax:               319-335-3017
>     Actuarial Science
> 241 Schaeffer Hall                  email:   luke-tierney using uiowa.edu
> Iowa City, IA 52242                 WWW:  http://www.stat.uiowa.edu
> ______________________________________________
> R-devel using r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel
>

	[[alternative HTML version deleted]]



More information about the R-devel mailing list