[Rd] What is the best way to loop over an ALTREP vector?

Bob Rudis bob @end|ng |rom rud@|@
Tue Sep 24 03:12:53 CEST 2019


Not sure if you're using just C++ or Rcpp for C++ access but https://purrple.cat/blog/2018/10/14/altrep-and-cpp/ has some tips on using C++ w/ALTREP.

> On Sep 23, 2019, at 3:17 PM, Wang Jiefei <szwjf08 using gmail.com> wrote:
> 
> Sorry for posting so much. In the first block of code I copied my own C++
> version of the macro by mistake (you can see the explicit type cast). Here is
> the macro definition from R_ext/Itermacros.h:
> 
> #define ITERATE_BY_REGION_PARTIAL(sx, px, idx, nb, etype, vtype,    \
>                                   strt, nfull, expr) do {           \
>     const etype *px = DATAPTR_OR_NULL(sx);                          \
>     if (px != NULL) {                                               \
>         R_xlen_t __ibr_n__ = strt + nfull;                          \
>         R_xlen_t nb = __ibr_n__;                                    \
>         for (R_xlen_t idx = strt; idx < __ibr_n__; idx += nb) {     \
>             expr                                                    \
>         }                                                           \
>     }                                                               \
>     else ITERATE_BY_REGION_PARTIAL0(sx, px, idx, nb, etype, vtype,  \
>                                     strt, nfull, expr);             \
> } while (0)
> 
> 
> Best,
> 
> Jiefei
> 
> On Mon, Sep 23, 2019 at 3:12 PM Wang Jiefei <szwjf08 using gmail.com> wrote:
> 
>> Hi Gabriel,
>> 
>> I have tried the macro and found a small issue: the macro is written in C
>> and relies on an implicit pointer conversion (const void * to const int *),
>> see below. C allows this, but C++ rejects it. Would it be possible to add
>> an explicit cast so that the macro is compatible with both languages?
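
To make the conversion issue concrete, here is a minimal standalone sketch in plain C that does not use R's headers. `data_or_null` is a hypothetical stand-in for DATAPTR_OR_NULL and simply returns a `const void *`; `sum_ints` is likewise an illustrative name. The explicit cast is redundant in C but required in C++, so writing it keeps the same line valid in both languages.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for DATAPTR_OR_NULL(): hands the data back as an
 * untyped pointer, the way the real accessor returns const void *. */
static const void *data_or_null(const int *storage)
{
    return storage;
}

/* Sum n ints reached through the untyped pointer. */
static long sum_ints(const int *storage, size_t n)
{
    assert(storage != NULL || n == 0);
    /* In C the cast is optional. In C++ the same initialization without the
     * cast does not compile, so the cast makes the line portable. */
    const int *p = (const int *)data_or_null(storage);
    long total = 0;
    if (p != NULL) {
        for (size_t i = 0; i < n; i++)
            total += p[i];
    }
    return total;
}
```

Compiling the same file as C++ should also succeed, which is the point of the cast.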
>> 
>> 
>> #define ITERATE_BY_REGION_PARTIAL(sx, px, idx, nb, etype, vtype,    \
>>                                   strt, nfull, expr) do {           \
>>     const etype *px = (const etype *)DATAPTR_OR_NULL(sx);           \
>>     if (px != NULL) {                                               \
>>         R_xlen_t __ibr_n__ = strt + nfull;                          \
>>         R_xlen_t nb = __ibr_n__;                                    \
>>         for (R_xlen_t idx = strt; idx < __ibr_n__; idx += nb) {     \
>>             expr                                                    \
>>         }                                                           \
>>     }                                                               \
>>     else ITERATE_BY_REGION_PARTIAL0(sx, px, idx, nb, etype, vtype,  \
>>                                     strt, nfull, expr);             \
>> } while (0)
>> 
>> 
>> Also, I notice that the element type (etype) and the vector type (vtype)
>> have to be specified in the macro. Since the SEXP is the first argument,
>> it seems redundant to pass etype and vtype, because they have to match the
>> type of the SEXP. Is this intentional? Will there be a type-free macro in
>> R in the future? Here is a simple type-free macro I'm using:
>> 
>> #define type_free_iter(sx, ptr, ind, nbatch, expr) \
>> switch (TYPEOF(sx)) { \
>> case INTSXP: \
>>     ITERATE_BY_REGION(sx, ptr, ind, nbatch, int, INTEGER, expr); \
>>     break; \
>> case REALSXP: \
>>     ITERATE_BY_REGION(sx, ptr, ind, nbatch, double, REAL, expr); \
>>     break; \
>> case LGLSXP: \
>>     ITERATE_BY_REGION(sx, ptr, ind, nbatch, int, LOGICAL, expr); \
>>     break; \
>> default: \
>>     Rf_error("Unknown data type\n"); \
>>     break; \
>> }
>> 
>> 
>> 
>> // [[Rcpp::export]]
>> double sillysum(SEXP x) {
>>     double s = 0.0;
>>     type_free_iter(x, ptr, ind, nbatch,
>>         {
>>             for (int i = 0; i < nbatch; i++) { s = s + ptr[i]; }
>>         });
>>     return s;
>> }
>> 
>> 
>> 
>> 
>> Best,
>> 
>> Jiefei
>> 
>> On Wed, Aug 28, 2019 at 2:32 PM Wang Jiefei <szwjf08 using gmail.com> wrote:
>> 
>>> Thank you, Gabriel. The loop macro is very helpful. It is also exciting
>>> to see that there are lots of changes to ALTREP in the R-devel version. I
>>> really appreciate your help!
>>> 
>>> Best,
>>> Jiefei
>>> 
>>> On Wed, Aug 28, 2019 at 7:37 AM Gabriel Becker <gabembecker using gmail.com>
>>> wrote:
>>> 
>>>> Jiefei,
>>>> 
>>>> I've been meaning to write up something about this so hopefully this
>>>> will be an impetus for me to actually do that, but until then, responses
>>>> inline.
>>>> 
>>>> 
>>>> On Tue, Aug 27, 2019, 7:22 PM Wang Jiefei <szwjf08 using gmail.com> wrote:
>>>> 
>>>>> Hi devel team,
>>>>> 
>>>>> I'm working on C/C++ level ALTREP compatibility for a package. The
>>>>> package previously used pointers to access the data of a SEXP, so it
>>>>> would not work for some ALTREP objects which do not have a pointer. I
>>>>> plan to rewrite the code and use functions like get_elt, get_region,
>>>>> and get_subset to access the values of a vector, so I have a few
>>>>> questions about ALTREP:
>>>>> 
>>>>> 1. Since an ALTREP class does not have to define all of the above
>>>>> functions (element, region, subset), is there any way to check which
>>>>> functions have been defined for an ALTREP class? I searched
>>>>> Rinternals.h and altrep.c but did not find a way to do it. If there is
>>>>> none, will one be added in the future?
>>>>> 
>>>> 
>>>> Element and region are guaranteed to always be defined and work for
>>>> altrep and non-altrep INTSXPs, REALSXPs, LGLSXPs, etc. (we currently don't
>>>> have region for STRSXP or VECSXP, I believe). If the altrep class does not
>>>> provide them, then default methods will be used, which may be inefficient in
>>>> some cases but will work. Subset is currently a forward-looking stub, but
>>>> once implemented, it will also be guaranteed to work for all valid ALTREP
>>>> classes.
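
As an illustration of the "default methods" fallback, here is a small standalone C sketch that does not use R's headers. `mock_vec`, `seq_elt`, and `default_get_region` are hypothetical names invented for this example: the mock class only knows how to compute one element at a time, and the fallback region method fills the caller's buffer by calling the element method repeatedly. This shows why such a fallback always works but can be slow. It is a sketch of the idea, not R's actual code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical mock of a compact sequence: element i is start + i.
 * There is no materialized data array, only an element method. */
typedef struct mock_vec {
    int start;
    size_t length;
    int (*elt)(const struct mock_vec *v, size_t i);
} mock_vec;

static int seq_elt(const mock_vec *v, size_t i)
{
    return v->start + (int)i;
}

/* Fallback region method: copy up to n elements starting at position
 * `from` into buf, calling the element method once per element.
 * Returns the number of elements actually copied. */
static size_t default_get_region(const mock_vec *v, size_t from,
                                 size_t n, int *buf)
{
    assert(v != NULL && buf != NULL);
    if (from >= v->length)
        return 0;
    size_t ncopy = v->length - from;
    if (ncopy > n)
        ncopy = n;
    for (size_t k = 0; k < ncopy; k++)
        buf[k] = v->elt(v, from + k);
    return ncopy;
}
```

A class that can produce a whole region cheaply would supply its own region method instead of relying on this per-element loop.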
>>>> 
>>>> 
>>>>> 
>>>>> 2. Given the diversity of ALTREP classes, what is the best way to loop
>>>>> over an ALTREP object? I was hoping for an all-in-one function that can
>>>>> get the values from a vector as long as at least one of the above
>>>>> functions has been defined, so that package developers would not need
>>>>> piles of `if-else` statements to make their packages work with ALTREP.
>>>>> Since no such function seems to exist, what is the best way to write
>>>>> the loop under the current R version?
>>>>> 
>>>> 
>>>> The best way to loop over all SEXPs, supporting both altrep and
>>>> non-altrep objects, is with the ITERATE_BY_REGION macro (which has been in
>>>> R for a number of released versions, at least since 3.5.0 I think) and the
>>>> much newer (devel only) ITERATE_BY_REGION_PARTIAL macro, both defined in
>>>> R_ext/Itermacros.h.
>>>> 
>>>> The meaning of the arguments for ITERATE_BY_REGION_PARTIAL is as follows
>>>> (ITERATE_BY_REGION is the same except that it has no strt and nfull).
>>>> 
>>>> 
>>>>   - sx - C level variable name of the SEXP to iterate over
>>>>   - px - variable name to use for the pointer populated with data from
>>>>   a region of sx
>>>>   - idx - variable name to use for the "outer", batch counter in the
>>>>   for loop. This will contain the 0-indexed start position of the batch
>>>>   you're currently processing
>>>>   - nb - variable name to use for the current batch size. This will
>>>>   always either be GET_REGION_BUFSIZE (512), or the number of elements
>>>>   remaining in the vector, whichever is smaller
>>>>   - etype - element (C) type, e.g., int, double, of the data
>>>>   - vtype - vector (access API) type, e.g, INTEGER, REAL
>>>>   - strt - the 0-indexed position in the vector to start iterating
>>>>   - nfull - the total number of elements to iterate over from the
>>>>   vector
>>>>   - expr - the code to process a single batch (which will do things to
>>>>   px, typically)
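
To show how these arguments fit together, here is a standalone C sketch of a batched loop over part of a vector. It does not use R's headers or macros: `BATCH_SIZE`, `get_region`, and `sum_partial` are hypothetical names, and the region accessor is just a `memcpy` from a plain array. The local variables `idx`, `nb`, `px`, `strt`, and `nfull` play the roles described in the list above.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define BATCH_SIZE 512  /* mirrors the 512-element batch size described above */

/* Hypothetical stand-in for a region accessor: copies n doubles starting
 * at position `from` out of src into buf. */
static void get_region(const double *src, size_t from, size_t n, double *buf)
{
    memcpy(buf, src + from, n * sizeof(double));
}

/* Sum nfull elements starting at strt, one batch at a time.
 * idx is the 0-indexed start of the current batch, nb its size. */
static double sum_partial(const double *src, size_t len,
                          size_t strt, size_t nfull)
{
    assert(strt <= len && nfull <= len - strt);
    double buf[BATCH_SIZE];
    double total = 0.0;
    size_t end = strt + nfull;
    for (size_t idx = strt; idx < end; idx += BATCH_SIZE) {
        size_t nb = end - idx < BATCH_SIZE ? end - idx : BATCH_SIZE;
        get_region(src, idx, nb, buf);
        const double *px = buf;          /* px points at the current batch */
        for (size_t i = 0; i < nb; i++)  /* this loop plays the role of expr */
            total += px[i];
    }
    return total;
}
```

Every batch except possibly the last holds BATCH_SIZE elements, which matches the description of nb.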
>>>> 
>>>> 
>>>> So code to perform a deliberately naive (not a good idea) summation of
>>>> REALSXP data might look like:
>>>> 
>>>> double sillysum(SEXP x) {
>>>>     double s = 0.0;
>>>>     ITERATE_BY_REGION(x, ptr, ind, nbatch, double, REAL,
>>>>         {
>>>>             for (int i = 0; i < nbatch; i++) { s = s + ptr[i]; }
>>>>         });
>>>>     return s;
>>>> }
>>>> 
>>>> For meatier examples of ITERATE_BY_REGION's use in practice you can grep
>>>> the R sources. I know it is used in the implementations of the various
>>>> C-level summaries (summary.c), print and formatting functions, and anyNA.
>>>> 
>>>> Some things to remember
>>>> 
>>>>   - If you have an inner loop like the one above, your total position
>>>>   in the original vector is ind + i
>>>>   - ITERATE_BY_REGION always processes the whole vector. If you need
>>>>   to do only part of it, you'll either need custom breaking for both inner and
>>>>   outer loops, or in R-devel you can use ITERATE_BY_REGION_PARTIAL
>>>>   - Don't use the variants ending in 0, all they do is skip over
>>>>   things that are a good idea in the case of non-altreps (and some very
>>>>   specific altreps).
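
The first two reminders can be sketched in standalone C without R's headers. `first_negative` and `BATCH_SIZE` are hypothetical names, and the batches are taken straight from a plain array rather than through the real macro. The example shows the global position computed as ind + i, and the "custom breaking" needed to leave both the inner and the outer loop early.

```c
#include <assert.h>
#include <stddef.h>

#define BATCH_SIZE 512  /* hypothetical batch size, matching the 512 above */

/* Return the 0-indexed position of the first negative value, or -1 if
 * there is none. The data is walked in batches to mimic region iteration. */
static long first_negative(const double *x, size_t n)
{
    assert(x != NULL || n == 0);
    long found = -1;
    /* The extra "found < 0" test is what stops the outer loop early. */
    for (size_t ind = 0; ind < n && found < 0; ind += BATCH_SIZE) {
        size_t nbatch = n - ind < BATCH_SIZE ? n - ind : BATCH_SIZE;
        const double *ptr = x + ind;      /* start of the current batch */
        for (size_t i = 0; i < nbatch; i++) {
            if (ptr[i] < 0.0) {
                found = (long)(ind + i);  /* position in the whole vector */
                break;                    /* leaves the inner loop only */
            }
        }
    }
    return found;
}
```

A plain `break` inside the batch code only exits the inner loop, which is why the outer loop needs its own stopping condition.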
>>>> 
>>>> Hope that helps.
>>>> 
>>>> Best,
>>>> ~G
>>>> 
>>>> 
>>>>> Best,
>>>>> Jiefei
>>>>> 
>>>>>        [[alternative HTML version deleted]]
>>>>> 
>>>>> ______________________________________________
>>>>> R-devel using r-project.org mailing list
>>>>> https://stat.ethz.ch/mailman/listinfo/r-devel
>>>>> 
>>>> 


