[Rd] What is the best way to loop over an ALTREP vector?

Wang Jiefei szwjf08 using gmail.com
Mon Sep 23 21:17:58 CEST 2019


Sorry for the flood of messages. In the first code block of my previous
email I pasted my own C++ version of the iteration macro by mistake (you can
see the explicit type cast in it). Here is the actual macro definition from
R_ext/Itermacros.h:

#define ITERATE_BY_REGION_PARTIAL(sx, px, idx, nb, etype, vtype,        \
                                  strt, nfull, expr) do {               \
        const etype *px = DATAPTR_OR_NULL(sx);                          \
        if (px != NULL) {                                               \
            R_xlen_t __ibr_n__ = strt + nfull;                          \
            R_xlen_t nb = __ibr_n__;                                    \
            for (R_xlen_t idx = strt; idx < __ibr_n__; idx += nb) {     \
                expr                                                    \
            }                                                           \
        }                                                               \
        else ITERATE_BY_REGION_PARTIAL0(sx, px, idx, nb, etype, vtype,  \
                                        strt, nfull, expr);             \
    } while (0)
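
The conversion at issue can be reproduced without R. The sketch below is only an illustration: data_or_null() is a hypothetical stand-in for DATAPTR_OR_NULL(), which likewise returns a const void *. With the explicit cast the assignment compiles as both C and C++; without it, only C accepts the line.

```c
#include <stddef.h>

/* Hypothetical stand-in for DATAPTR_OR_NULL(): hands back an untyped pointer. */
static const void *data_or_null(const int *storage)
{
    return storage;
}

/* Sum n ints that are reached through an untyped pointer. */
long sum_ints(const int *storage, size_t n)
{
    /* The cast is redundant in C but required in C++, where a
       const void * does not convert implicitly to const int *. */
    const int *px = (const int *) data_or_null(storage);
    long total = 0;

    if (px != NULL) {
        for (size_t i = 0; i < n; i++)
            total += px[i];
    }
    return total;
}
```

Calling sum_ints() on a four-element array {1, 2, 3, 4} adds the elements through the cast pointer; a NULL pointer is skipped by the same check the macro performs.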


Best,

Jiefei

On Mon, Sep 23, 2019 at 3:12 PM Wang Jiefei <szwjf08 using gmail.com> wrote:

> Hi Gabriel,
>
> I have tried the macro and found a small issue: it seems to be written for
> C and relies on an implicit pointer conversion (const void * to const int
> *), see below. C allows this, but C++ rejects it. Would it be possible to
> add an explicit cast so that the macro is compatible with both languages?
>
>
> #define ITERATE_BY_REGION_PARTIAL(sx, px, idx, nb, etype, vtype,        \
>                                   strt, nfull, expr) do {               \
>         const etype *px = (const etype *)DATAPTR_OR_NULL(sx);           \
>         if (px != NULL) {                                               \
>             R_xlen_t __ibr_n__ = strt + nfull;                          \
>             R_xlen_t nb = __ibr_n__;                                    \
>             for (R_xlen_t idx = strt; idx < __ibr_n__; idx += nb) {     \
>                 expr                                                    \
>             }                                                           \
>         }                                                               \
>         else ITERATE_BY_REGION_PARTIAL0(sx, px, idx, nb, etype, vtype,  \
>                                         strt, nfull, expr);             \
>     } while (0)
>
>
>   Also, I notice that the element type (etype) and the vector type (vtype)
> have to be specified in the macro. Since the SEXP is the first argument,
> this seems redundant: etype and vtype have to match the type of the SEXP
> anyway. Is this intentional? Will there be a type-free macro in R in the
> future? Here is a simple type-free macro I'm using.
>
> #define type_free_iter(sx, ptr, ind, nbatch, expr)                       \
> switch (TYPEOF(sx)) {                                                    \
> case INTSXP:                                                             \
>        ITERATE_BY_REGION(sx, ptr, ind, nbatch, int, INTEGER, expr);      \
>        break;                                                            \
> case REALSXP:                                                            \
>        ITERATE_BY_REGION(sx, ptr, ind, nbatch, double, REAL, expr);      \
>        break;                                                            \
> case LGLSXP:                                                             \
>        ITERATE_BY_REGION(sx, ptr, ind, nbatch, int, LOGICAL, expr);      \
>        break;                                                            \
> default:                                                                 \
>        Rf_error("Unknown data type\n");                                  \
>        break;                                                            \
> }
>
>
>
> // [[Rcpp::export]]
> double sillysum(SEXP x) {
>        double s = 0.0;
>        type_free_iter(x, ptr, ind, nbatch,
>               {
>                      for (int i = 0; i < nbatch; i++) { s = s + ptr[i]; }
>               });
>        return s;
> }
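
The dispatch idea behind type_free_iter can be tried outside R. The following is a minimal sketch under stated assumptions: vec_tag, vec_t, SUM_AS and typed_sum are hypothetical names invented for this illustration (they stand in for TYPEOF(), SEXP and the iteration macro) and are not part of R's API.

```c
#include <stddef.h>

/* Hypothetical type tags standing in for INTSXP / REALSXP. */
typedef enum { TAG_INT, TAG_REAL } vec_tag;

/* Hypothetical tagged vector standing in for a SEXP. */
typedef struct {
    vec_tag     tag;
    size_t      len;
    const void *data;
} vec_t;

/* Loop body shared by all element types; the preprocessor substitutes etype. */
#define SUM_AS(etype, v, out) do {                              \
        const etype *p = (const etype *) (v)->data;             \
        for (size_t i = 0; i < (v)->len; i++) (out) += p[i];    \
    } while (0)

/* Store the sum in *out; return 0 on success, -1 for an unsupported tag. */
int typed_sum(const vec_t *v, double *out)
{
    *out = 0.0;
    switch (v->tag) {
    case TAG_INT:  SUM_AS(int, v, *out);    return 0;
    case TAG_REAL: SUM_AS(double, v, *out); return 0;
    default:       return -1;
    }
}
```

The caller writes the loop body once; the switch picks the element type from the tag at run time, which is what the macro above does with TYPEOF(sx).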
>
>
>
>
> Best,
>
> Jiefei
>
> On Wed, Aug 28, 2019 at 2:32 PM Wang Jiefei <szwjf08 using gmail.com> wrote:
>
>> Thank you, Gabriel. The loop macro is very helpful. It is also exciting
>> to see that there are lots of changes in ALTREP in R devel version. I
>> really appreciate your help!
>>
>> Best,
>> Jiefei
>>
>> On Wed, Aug 28, 2019 at 7:37 AM Gabriel Becker <gabembecker using gmail.com>
>> wrote:
>>
>>> Jiefei,
>>>
>>> I've been meaning to write up something about this so hopefully this
>>> will be an impetus for me to actually do that, but until then, responses
>>> inline.
>>>
>>>
>>> On Tue, Aug 27, 2019, 7:22 PM Wang Jiefei <szwjf08 using gmail.com> wrote:
>>>
>>>> Hi devel team,
>>>>
>>>> I'm working on C/C++ level ALTREP compatibility for a package. The
>>>> package previously used pointers to access the data of a SEXP, so it
>>>> would not work for some ALTREP objects which do not have a pointer. I
>>>> plan to rewrite the code and use functions like get_elt, get_region,
>>>> and get_subset to access the values of a vector, so I have a few
>>>> questions about ALTREP:
>>>>
>>>> 1. Since an ALTREP class does not have to define all of the above
>>>> functions (element, region, subset), is there any way to check which
>>>> functions have been defined for an ALTREP class? I searched
>>>> Rinternals.h and altrep.c but did not find a solution. If not, will it
>>>> be added in the future?
>>>>
>>>
>>> Element and region are guaranteed to always be defined and to work for
>>> both altrep and non-altrep INTSXP, REALSXP, LGLSXP, etc. (we currently
>>> don't have region for STRSXP or VECSXP, I believe). If the altrep class
>>> does not provide them, default methods will be used, which may be
>>> inefficient in some cases but will work. Subset is currently a
>>> forward-looking stub, but once implemented, it will also be guaranteed
>>> to work for all valid ALTREP classes.
>>>
>>>
>>>>
>>>> 2. Given the diversity of ALTREP classes, what is the best way to loop
>>>> over an ALTREP object? I hope there can be an all-in-one function which
>>>> can get the values from a vector as long as at least one of the above
>>>> functions has been defined, so package developers would not be bothered
>>>> by tons of `if-else` statements if they want their package to work with
>>>> ALTREP. Since no such function seems to exist, what is the best way to
>>>> do the loop under the current R version?
>>>>
>>>
>>> The best way to loop over any SEXP, supporting both altrep and
>>> non-altrep objects, is with the ITERATE_BY_REGION macro (which has been
>>> in R for a number of released versions, at least since 3.5.0 I think) or
>>> the much newer (devel only) ITERATE_BY_REGION_PARTIAL macro, both
>>> defined in R_ext/Itermacros.h.
>>>
>>> The arguments of ITERATE_BY_REGION_PARTIAL are as follows
>>> (ITERATE_BY_REGION is the same except that it has no strt and nfull).
>>>
>>>
>>>    - sx - C level variable name of the SEXP to iterate over
>>>    - px - variable name to use for the pointer populated with data from
>>>    a region of sx
>>>    - idx - variable name to use for the "outer", batch counter in the
>>>    for loop. This will contain the 0-indexed start position of the batch
>>>    you're currently processing
>>>    - nb - variable name to use for the current batch size. This will
>>>    always either be GET_REGION_BUFFSIZE (512), or the number of elements
>>>    remaining in the vector, whichever is smaller
>>>    - etype - element (C) type, e.g., int, double, of the data
>>>    - vtype - vector (access API) type, e.g, INTEGER, REAL
>>>    - strt - the 0-indexed position in the vector to start iterating
>>>    - nfull - the total number of elements to iterate over from the
>>>    vector
>>>    - expr - the code to process a single batch (Which will do things to
>>>    px, typically)
>>>
>>>
>>> So code to perform a deliberately naive (not a good idea in practice)
>>> sum of REALSXP data might look like
>>>
>>> double sillysum(SEXP x) {
>>>     double s = 0.0;
>>>
>>>     /* ptr points at the current batch, nbatch is its length */
>>>     ITERATE_BY_REGION(x, ptr, ind, nbatch, double, REAL,
>>>         {
>>>             for(int i = 0; i < nbatch; i++) { s = s + ptr[i];}
>>>         });
>>>
>>>     return s;
>>> }
>>>
>>> For meatier examples of ITERATE_BY_REGION's use in practice you can grep
>>> the R sources. I know it is used in the implementations of the various
>>> C-level summaries (summary.c), print and formatting functions, and anyNA.
>>>
>>> Some things to remember
>>>
>>>    - If you have an inner loop like the one above, your total position
>>>    in the original vector is ind + i
>>>    - ITERATE_BY_REGION always processes the whole vector. If you need
>>>    to do only part of it, you'll either need custom breaking for both
>>>    the inner and outer loops, or in R-devel you can use
>>>    ITERATE_BY_REGION_PARTIAL
>>>    - Don't use the variants ending in 0; all they do is skip over
>>>    things that are a good idea in the case of non-altreps (and some very
>>>    specific altreps).
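
The first two points can be illustrated with a plain array standing in for the vector. This is only a sketch: first_negative and the tiny BATCH size are assumptions chosen for the example, and the batching is written out by hand rather than through the R macros.

```c
#include <stddef.h>

#define BATCH 4   /* deliberately tiny so the example spans several batches */

/* Return the position of the first negative value in x, or -1 if none. */
long first_negative(const double *x, size_t n)
{
    long found = -1;

    /* Outer loop: ind is the 0-indexed start of the current batch.
       The "found < 0" test is the custom break for the outer loop. */
    for (size_t ind = 0; ind < n && found < 0; ind += BATCH) {
        size_t nbatch = (n - ind < BATCH) ? n - ind : BATCH;
        const double *ptr = x + ind;     /* start of this batch */

        for (size_t i = 0; i < nbatch; i++) {
            if (ptr[i] < 0) {
                found = (long) (ind + i);  /* position in the whole vector */
                break;                     /* leaves the inner loop only */
            }
        }
    }
    return found;
}
```

The inner break alone would not stop the scan, because the outer loop would move on to the next batch; the extra condition on the outer loop is what ends the iteration early.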
>>>
>>> Hope that helps.
>>>
>>> Best,
>>> ~G
>>>
>>>> Best,
>>>> Jiefei
>>>>
>>>> ______________________________________________
>>>> R-devel using r-project.org mailing list
>>>> https://stat.ethz.ch/mailman/listinfo/r-devel
>>>>
>>>
