[Rd] Dynamic list creation (SEXP in C) returns error "unimplemented type (29) in 'duplicate'"

Romain Francois romain at r-enthusiasts.com
Thu Nov 7 14:43:48 CET 2013


Le 07/11/2013 14:30, George Vega Yon a écrit :
> Romain,
>
> Thanks for your quick response. I've already received that suggestion,
> but, besides never having used C++, I wanted to understand first
> what I am doing wrong.

For that type of code, it is actually much simpler to learn C++ than it
is to learn the macros and loose typing of the R interface.

> Still, would you give me a small example, in R
> C++, of:
>
>    - Creating a generic vector "L1" of size N
>    - Creating a data.frame "D" and increasing its number of rows
>    - Storing the data.frame "D" in the first element of "L1"
>
> I would be very grateful if you can do that.

#include <Rcpp.h>
using namespace Rcpp ;

// [[Rcpp::export]]
List example(int N){
     List out(N) ;

     // let's first accumulate data in these two std::vector
     std::vector<double> x ;
     std::vector<int> y ;
     for( int i=0; i<30; i++){
         x.push_back( std::sqrt( static_cast<double>( i ) ) ) ;
         y.push_back( i ) ;
     }

     // Now let's create a data frame
     DataFrame df = DataFrame::create(
         _["x"] = x,
         _["y"] = y
         ) ;

     // storing df as the first element of out
     out[0] = df ;

     return out ;
}

You can also do it like this, acknowledging what a data frame really is
(just a list of vectors):

     List df = List::create(
         _["x"] = x,
         _["y"] = y
         ) ;
     df.attr( "class" ) = "data.frame" ;
     df.attr( "row.names") = IntegerVector::create(
         IntegerVector::get_na(), -30 ) ;
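
As a rough sketch of how the accumulate-first approach could apply to the
row expansion described in the original question, here is a plain C++
version with no R headers. The names Rows and distribute are hypothetical,
group and nreps are assumed to be the two columns of G0, and new rows are
emitted right after the row that triggers them (the question's example
output appends them at the end instead).

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical container for one two-column table: id[k] pairs with lambda[k].
struct Rows {
    std::vector<int> id;
    std::vector<double> lambda;
};

// Assumption: group[j] is the group id of observation j+1 and nreps[j] is the
// number of observations in that group (the two columns of G0).
Rows distribute(const Rows& in,
                const std::vector<int>& group,
                const std::vector<int>& nreps) {
    assert(in.id.size() == in.lambda.size());
    assert(group.size() == nreps.size());

    Rows out;
    for (std::size_t l = 0; l < in.id.size(); ++l) {
        // ids are 1-based row indexes, as in R
        const std::size_t self = static_cast<std::size_t>(in.id[l] - 1);
        assert(self < group.size());
        assert(nreps[self] >= 1);

        // Split lambda evenly among all members of the group.
        const double share = in.lambda[l] / nreps[self];
        out.id.push_back(in.id[l]);
        out.lambda.push_back(share);
        if (nreps[self] == 1) continue;

        // Append one row for every other member of the same group.
        for (std::size_t j = 0; j < group.size(); ++j) {
            if (j != self && group[j] == group[self]) {
                out.id.push_back(static_cast<int>(j) + 1);
                out.lambda.push_back(share);
            }
        }
    }
    return out;
}
```

The output vectors grow with push_back only, so no length has to be known
in advance. Once filled, they could be passed to DataFrame::create as in
the function above.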


The key thing here is that we accumulate data into std::vector<double>
and std::vector<int>, which know how to grow efficiently. Calling
SET_LENGTH inside the loop allocates a new vector and copies the data at
each iteration, which leads to disastrous performance.
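
To make the cost difference concrete, here is a small standalone sketch in
plain C++ (no R involved). Both function names are hypothetical. The
second function only imitates the "new buffer plus full copy at every
step" pattern with std::vector; it does not call SET_LENGTH itself.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Counts how often push_back had to move the data to a larger buffer.
std::size_t reallocations_with_push_back(std::size_t n) {
    std::vector<double> v;
    std::size_t count = 0;
    std::size_t cap = v.capacity();
    for (std::size_t i = 0; i < n; ++i) {
        v.push_back(static_cast<double>(i));
        if (v.capacity() != cap) {  // capacity changed: a reallocation happened
            ++count;
            cap = v.capacity();
        }
    }
    return count;
}

// Imitates growing by one element per iteration: allocate a buffer of
// exactly the new size, copy everything over, then add the new value.
// Returns the total number of elements copied.
std::size_t copies_with_grow_by_one(std::size_t n) {
    std::vector<double> current;
    std::size_t copied = 0;
    for (std::size_t i = 0; i < n; ++i) {
        std::vector<double> bigger(current.size() + 1);
        std::copy(current.begin(), current.end(), bigger.begin());
        copied += current.size();
        bigger.back() = static_cast<double>(i);
        current.swap(bigger);
    }
    assert(current.size() == n);
    return copied;
}
```

Growing by one copies 0 + 1 + ... + (n - 1) = n(n - 1)/2 elements in
total, so the work is quadratic in n, while push_back reallocates only
occasionally because the capacity grows geometrically.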

Romain

> Thanks again!
>
> George Vega Yon
> +56 9 7 647 2552
> http://ggvega.cl
>
>
> 2013/11/7 Romain Francois <romain at r-enthusiasts.com>:
>> Hello,
>>
>> Any particular reason you're not using Rcpp? You would have access to nice
>> abstractions instead of these MACROS all over the place.
>>
>> The cost of these abstractions is close to 0.
>>
>> Looping around and SET_LENGTH is going to be quite expensive. I would urge
>> you to accumulate data in data structures that know how to grow efficiently,
>> i.e. a std::vector and then convert that to an R vector when you're done
>> with them.
>>
>> Romain
>>
>> Le 07/11/2013 14:03, George Vega Yon a écrit :
>>
>>> Hi!
>>>
>>> I didn't want to do this, but I think that this is the easiest way
>>> for you to understand my problem (thanks again for all the comments
>>> that you have made). Here is a copy of the function that I'm working
>>> on. This may be tedious to analyze, so I understand if you don't feel
>>> keen to give it the time. Having dedicated many hours to this (as a new
>>> user of both C and the R C API), I would be very pleased to know what I
>>> am doing wrong here.
>>>
>>> G0 is a Nx2 matrix. The first column is a group id (can be shared with
>>> several observations) and the second tells how many individuals are in
>>> that group. This matrix can look something like this:
>>>
>>> id_group  nreps
>>> 1  3
>>> 1  3
>>> 1  3
>>> 2  1
>>> 3  1
>>> 4  2
>>> 5  1
>>> 6  1
>>> 4  2
>>> ...
>>>
>>> L0 is a list of two-column data.frames with different sizes. The first
>>> column (id) are row indexes (with values 1 to N) and the second column
>>> are real numbers. L0 can look something like this
>>> [[1]]
>>> id  lambda
>>> 3  0.5
>>> 15  0.3
>>> 25  0.2
>>> [[2]]
>>> id  lambda
>>> 15  0.8
>>> 40  0.2
>>> ...
>>> [[N]]
>>> id  lambda
>>> 80  1
>>>
>>> TE0 is an int scalar in {0,1,2}
>>>
>>> T0 is a dichotomous vector of length N that can look something like this
>>> [1] 0 1 0 1 1 1 0 ...
>>> [N] 1
>>>
>>> L1 (the expected output) is a modified version of L0, that, for
>>> instance can look something like this (note the rows marked with "*")
>>>
>>> [[1]]
>>> id  lambda
>>> 3  0.5
>>> *15  0.15 (15 was in the same group of 50, so I added this new row and
>>> divided the value of lambda by two)
>>> 25  0.2
>>> *50  0.15
>>> [[2]]
>>> id  lambda
>>> 15  0.8
>>> 40  0.2
>>> ...
>>> [[N]]
>>> id  lambda
>>> *80  0.333 (80 shared group id with 30 and 100, so lambda is divided by 3)
>>> *30  0.333
>>> *100 0.333
>>>
>>> That said, the function is as follows
>>>
>>> SEXP distribute_lambdas(
>>>     SEXP G0,  // Group ids (matrix of Nx2). First column = Group Id,
>>>               // second column: Elements in the group
>>>     SEXP L0,  // List of N two-column dataframes with different number of
>>>               // rows
>>>     SEXP TE0, // Treatment effect (int scalar): ATE(0) ATT(1) ATC(2)
>>>     SEXP T0   // Treat var (bool vector, 0/1, of size N)
>>> )
>>> {
>>>
>>>     int i, j, l, m;
>>>     const int *G = INTEGER_POINTER(PROTECT(G0 = AS_INTEGER(G0 )));
>>>     const int *T = INTEGER_POINTER(PROTECT(T0 = AS_INTEGER(T0 )));
>>>     const int *TE= INTEGER_POINTER(PROTECT(TE0= AS_INTEGER(TE0)));
>>>     double *L, val;
>>>     int *I, nlambdas, nreps;
>>>
>>>     const int n = length(T0);
>>>
>>>     PROTECT_INDEX pin0, pin1;
>>>     SEXP L1;
>>>     PROTECT(L1 = allocVector(VECSXP,n));
>>>     SEXP id, lambda;
>>>
>>>     // Fixing size
>>>     for(i=0;i<n;i++)
>>>     {
>>>       SET_VECTOR_ELT(L1, i, allocVector(VECSXP, 2));
>>>     //  SET_VECTOR_ELT(VECTOR_ELT(L1,i), 0, NEW_INTEGER(100));
>>>     //  SET_VECTOR_ELT(VECTOR_ELT(L1,i), 1, NEW_NUMERIC(100));
>>>     }
>>>
>>>     // Loop over the list, i.e. the observations
>>>     for(i=0;i<n;i++)
>>>     {
>>>
>>>       R_CheckUserInterrupt();
>>>
>>>       // Checking if has to be analyzed.
>>>       if (
>>>         ((TE[0] == 1 & !T[i]) | (TE[0] == 2 & T[i])) |
>>>         (length(VECTOR_ELT(L0,i)) != 2)
>>>       )
>>>       {
>>>         SET_VECTOR_ELT(L1,i,R_NilValue);
>>>         continue;
>>>       }
>>>
>>>       // Checking how many rows the i-th data.frame has
>>>       nlambdas = length(VECTOR_ELT(VECTOR_ELT(L0,i),0));
>>>
>>>       // Pointing to the data.frame's original values
>>>       I = INTEGER_POINTER(AS_INTEGER(PROTECT(VECTOR_ELT(VECTOR_ELT(L0,i),0))));
>>>       L = NUMERIC_POINTER(AS_NUMERIC(PROTECT(VECTOR_ELT(VECTOR_ELT(L0,i),1))));
>>>
>>>       // Creating a copy of the pointed values
>>>       PROTECT_WITH_INDEX(id   = duplicate(VECTOR_ELT(VECTOR_ELT(L0,i),0)),
>>>                          &pin0);
>>>       PROTECT_WITH_INDEX(lambda=duplicate(VECTOR_ELT(VECTOR_ELT(L0,i),1)),
>>>                          &pin1);
>>>
>>>       // Over the rows of the i-th data.frame
>>>       nreps=0;
>>>       for(l=0;l<nlambdas;l++)
>>>       {
>>>         // If the current lambda id is repeated, i.e. there are more
>>>         // individuals with the same covariates, then enter.
>>>         if (G[n+I[l]-1] > 1)
>>>         {
>>>           /* Changing the length of the object */
>>>           REPROTECT(SET_LENGTH(id,    length(lambda) + G[n+I[l]-1] -1),
>>>                     pin0);
>>>           REPROTECT(SET_LENGTH(lambda,length(lambda) + G[n+I[l]-1] -1),
>>>                     pin1);
>>>
>>>           // Getting the new value
>>>           val = L[l]/G[n+I[l] - 1];
>>>           REAL(lambda)[l] = val;
>>>
>>>           // Looping over the full set of groups
>>>           m = -1,j = -1;
>>>           while(m < (G[n+I[l]-1] - 1))
>>>           {
>>>             // Looking for individuals in the same group
>>>             if (G[++j] != G[I[l]-1]) continue;
>>>
>>>             // If it is the current lambda, then do not assign it
>>>             if (j == (I[l] - 1)) continue;
>>>
>>>             INTEGER(id)[length(id) - (G[n+I[l]-1] - 1) + ++m] = j+1;
>>>             REAL(lambda)[length(id) - (G[n+I[l]-1] - 1) + m] = val;
>>>           }
>>>
>>>           nreps+=1;
>>>         }
>>>       }
>>>
>>>       if (nreps)
>>>       {
>>>         // Replacing elements of the list (modified)
>>>         SET_VECTOR_ELT(VECTOR_ELT(L1, i), 0, duplicate(id));
>>>         SET_VECTOR_ELT(VECTOR_ELT(L1, i), 1, duplicate(lambda));
>>>       }
>>>       else {
>>>         // Setting the list with the old elements
>>>         SET_VECTOR_ELT(VECTOR_ELT(L1, i), 0,
>>>           duplicate(VECTOR_ELT(VECTOR_ELT(L0,i),0)));
>>>         SET_VECTOR_ELT(VECTOR_ELT(L1, i), 1,
>>>           duplicate(VECTOR_ELT(VECTOR_ELT(L0,i),1)));
>>>       }
>>>
>>>       // Unprotecting elements
>>>       UNPROTECT(4);
>>>     }
>>>
>>>     Rprintf("Exito\n") ;
>>>     UNPROTECT(4);
>>>
>>>     return L1;
>>> }
>>>
>>> Thanks again in advance.
>>>
>>> George Vega Yon
>>> +56 9 7 647 2552
>>> http://ggvega.cl
>>>
>>> 2013/11/5 George Vega Yon <g.vegayon at gmail.com>:
>>>>
>>>> Either way, understanding that it may not be the best way to do it, is
>>>> there anything wrong in what I'm doing?
>>>> George Vega Yon
>>>> +56 9 7 647 2552
>>>> http://ggvega.cl
>>>>
>>>>
>>>> 2013/11/5 Gabriel Becker <gmbecker at ucdavis.edu>:
>>>>>
>>>>> George,
>>>>>
>>>>> My point is you don't need to create them and then grow them....
>>>>>
>>>>>
>>>>> for(i=0;i<n;i++)
>>>>> {
>>>>>     // Creating the "id" and "lambda" vectors. I do this in every
>>>>>     // repetition of the loop.
>>>>>
>>>>>     // ... Some other instructions where I set the value of an integer
>>>>>     // z, which tells how much do the vectors have to grow ...
>>>>>
>>>>> PROTECT(id=allocVector(INTSXP, 4 +z));
>>>>> PROTECT(lambda=allocVector(REALSXP, 4 +z));
>>>>>
>>>>>
>>>>>     // ... some lines where I fill the vectors ...
>>>>>
>>>>>     // Storing the new vectors at the i-th element of the list
>>>>>     SET_VECTOR_ELT(VECTOR_ELT(L1, i), 0, duplicate(id));
>>>>>     SET_VECTOR_ELT(VECTOR_ELT(L1, i), 1, duplicate(lambda));
>>>>>
>>>>>     // Unprotecting the "id" and "lambda" vectors
>>>>>     UNPROTECT(2);
>>>>> }
>>>>>
>>>>> ~G
>>>>>
>>>>>
>>>>> On Tue, Nov 5, 2013 at 1:56 PM, George Vega Yon <g.vegayon at gmail.com>
>>>>> wrote:
>>>>>>
>>>>>>
>>>>>> Gabriel,
>>>>>>
>>>>>> While the length (in terms of number of SEXP elements it stores) of L1
>>>>>> doesn't change, the vectors within L1 do (sorry if I didn't explain
>>>>>> it well before).
>>>>>>
>>>>>> The post was about a SEXP object that grows, in my case, every pair of
>>>>>> vectors in L1 (id and lambda) can change lengths, this is why I need
>>>>>> to reprotect them. I populate the i-th element of L1 by creating the
>>>>>> vectors "id" and "lambda", setting the length of these according to
>>>>>> some rule (that's the part where lengths change)... here is a reduced
>>>>>> form of my code:
>>>>>>
>>>>>> //////////////////////////////////////// C
>>>>>> ////////////////////////////////////////
>>>>>> const int n = length(L0);
>>>>>> SEXP L1;
>>>>>> PROTECT(L1 = allocVector(VECSXP,n));
>>>>>> SEXP id, lambda;
>>>>>>
>>>>>> // Fixing size
>>>>>> for(i=0;i<n;i++)
>>>>>>     SET_VECTOR_ELT(L1, i, allocVector(VECSXP, 2));
>>>>>>
>>>>>> for(i=0;i<n;i++)
>>>>>> {
>>>>>>     // Creating the "id" and "lambda" vectors. I do this in every
>>>>>>     // repetition of the loop.
>>>>>>     PROTECT_WITH_INDEX(id=allocVector(INTSXP, 4), &ipx0);
>>>>>>     PROTECT_WITH_INDEX(lambda=allocVector(REALSXP, 4), &ipx1);
>>>>>>
>>>>>>     // ... Some other instructions where I set the value of an integer
>>>>>>     // z, which tells how much do the vectors have to grow ...
>>>>>>
>>>>>>     REPROTECT(SET_LENGTH(id,    length(lambda) + z), ipx0);
>>>>>>     REPROTECT(SET_LENGTH(lambda,length(lambda) + z), ipx1);
>>>>>>
>>>>>>     // ... some lines where I fill the vectors ...
>>>>>>
>>>>>>     // Storing the new vectors at the i-th element of the list
>>>>>>     SET_VECTOR_ELT(VECTOR_ELT(L1, i), 0, duplicate(id));
>>>>>>     SET_VECTOR_ELT(VECTOR_ELT(L1, i), 1, duplicate(lambda));
>>>>>>
>>>>>>     // Unprotecting the "id" and "lambda" vectors
>>>>>>     UNPROTECT(2);
>>>>>> }
>>>>>>
>>>>>> UNPROTECT(1);
>>>>>>
>>>>>> return L1;
>>>>>> //////////////////////////////////////// C
>>>>>> ////////////////////////////////////////
>>>>>>
>>>>>> I can't set the length from the start because every pair of vectors in
>>>>>> L1 have different lengths, lengths that I cannot tell before starting
>>>>>> the loop.
>>>>>>
>>>>>> Thanks for your help,
>>>>>>
>>>>>> Regards,
>>>>>>
>>>>>> George Vega Yon
>>>>>> +56 9 7 647 2552
>>>>>> http://ggvega.cl
>>>>>>
>>>>>>
>>>>>> 2013/11/5 Gabriel Becker <gmbecker at ucdavis.edu>:
>>>>>>>
>>>>>>> George,
>>>>>>>
>>>>>>> I don't see the relevance of the stackoverflow post you linked. In the
>>>>>>> post, the author wanted to change the length of an existing "mother
>>>>>>> list" (matrix, etc), while you specifically state that the length of
>>>>>>> L1 will not change.
>>>>>>>
>>>>>>> You say that the child lists (vectors if they are INTSXP/REALSXP) are
>>>>>>> variable, but that is not what the linked post was about unless I am
>>>>>>> completely missing something.
>>>>>>>
>>>>>>> I can't really say more without knowing the details of how the vectors
>>>>>>> are being created and why they cannot just have the right length from
>>>>>>> the start.
>>>>>>>
>>>>>>> As for the error, that is a weird one. I imagine it means that a SEXP
>>>>>>> thinks that it has a type other than ones defined in Rinternals. I
>>>>>>> can't speak to how that could have happened from what you posted
>>>>>>> though.
>>>>>>>
>>>>>>> Sorry I can't be of more help,
>>>>>>> ~G
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On Mon, Nov 4, 2013 at 8:00 PM, George Vega Yon <g.vegayon at gmail.com>
>>>>>>> wrote:
>>>>>>>>
>>>>>>>>
>>>>>>>> Dear R-devel,
>>>>>>>>
>>>>>>>> A couple of weeks ago I started to use the R C API for package
>>>>>>>> development. Without knowing much about C, I've been able to write
>>>>>>>> some routines successfully... until now.
>>>>>>>>
>>>>>>>> My problem consists in dynamically creating a list ("L1") of lists
>>>>>>>> using .Call. The tricky part is that each element of the "mother
>>>>>>>> list" contains two vectors (INTSXP and REALSXP types) with varying
>>>>>>>> sizes; sizes that I set while I'm looping over the elements of
>>>>>>>> another list ("L0", the input list). The steps I've followed are:
>>>>>>>>
>>>>>>>> FIRST: Create the "mother list" of size "n=length(L0)" (doesn't
>>>>>>>> change) and protect it as
>>>>>>>>     PROTECT(L1=allocVector(VECSXP, length(L0)))
>>>>>>>> and fill it with vectors of length two:
>>>>>>>>     for(i=0;i<n;i++) SET_VECTOR_ELT(L1,i, allocVector(VECSXP, 2));
>>>>>>>>
>>>>>>>> then, for each element of the mother list:
>>>>>>>>
>>>>>>>>     for(i=0;i<n;i++) {
>>>>>>>>
>>>>>>>> SECOND: By reading this post in Stackoverflow
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> http://stackoverflow.com/questions/7458364/growing-an-r-matrix-inside-a-c-loop/7458516#7458516
>>>>>>>> I understood that it was necessary to (1) create the "child lists" and
>>>>>>>> protect them with PROTECT_WITH_INDEX, and (2) change their size
>>>>>>>> using SET_LENGTH (Rf_lengthgets) and REPROTECT the lists in order
>>>>>>>> to tell the GC that the vectors had changed.
>>>>>>>>
>>>>>>>> THIRD: Once my two vectors are done ("id" and "lambda"), assign them
>>>>>>>> to the i-th element of the "mother list" L1 using
>>>>>>>>     SET_VECTOR_ELT(VECTOR_ELT(L1,i), 0, duplicate(id));
>>>>>>>>     SET_VECTOR_ELT(VECTOR_ELT(L1,i), 1, duplicate(lambda));
>>>>>>>>
>>>>>>>> and unprotecting the elements protected with index: UNPROTECT(2);
>>>>>>>>
>>>>>>>> }
>>>>>>>>
>>>>>>>> FOURTH: Unprotecting the "mother list" (L1) and return it to R
>>>>>>>>
>>>>>>>> With small datasets this works fine, but after trying with bigger
>>>>>>>> ones R (my code) keeps failing and returning a strange error that I
>>>>>>>> haven't been able to identify (or find on the web)
>>>>>>>>
>>>>>>>>     "unimplemented type (29) in 'duplicate'"
>>>>>>>>
>>>>>>>> This happens right after I try to use the returned list from my
>>>>>>>> routine (trying to print it or building a data-frame).
>>>>>>>>
>>>>>>>> Does anyone have an idea of what I am doing wrong?
>>>>>>>>
>>>>>>>> Best regards,
>>>>>>>>
>>>>>>>> PS: I didn't want to copy the entire function... but if you need it
>>>>>>>> I can do it.
>>>>>>>>
>>>>>>>> George Vega Yon
>>>>>>>> +56 9 7 647 2552
>>>>>>>> http://ggvega.cl
>>>>>>>>
>>>>>>>> ______________________________________________
>>>>>>>> R-devel at r-project.org mailing list
>>>>>>>> https://stat.ethz.ch/mailman/listinfo/r-devel
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> --
>>>>>>> Gabriel Becker
>>>>>>> Graduate Student
>>>>>>> Statistics Department
>>>>>>> University of California, Davis
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> Gabriel Becker
>>>>> Graduate Student
>>>>> Statistics Department
>>>>> University of California, Davis
>>>
>>>
>>>
>>
>>
>> --
>> Romain Francois
>> Professional R Enthusiast
>> +33(0) 6 28 91 30 30
>>
>>
>


-- 
Romain Francois
Professional R Enthusiast
+33(0) 6 28 91 30 30


