[Rd] Dynamic list creation (SEXP in C) returns error "unimplemented type (29) in 'duplicate'"

Romain Francois romain at r-enthusiasts.com
Thu Nov 7 15:46:30 CET 2013


On 07/11/2013 14:43, Romain Francois wrote:
> On 07/11/2013 14:30, George Vega Yon wrote:
>> Romain,
>>
>> Thanks for your quick response. I've already received that suggestion,
>> but, besides never having used C++, I wanted to understand first
>> what I am doing wrong.
>
> For that type of code, it is actually much simpler to learn C++ than it
> is to learn the macros and loose typing of the R interface.
>
>> Still, would you give me a small example, in R
>> C++, of:
>>
>>    - Creating a generic vector "L1" of size N
>>    - Creating a data.frame "D" and increasing its number of rows
>>    - Storing the data.frame "D" in the first element of "L1"
>>
>> I would be very grateful if you can do that.
>
> #include <Rcpp.h>
> using namespace Rcpp ;
>
> // [[Rcpp::export]]
> List example(int N){
>      List out(N) ;
>
>      // let's first accumulate data in these two std::vector
>      std::vector<double> x ;
>      std::vector<int> y ;
>      for( int i=0; i<30; i++){
>          x.push_back( sqrt( i ) ) ;
>          y.push_back( i ) ;
>      }
>
>      // Now let's create a data frame
>      DataFrame df = DataFrame::create(
>          _["x"] = x,
>          _["y"] = y
>          ) ;
>
>      // storing df as the first element of out
>      out[0] = df ;
>
>      return out ;
> }

Forgot to mention. You would just put the code above in a .cpp file and 
call sourceCpp on it.

library( Rcpp )
sourceCpp( "file.cpp" )
example( 3 )

> You can also do it like this, acknowledging what a data frame really is
> (just a list of vectors):
>
>      List df = List::create(
>          _["x"] = x,
>          _["y"] = y
>          ) ;
>      df.attr( "class" ) = "data.frame" ;
>      df.attr( "row.names") = IntegerVector::create(
>          IntegerVector::get_na(), -30 ) ;
>
>
> The key thing here is that we accumulate data into std::vector<double>
> and std::vector<int> which know how to grow efficiently. Looping around
> with SET_LENGTH will allocate and copy data at each iteration of the
> loop which will lead to disastrous performance.
>
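
A rough, standalone illustration of the growth argument above, in plain C++ with no R headers so it can be compiled on its own. The function names are made up for this sketch, and the second function only imitates the allocate-and-copy pattern that the message attributes to repeated SET_LENGTH calls; it does not call the R API.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Counts how often a std::vector<double> changes capacity (i.e. reallocates)
// while n values are appended one at a time with push_back.
std::size_t count_reallocations(std::size_t n) {
    std::vector<double> x;
    std::size_t reallocations = 0;
    std::size_t last_capacity = x.capacity();
    for (std::size_t i = 0; i < n; ++i) {
        x.push_back(static_cast<double>(i));
        if (x.capacity() != last_capacity) {
            ++reallocations;
            last_capacity = x.capacity();
        }
    }
    assert(x.size() == n);
    return reallocations;
}

// Imitates "grow by one" resizing: every iteration allocates a new buffer
// and copies all existing elements. Returns the total number of elements
// copied over the whole loop.
std::size_t elements_copied_grow_by_one(std::size_t n) {
    std::vector<double> x;
    std::size_t copied = 0;
    for (std::size_t i = 0; i < n; ++i) {
        std::vector<double> bigger(x.size() + 1);       // fresh allocation
        std::copy(x.begin(), x.end(), bigger.begin());  // full copy
        copied += x.size();
        bigger.back() = static_cast<double>(i);
        x.swap(bigger);
    }
    return copied;
}
```

For n = 1000 the grow-by-one loop copies 0 + 1 + ... + 999 = 499500 elements in total, while the push_back loop reallocates far fewer than n times because the capacity grows geometrically.
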
> Romain
>
>> Thanks again!
>>
>> George Vega Yon
>> +56 9 7 647 2552
>> http://ggvega.cl
>>
>>
>> 2013/11/7 Romain Francois <romain at r-enthusiasts.com>:
>>> Hello,
>>>
>>> Any particular reason you're not using Rcpp? You would have access to
>>> nice abstractions instead of these MACROS all over the place.
>>>
>>> The cost of these abstractions is close to 0.
>>>
>>> Looping around and SET_LENGTH is going to be quite expensive. I would
>>> urge you to accumulate data in data structures that know how to grow
>>> efficiently, i.e. a std::vector, and then convert that to an R vector
>>> when you're done with them.
>>>
>>> Romain
>>>
>>> On 07/11/2013 14:03, George Vega Yon wrote:
>>>
>>>> Hi!
>>>>
>>>> I didn't want to do this, but I think it is the easiest way
>>>> for you to understand my problem (thanks again for all the comments
>>>> that you have made). Here is a copy of the function that I'm working
>>>> on. It may be tedious to analyze, so I understand if you don't feel
>>>> inclined to spend time on it. Having dedicated many hours to this (as a
>>>> new user of both C and the R C API), I would be very pleased to know
>>>> what I am doing wrong here.
>>>>
>>>> G0 is an Nx2 matrix. The first column is a group id (can be shared by
>>>> several observations) and the second tells how many individuals are in
>>>> that group. This matrix can look something like this:
>>>>
>>>> id_group  nreps
>>>> 1  3
>>>> 1  3
>>>> 1  3
>>>> 2  1
>>>> 3  1
>>>> 4  2
>>>> 5  1
>>>> 6  1
>>>> 4  2
>>>> ...
>>>>
>>>> L0 is a list of two-column data.frames with different sizes. The first
>>>> column (id) holds row indexes (with values 1 to N) and the second column
>>>> holds real numbers. L0 can look something like this:
>>>> [[1]]
>>>> id  lambda
>>>> 3  0.5
>>>> 15  0.3
>>>> 25  0.2
>>>> [[2]]
>>>> id  lambda
>>>> 15  0.8
>>>> 40  0.2
>>>> ...
>>>> [[N]]
>>>> id  lambda
>>>> 80  1
>>>>
>>>> TE0 is an int scalar in {0,1,2}
>>>>
>>>> T0 is a dichotomous vector of length N that can look something like
>>>> this
>>>> [1] 0 1 0 1 1 1 0 ...
>>>> [N] 1
>>>>
>>>> L1 (the expected output) is a modified version of L0, that, for
>>>> instance can look something like this (note the rows marked with "*")
>>>>
>>>> [[1]]
>>>> id  lambda
>>>> 3  0.5
>>>> *15  0.15 (15 was in the same group as 50, so I added this new row and
>>>> divided the value of lambda by two)
>>>> 25  0.2
>>>> *50  0.15
>>>> [[2]]
>>>> id  lambda
>>>> 15  0.8
>>>> 40  0.2
>>>> ...
>>>> [[N]]
>>>> id  lambda
>>>> *80  0.333 (80 shared group id with 30 and 100, so lambda is divided
>>>> by 3)
>>>> *30  0.333
>>>> *100 0.333
>>>>
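
To make the expected transformation concrete, here is a minimal standalone sketch in plain C++ (no R headers). It assumes only a vector of group ids, one per observation, and counts the group members itself instead of reading the "nreps" column. The names `Rows` and `distribute` are hypothetical and not taken from the function that follows.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// One "data.frame" of the list: parallel columns id and lambda.
struct Rows {
    std::vector<int> id;         // 1-based observation indexes
    std::vector<double> lambda;  // weights
};

// group[k] is the group id of observation k + 1.
// For every input row whose observation shares its group with others,
// lambda is divided by the group size, the original row keeps its place
// with the reduced value, and one row per other member is appended.
Rows distribute(const std::vector<int>& group, const Rows& in) {
    assert(in.id.size() == in.lambda.size());
    Rows out = in;  // std::vector grows efficiently as rows are appended
    for (std::size_t l = 0; l < in.id.size(); ++l) {
        assert(in.id[l] >= 1 &&
               static_cast<std::size_t>(in.id[l]) <= group.size());
        const int g = group[in.id[l] - 1];

        // Collect the other members of the same group.
        std::vector<int> others;
        for (std::size_t k = 0; k < group.size(); ++k) {
            const int obs = static_cast<int>(k) + 1;
            if (group[k] == g && obs != in.id[l]) others.push_back(obs);
        }
        if (others.empty()) continue;  // nothing to share, row stays as is

        const double val =
            in.lambda[l] / static_cast<double>(others.size() + 1);
        out.lambda[l] = val;
        for (int member : others) {
            out.id.push_back(member);
            out.lambda.push_back(val);
        }
    }
    return out;
}
```

With the group ids from the G0 example above (1 1 1 2 3 4 5 6 4) and an input with rows (4, 0.4) and (6, 0.6), observation 6 shares group 4 with observation 9, so the result has three rows: (4, 0.4), (6, 0.3) and the appended (9, 0.3).
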
>>>> That said, the function is as follows
>>>>
>>>> SEXP distribute_lambdas(
>>>>     SEXP G0,  // Group ids (Nx2 matrix). First column: group id,
>>>>               // second column: number of elements in the group
>>>>     SEXP L0,  // List of N two-column data frames with different
>>>>               // numbers of rows
>>>>     SEXP TE0, // Treatment effect (int scalar): ATE(0) ATT(1) ATC(2)
>>>>     SEXP T0   // Treatment variable (0/1 vector of length N)
>>>> )
>>>> {
>>>>
>>>>     int i, j, l, m;
>>>>     const int *G = INTEGER_POINTER(PROTECT(G0 = AS_INTEGER(G0 )));
>>>>     const int *T = INTEGER_POINTER(PROTECT(T0 = AS_INTEGER(T0 )));
>>>>     const int *TE= INTEGER_POINTER(PROTECT(TE0= AS_INTEGER(TE0)));
>>>>     double *L, val;
>>>>     int *I, nlambdas, nreps;
>>>>
>>>>     const int n = length(T0);
>>>>
>>>>     PROTECT_INDEX pin0, pin1;
>>>>     SEXP L1;
>>>>     PROTECT(L1 = allocVector(VECSXP,n));
>>>>     SEXP id, lambda;
>>>>
>>>>     // Fixing size
>>>>     for(i=0;i<n;i++)
>>>>     {
>>>>       SET_VECTOR_ELT(L1, i, allocVector(VECSXP, 2));
>>>>     //  SET_VECTOR_ELT(VECTOR_ELT(L1,i), 0, NEW_INTEGER(100));
>>>>     //  SET_VECTOR_ELT(VECTOR_ELT(L1,i), 1, NEW_NUMERIC(100));
>>>>     }
>>>>
>>>>     // Looping over the list, i.e. the observations
>>>>     for(i=0;i<n;i++)
>>>>     {
>>>>
>>>>       R_CheckUserInterrupt();
>>>>
>>>>       // Checking whether this element has to be analyzed.
>>>>       if (
>>>>         ((TE[0] == 1 & !T[i]) | (TE[0] == 2 & T[i])) |
>>>>         (length(VECTOR_ELT(L0,i)) != 2)
>>>>       )
>>>>       {
>>>>         SET_VECTOR_ELT(L1,i,R_NilValue);
>>>>         continue;
>>>>       }
>>>>
>>>>       // Checking how many rows the i-th data.frame has
>>>>       nlambdas = length(VECTOR_ELT(VECTOR_ELT(L0,i),0));
>>>>
>>>>       // Pointing to the data.frame's original values
>>>>       I = INTEGER_POINTER(AS_INTEGER(PROTECT(VECTOR_ELT(VECTOR_ELT(L0,i),0))));
>>>>       L = NUMERIC_POINTER(AS_NUMERIC(PROTECT(VECTOR_ELT(VECTOR_ELT(L0,i),1))));
>>>>
>>>>       // Creating a copy of the pointed values
>>>>       PROTECT_WITH_INDEX(id     = duplicate(VECTOR_ELT(VECTOR_ELT(L0,i),0)), &pin0);
>>>>       PROTECT_WITH_INDEX(lambda = duplicate(VECTOR_ELT(VECTOR_ELT(L0,i),1)), &pin1);
>>>>
>>>>       // Over the rows of the i-th data.frame
>>>>       nreps=0;
>>>>       for(l=0;l<nlambdas;l++)
>>>>       {
>>>>         // If the current lambda id is repeated, i.e. there are more
>>>>         // individuals with the same covariates, then enter.
>>>>         if (G[n+I[l]-1] > 1)
>>>>         {
>>>>           /* Changing the length of the object */
>>>>           REPROTECT(SET_LENGTH(id,    length(lambda) + G[n+I[l]-1] -1), pin0);
>>>>           REPROTECT(SET_LENGTH(lambda,length(lambda) + G[n+I[l]-1] -1), pin1);
>>>>
>>>>           // Getting the new value
>>>>           val = L[l]/G[n+I[l] - 1];
>>>>           REAL(lambda)[l] = val;
>>>>
>>>>           // Looping over the full set of groups
>>>>           m = -1,j = -1;
>>>>           while(m < (G[n+I[l]-1] - 1))
>>>>           {
>>>>             // Looking for individuals in the same group
>>>>             if (G[++j] != G[I[l]-1]) continue;
>>>>
>>>>             // If it is the current lambda, then do not assign it
>>>>             if (j == (I[l] - 1)) continue;
>>>>
>>>>             INTEGER(id)[length(id) - (G[n+I[l]-1] - 1) + ++m] = j+1;
>>>>             REAL(lambda)[length(id) - (G[n+I[l]-1] - 1) + m] = val;
>>>>           }
>>>>
>>>>           nreps+=1;
>>>>         }
>>>>       }
>>>>
>>>>       if (nreps)
>>>>       {
>>>>         // Replacing elements of the list (modified)
>>>>         SET_VECTOR_ELT(VECTOR_ELT(L1, i), 0, duplicate(id));
>>>>         SET_VECTOR_ELT(VECTOR_ELT(L1, i), 1, duplicate(lambda));
>>>>       }
>>>>       else {
>>>>         // Setting the list with the old elements
>>>>         SET_VECTOR_ELT(VECTOR_ELT(L1, i), 0,
>>>>           duplicate(VECTOR_ELT(VECTOR_ELT(L0,i),0)));
>>>>         SET_VECTOR_ELT(VECTOR_ELT(L1, i), 1,
>>>>           duplicate(VECTOR_ELT(VECTOR_ELT(L0,i),1)));
>>>>       }
>>>>
>>>>       // Unprotecting elements
>>>>       UNPROTECT(4);
>>>>     }
>>>>
>>>>     Rprintf("Exito\n") ;
>>>>     UNPROTECT(4);
>>>>
>>>>     return L1;
>>>> }
>>>>
>>>> Thanks again in advance.
>>>>
>>>> George Vega Yon
>>>> +56 9 7 647 2552
>>>> http://ggvega.cl
>>>>
>>>> 2013/11/5 George Vega Yon <g.vegayon at gmail.com>:
>>>>>
>>>>> Either way, understanding that it may not be the best way to do it, is
>>>>> there anything wrong in what I'm doing?
>>>>> George Vega Yon
>>>>> +56 9 7 647 2552
>>>>> http://ggvega.cl
>>>>>
>>>>>
>>>>> 2013/11/5 Gabriel Becker <gmbecker at ucdavis.edu>:
>>>>>>
>>>>>> George,
>>>>>>
>>>>>> My point is you don't need to create them and then grow them....
>>>>>>
>>>>>>
>>>>>> for(i=0;i<n;i++)
>>>>>> {
>>>>>>     // Creating the "id" and "lambda" vectors. I do this in every
>>>>>>     // repetition of the loop.
>>>>>>
>>>>>>     // ... Some other instructions where I set the value of an integer
>>>>>>     // z, which tells how much the vectors have to grow ...
>>>>>>
>>>>>> PROTECT(id=allocVector(INTSXP, 4 +z));
>>>>>> PROTECT(lambda=allocVector(REALSXP, 4 +z));
>>>>>>
>>>>>>
>>>>>>     // ... some lines where I fill the vectors ...
>>>>>>
>>>>>>     // Storing the new vectors at the i-th element of the list
>>>>>>     SET_VECTOR_ELT(VECTOR_ELT(L1, i), 0, duplicate(id));
>>>>>>     SET_VECTOR_ELT(VECTOR_ELT(L1, i), 1, duplicate(lambda));
>>>>>>
>>>>>>     // Unprotecting the "id" and "lambda" vectors
>>>>>>     UNPROTECT(2);
>>>>>> }
>>>>>>
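
The "size it first, allocate once" idea above can be sketched in standalone C++ as a two-pass routine. This is an illustration under assumptions, not the code from the thread: `group_size` stands in for the "nreps" column of G0, only the lambda column is produced, and both function names are invented. In the R C API the single allocation would correspond to one allocVector call with the final length.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Pass 1: number of rows after expansion. group_size[k] is the number of
// individuals in the group of observation k + 1.
std::size_t final_length(const std::vector<int>& group_size,
                         const std::vector<int>& ids) {
    std::size_t total = ids.size();
    for (std::size_t l = 0; l < ids.size(); ++l) {
        assert(ids[l] >= 1 &&
               static_cast<std::size_t>(ids[l]) <= group_size.size());
        assert(group_size[ids[l] - 1] >= 1);
        total += static_cast<std::size_t>(group_size[ids[l] - 1] - 1);
    }
    return total;
}

// Pass 2: allocate the output once with its final length, then fill it.
// Original rows keep their position; extra rows are appended at the end.
std::vector<double> spread_lambdas(const std::vector<int>& group_size,
                                   const std::vector<int>& ids,
                                   const std::vector<double>& lambdas) {
    assert(ids.size() == lambdas.size());
    std::vector<double> out(final_length(group_size, ids));  // one allocation
    std::size_t next = ids.size();  // first free slot after the original rows
    for (std::size_t l = 0; l < ids.size(); ++l) {
        const int reps = group_size[ids[l] - 1];
        const double val = lambdas[l] / reps;
        out[l] = val;
        for (int r = 1; r < reps; ++r) out[next++] = val;
    }
    assert(next == out.size());
    return out;
}
```

With group sizes 3 3 3 1 1 2 1 1 2 and input rows (4, 0.4) and (6, 0.6), the first pass gives a final length of 3, so nothing has to be resized or reprotected while filling.
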
>>>>>> ~G
>>>>>>
>>>>>>
>>>>>> On Tue, Nov 5, 2013 at 1:56 PM, George Vega Yon <g.vegayon at gmail.com>
>>>>>> wrote:
>>>>>>>
>>>>>>>
>>>>>>> Gabriel,
>>>>>>>
>>>>>>> While the length (in terms of the number of SEXP elements it stores)
>>>>>>> of L1 doesn't change, the vectors within L1 do (sorry if I didn't
>>>>>>> explain it well before).
>>>>>>>
>>>>>>> The post was about a SEXP object that grows. In my case, every pair
>>>>>>> of vectors in L1 (id and lambda) can change length, which is why I
>>>>>>> need to reprotect them. I populate the i-th element of L1 by creating
>>>>>>> the vectors "id" and "lambda" and setting their lengths according to
>>>>>>> some rule (that's the part where lengths change). Here is a reduced
>>>>>>> form of my code:
>>>>>>>
>>>>>>> //////////////////////////////////////// C ////////////////////////////////////////
>>>>>>> const int n = length(L0);
>>>>>>> SEXP L1;
>>>>>>> PROTECT(L1 = allocVector(VECSXP,n));
>>>>>>> SEXP id, lambda;
>>>>>>>
>>>>>>> // Fixing size
>>>>>>> for(i=0;i<n;i++)
>>>>>>>     SET_VECTOR_ELT(L1, i, allocVector(VECSXP, 2));
>>>>>>>
>>>>>>> for(i=0;i<n;i++)
>>>>>>> {
>>>>>>>     // Creating the "id" and "lambda" vectors. I do this in every
>>>>>>>     // repetition of the loop.
>>>>>>>     PROTECT_WITH_INDEX(id=allocVector(INTSXP, 4), &ipx0);
>>>>>>>     PROTECT_WITH_INDEX(lambda=allocVector(REALSXP, 4), &ipx1);
>>>>>>>
>>>>>>>     // ... Some other instructions where I set the value of an integer
>>>>>>>     // z, which tells how much the vectors have to grow ...
>>>>>>>
>>>>>>>     REPROTECT(SET_LENGTH(id,    length(lambda) + z), ipx0);
>>>>>>>     REPROTECT(SET_LENGTH(lambda,length(lambda) + z), ipx1);
>>>>>>>
>>>>>>>     // ... some lines where I fill the vectors ...
>>>>>>>
>>>>>>>     // Storing the new vectors at the i-th element of the list
>>>>>>>     SET_VECTOR_ELT(VECTOR_ELT(L1, i), 0, duplicate(id));
>>>>>>>     SET_VECTOR_ELT(VECTOR_ELT(L1, i), 1, duplicate(lambda));
>>>>>>>
>>>>>>>     // Unprotecting the "id" and "lambda" vectors
>>>>>>>     UNPROTECT(2);
>>>>>>> }
>>>>>>>
>>>>>>> UNPROTECT(1);
>>>>>>>
>>>>>>> return L1;
>>>>>>> //////////////////////////////////////// C ////////////////////////////////////////
>>>>>>>
>>>>>>> I can't set the length from the start because every pair of vectors
>>>>>>> in L1 has a different length, and I cannot tell those lengths before
>>>>>>> starting the loop.
>>>>>>>
>>>>>>> Thanks for your help,
>>>>>>>
>>>>>>> Regards,
>>>>>>>
>>>>>>> George Vega Yon
>>>>>>> +56 9 7 647 2552
>>>>>>> http://ggvega.cl
>>>>>>>
>>>>>>>
>>>>>>> 2013/11/5 Gabriel Becker <gmbecker at ucdavis.edu>:
>>>>>>>>
>>>>>>>> George,
>>>>>>>>
>>>>>>>> I don't see the relevance of the stackoverflow post you linked. In the
>>>>>>>> post, the author wanted to change the length of an existing "mother
>>>>>>>> list" (matrix, etc), while you specifically state that the length of
>>>>>>>> L1 will not change.
>>>>>>>>
>>>>>>>> You say that the child lists (vectors if they are INTSXP/REALSXP) are
>>>>>>>> variable, but that is not what the linked post was about unless I am
>>>>>>>> completely missing something.
>>>>>>>>
>>>>>>>> I can't really say more without knowing the details of how the vectors
>>>>>>>> are being created and why they cannot just have the right length from
>>>>>>>> the start.
>>>>>>>>
>>>>>>>> As for the error, that is a weird one. I imagine it means that a SEXP
>>>>>>>> thinks that it has a type other than ones defined in Rinternals. I
>>>>>>>> can't speak to how that could have happened from what you posted
>>>>>>>> though.
>>>>>>>>
>>>>>>>> Sorry I can't be of more help,
>>>>>>>> ~G
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> On Mon, Nov 4, 2013 at 8:00 PM, George Vega Yon
>>>>>>>> <g.vegayon at gmail.com>
>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Dear R-devel,
>>>>>>>>>
>>>>>>>>> A couple of weeks ago I started to use the R C API for package
>>>>>>>>> development. Without knowing much about C, I've been able to write
>>>>>>>>> some routines successfully... until now.
>>>>>>>>>
>>>>>>>>> My problem consists in dynamically creating a list ("L1") of lists
>>>>>>>>> using .Call. The tricky part is that each element of the "mother
>>>>>>>>> list" contains two vectors (INTSXP and REALSXP types) with varying
>>>>>>>>> sizes; sizes that I set while I'm looping over the elements of
>>>>>>>>> another list ("L0", the input list). The steps I've followed are:
>>>>>>>>>
>>>>>>>>> FIRST: Create the "mother list" of size "n=length(L0)" (doesn't
>>>>>>>>> change) and protect it as
>>>>>>>>>     PROTECT(L1=allocVector(VECSXP, length(L0)))
>>>>>>>>> and fill it with vectors of length two:
>>>>>>>>>     for(i=0;i<n;i++) SET_VECTOR_ELT(L1,i, allocVector(VECSXP, 2));
>>>>>>>>>
>>>>>>>>> then, for each element of the mother list:
>>>>>>>>>
>>>>>>>>>     for(i=0;i<n;i++) {
>>>>>>>>>
>>>>>>>>> SECOND: By reading this post in Stackoverflow
>>>>>>>>>
>>>>>>>>> http://stackoverflow.com/questions/7458364/growing-an-r-matrix-inside-a-c-loop/7458516#7458516
>>>>>>>>>
>>>>>>>>> I understood that it was necessary to (1) create the "child lists"
>>>>>>>>> and protect them with PROTECT_WITH_INDEX, and (2) change their size
>>>>>>>>> using SET_LENGTH (Rf_lengthgets) and REPROTECT the lists in order
>>>>>>>>> to tell the GC that the vectors had changed.
>>>>>>>>>
>>>>>>>>> THIRD: Once my two vectors are done ("id" and "lambda"), assign
>>>>>>>>> them
>>>>>>>>> to the i-th element of the "mother list" L1 using
>>>>>>>>>     SET_VECTOR_ELT(VECTOR_ELT(L1,i), 0, duplicate(id));
>>>>>>>>>     SET_VECTOR_ELT(VECTOR_ELT(L1,i), 1, duplicate(lambda));
>>>>>>>>>
>>>>>>>>> and unprotecting the elements protected with index: UNPROTECT(2);
>>>>>>>>>
>>>>>>>>> }
>>>>>>>>>
>>>>>>>>> FOURTH: Unprotect the "mother list" (L1) and return it to R.
>>>>>>>>>
>>>>>>>>> With small datasets this works fine, but after trying with bigger
>>>>>>>>> ones R (my code) keeps failing and returning a strange error that I
>>>>>>>>> haven't been able to identify (or find on the web):
>>>>>>>>>
>>>>>>>>>     "unimplemented type (29) in 'duplicate'"
>>>>>>>>>
>>>>>>>>> This happens right after I try to use the returned list from my
>>>>>>>>> routine (trying to print it or building a data-frame).
>>>>>>>>>
>>>>>>>>> Does anyone have an idea of what I am doing wrong?
>>>>>>>>>
>>>>>>>>> Best regards,
>>>>>>>>>
>>>>>>>>> PS: I didn't want to copy the entire function... but if you need it
>>>>>>>>> I can do it.
>>>>>>>>>
>>>>>>>>> George Vega Yon
>>>>>>>>> +56 9 7 647 2552
>>>>>>>>> http://ggvega.cl
>>>>>>>>>
>>>>>>>>> ______________________________________________
>>>>>>>>> R-devel at r-project.org mailing list
>>>>>>>>> https://stat.ethz.ch/mailman/listinfo/r-devel
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> --
>>>>>>>> Gabriel Becker
>>>>>>>> Graduate Student
>>>>>>>> Statistics Department
>>>>>>>> University of California, Davis
>>>>>>
>>>>
>>>>
>>>>
>>>
>>>
>>> --
>>> Romain Francois
>>> Professional R Enthusiast
>>> +33(0) 6 28 91 30 30
>>>
>>>
>>
>
>


-- 
Romain Francois
Professional R Enthusiast
+33(0) 6 28 91 30 30


