[Rd] using 2D array of SEXP for creating dataframe

Hervé Pagès hpages at fhcrc.org
Fri Jun 27 03:35:47 CEST 2014


On 06/26/2014 05:18 PM, Sandip Nandi wrote:
> Hi,
>
> I asked whether the data structure I am using to create a dataframe is
> fine or whether there is another way I can use. My aim is to read a
> database, write it to a dataframe, and do operations on it. The
> dataframe creation and the output all work. The code I posted was
> wrong; I was trying to piece it together from fragments, sorry for
> that. I feel my way of doing it, creating a 2D array, may not be the
> best, so it would be great if someone could point out any drawback of
> my method. My code in production can read 100k rows and write them in
> 15 seconds. But in one case, when I try to assign NA_REAL to a real
> vector, it causes a floating point exception. So I suspect something is
> wrong. Other people may be doing this in a faster, more efficient way.
>

Please understand that the code you send is useful for the discussion
only if we can understand it. And for this it needs to make sense.
The code below still makes little sense. Did you try it? For example
you're calling SET_VECTOR_ELT() and setAttrib() on an SEXP ('df') that
you didn't even allocate. Sounds maybe like a detail but because of
that the code will segfault and, more importantly, it's not clear what
kind of SEXP you want 'df' to be.

Also the following line makes no sense:

   setAttrib(df,R_RowNamesSymbol,lsnm);

given that 'lsnm' is c("int", "string"), which looks more like the col
names than the row names (it also has length 2, while you're apparently
trying to make a 3x2 data.frame, not a 2x2).

Anyway, once you realize that a data.frame is just a list with 3
attributes:

   > attributes(data.frame(int=c(99,89,12), string=c("aa", "vv", "gy")))
   $names
   [1] "int"    "string"

   $row.names
   [1] 1 2 3

   $class
   [1] "data.frame"

everything becomes simple at the C level, i.e. just make that list
and stick these 3 attributes on it. You don't need to call R code
from C (avoiding that will BTW protect you from random changes in the
behavior of the data.frame() constructor). You don't need the
intermediate 'valueVector' data structure either (which seems to be what
you are referring to as the "2D array of SEXP"; I don't know why, it
doesn't look like a 2D array to me, but you never explained).
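
To make the "list plus 3 attributes" idea concrete, here is a minimal sketch, not a definitive implementation. It assumes the goal is the 3x2 data.frame from the example above. The function name make_df and all variable names are made up for illustration and are not taken from the original code.

```c
#include <Rinternals.h>

/* Hypothetical helper (name made up for illustration): builds the 3x2
 * data.frame from the example directly, without calling data.frame(). */
SEXP make_df(void)
{
    const char *str_vals[3] = {"aa", "vv", "gy"};
    const double int_vals[3] = {99, 89, 12};
    const int nrow = 3, ncol = 2;
    int i;

    /* The data.frame itself is a list (VECSXP) with one element per column. */
    SEXP df = PROTECT(allocVector(VECSXP, ncol));

    /* Column 1 is a numeric vector, column 2 a character vector (STRSXP). */
    SEXP col_int = PROTECT(allocVector(REALSXP, nrow));
    SEXP col_str = PROTECT(allocVector(STRSXP, nrow));
    for (i = 0; i < nrow; i++) {
        REAL(col_int)[i] = int_vals[i];
        SET_STRING_ELT(col_str, i, mkChar(str_vals[i]));
    }
    SET_VECTOR_ELT(df, 0, col_int);
    SET_VECTOR_ELT(df, 1, col_str);

    /* Attribute 1: names (the column names). */
    SEXP names = PROTECT(allocVector(STRSXP, ncol));
    SET_STRING_ELT(names, 0, mkChar("int"));
    SET_STRING_ELT(names, 1, mkChar("string"));
    setAttrib(df, R_NamesSymbol, names);

    /* Attribute 2: row.names, here simply 1..nrow. */
    SEXP row_names = PROTECT(allocVector(INTSXP, nrow));
    for (i = 0; i < nrow; i++)
        INTEGER(row_names)[i] = i + 1;
    setAttrib(df, R_RowNamesSymbol, row_names);

    /* Attribute 3: class. */
    SEXP klass = PROTECT(mkString("data.frame"));
    setAttrib(df, R_ClassSymbol, klass);

    /* Six PROTECT calls above, so exactly six are released here. */
    UNPROTECT(6);
    return df;
}
```

Assuming this is compiled into a shared library (e.g. with R CMD SHLIB) and loaded with dyn.load(), it could be called from R with .Call("make_df"). Note that the UNPROTECT count has to match the number of PROTECT calls.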

Cheers,
H.


> This is a sample code
> /*
>  * dfm is a dataframe, which I assume is a list of lists. So I created a
>  * SEXP array valueVector[2] where each element can hold a different
>  * datatype. Now values are assigned and the dataframe is generated at
>  * the end.
>  */
>
> SEXP formDF() {
>
> SEXP dfm ,head,df , dfint , dfStr,lsnm;
> SEXP  valueVector[2];
> char *ab[3] = {"aa","vv","gy"};
> int sn[3] ={99,89,12};
> char *listnames[2] = {"int","string"};
> int i,j;
>
>
> PROTECT(valueVector[0] = allocVector(REALSXP,3));
> PROTECT(valueVector[1] = allocVector(STRSXP,3));
> PROTECT(lsnm = allocVector(STRSXP,2));
>
> SET_STRING_ELT(lsnm,0,mkChar("int"));
> SET_STRING_ELT(lsnm,1,mkChar("string"));
>
> for ( i = 0 ; i < 3; i++ ) {
> SET_STRING_ELT(valueVector[1],i,mkChar(ab[i]));
> REAL(valueVector[0])[i] = sn[i];
> }
>
>
> SET_VECTOR_ELT(df,1,valueVector[0]);
> SET_VECTOR_ELT(df,0,valueVector[1]);
> setAttrib(df,R_RowNamesSymbol,lsnm);
>
> PROTECT(dfm=lang3(install("data.frame"),df,ScalarLogical(FALSE)));
> SET_TAG(CDDR(dfm), install("stringsAsFactors")) ;
> SEXP res = PROTECT(eval(dfm,R_GlobalEnv));
>
> UNPROTECT(7);
> return res;
>
> }
>
>
> On Thu, Jun 26, 2014 at 4:52 PM, Hervé Pagès <hpages at fhcrc.org> wrote:
>
>     Hi Sandip,
>
>
>     On 06/26/2014 04:21 PM, Sandip Nandi wrote:
>
>         Hi ,
>
>         I have put incomplete code here. The complete code works. My
>         doubt is: is what I am doing logical/safe? Is any memory leak
>         going to happen? Is there any other way to create a dataframe?
>
>
>     I still don't believe it "works". It doesn't even compile. More below...
>
>
>
>
>
>         SEXP formDF() {
>
>         SEXP dfm ,head,df , dfint , dfStr,lsnm;
>         SEXP  valueVector[2];
>         char *ab[3] = {"aa","vv","gy"};
>         int sn[3] ={99,89,12};
>         char *listnames[2] = {"int","string"};
>         int i,j;
>
>         PROTECT(df = allocVector(VECSXP,2));
>
>         PROTECT(valueVector[0] = allocVector(REALSXP,3));
>         PROTECT(valueVector[1] = allocVector(VECSXP,3));
>         PROTECT(lsnm = allocVector(STRSXP,2));
>
>         SET_STRING_ELT(lsnm,0,mkChar("int"));
>         SET_STRING_ELT(lsnm,1,mkChar("string"));
>         SEXP rawvec,headr;
>
>         for ( i = 0 ; i < 3; i++ ) {
>         SET_STRING_ELT(valueVector[1],0,mkChar(listNames[i]));
>
>
>     'listNames' is undeclared (C is case-sensitive).
>
>     Let's assume you managed to compile this with an (imaginary)
>     case-insensitive C compiler, 'listnames' is an array of length
>     2 and this for loop tries to read the 3 first elements
>     from it. So you're just lucky that you didn't get a segfault.
>     In any case, I don't see how this code could produce
>     the data.frame you're trying to make.
>
>     If you want to discuss how to improve code that *works* (i.e.
>     compiles and produces the expected result), that's fine, but you
>     should be able to show that code. Otherwise it sounds like you're
>     asking people to fix your code. Or to write it for you. Maybe
>     that's fine too but people will be more sympathetic and willing
>     to help if you're honest about it.
>
>     Cheers,
>     H.
>
>         REAL(valueVector[0])[i] = sn[i];
>         }
>
>         SET_VECTOR_ELT(df,1,valueVector[0]);
>         SET_VECTOR_ELT(df,0,valueVector[1]);
>         setAttrib(df,R_RowNamesSymbol,lsnm);
>
>         PROTECT(dfm=lang3(install("data.frame"),df,ScalarLogical(FALSE)));
>         SET_TAG(CDDR(dfm), install("stringsAsFactors")) ;
>         SEXP res = PROTECT(eval(dfm,R_GlobalEnv));
>
>         UNPROTECT(7);
>         return res;
>
>         }
>
>
>
>         On Thu, Jun 26, 2014 at 3:49 PM, Hervé Pagès <hpages at fhcrc.org> wrote:
>
>              Hi,
>
>
>              On 06/26/2014 02:32 PM, Sandip Nandi wrote:
>
>                  Hi ,
>
>                  For our production package I need to create a dataframe
>                  in C. So I wrote the following code
>
>                  SEXP dfm ,head,df , dfint , dfStr,lsnm;
>
>                  *SEXP  valueVector[2];*
>
>
>                  char *ab[3] = {"aa","vv","gy"};
>                  int sn[3] ={99,89,12};
>                  char *listnames[2] = {"int","string"};
>                  int i,j;
>
>                  //=============================
>
>
>                  PROTECT(df = allocVector(VECSXP,2));
>
>                  *PROTECT(valueVector[0] = allocVector(REALSXP,3));*
>                  *PROTECT(valueVector[1] = allocVector(VECSXP,3));*
>
>
>
>                  PROTECT(lsnm = allocVector(STRSXP,2));
>
>                  SET_STRING_ELT(lsnm,0,mkChar("int"));
>                  SET_STRING_ELT(lsnm,1,mkChar("string"));
>
>                  SEXP rawvec,headr;
>                  unsigned char str[24]="abcdef";
>
>                  for ( i = 0 ; i < 3; i++ ) {
>
>                  *SET_STRING_ELT(valueVector[1],i,mkChar(ab[i]));*
>
>                  *REAL(valueVector[0])[i] = sn[i];*
>
>                  }
>
>
>                  It works, the data frame is being created and executed
>                  properly.
>
>
>              Really? You mean, you can compile this code, right? Otherwise
>              it's incomplete: you allocate but do nothing with 'df'. Same
>              with 'lsnm'. And you don't UNPROTECT. With no further
>              treatment, 'df' will be an unnamed list containing junk data,
>              but not the data.frame you expect. So there are a few gaps
>              that would need to be filled before this code actually works
>              as intended.
>
>              Maybe try and come back again with specific questions?
>
>              Cheers,
>              H.
>
>                  Just curious whether I am doing anything wrong or whether
>                  there is another way around for creation of a data frame.
>                  I am concerned about the SEXP 2D array.
>
>                  Thanks,
>                  Sandip
>
>
>              --
>              Hervé Pagès
>
>              Program in Computational Biology
>              Division of Public Health Sciences
>              Fred Hutchinson Cancer Research Center
>              1100 Fairview Ave. N, M1-B514
>              P.O. Box 19024
>              Seattle, WA 98109-1024
>
>              E-mail: hpages at fhcrc.org
>              Phone: (206) 667-5791
>              Fax: (206) 667-1319
>
>
>
>
>

-- 
Hervé Pagès

Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M1-B514
P.O. Box 19024
Seattle, WA 98109-1024

E-mail: hpages at fhcrc.org
Phone:  (206) 667-5791
Fax:    (206) 667-1319


