[Rd] R strings, null-terminated or size delimited?

Guillaume Yziquel guillaume.yziquel at citycable.ch
Sun Nov 22 00:31:24 CET 2009


Simon Urbanek wrote:
> 
> On Nov 21, 2009, at 4:12 PM, Guillaume Yziquel wrote:
> 
>> Hello.
>>
>> I've been looking at vecsexps for my binding.
>>
>> Concerning strings, I'm wondering: are they supposed to be 
>> null-delimited?
> 
> Yes, they are null-delimited when you create/access them.

OK. Fair enough. But is it guaranteed that the null delimiter sits where
the vecsxp field of the * VECSEXP says the R vector should end? Let
me rephrase that:

-1- Should I consider it a bug if the two pieces of information differ?

-2- Which of the two is the safest to rely on?
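
A minimal sketch of the check behind these two questions. It assumes a hypothetical stand-in record (`model_str`, with a stored `length` and a NUL-terminated `data` buffer); this is not R's actual layout, and the function names are made up for illustration.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a length-prefixed string record.
   This is NOT R's real layout; it only models "stored length + bytes". */
typedef struct {
    size_t length;      /* length recorded in the header */
    const char *data;   /* payload, expected to end with '\0' */
} model_str;

/* Returns 1 when the first NUL sits exactly at data[length],
   i.e. the terminator and the stored length agree; 0 otherwise. */
int model_str_consistent(const model_str *s)
{
    /* memchr inspects at most length + 1 bytes, unlike strlen. */
    const char *nul = memchr(s->data, '\0', s->length + 1);
    return nul != NULL && (size_t)(nul - s->data) == s->length;
}

/* Length that is safe to copy when the two disagree:
   stop at whichever comes first, the NUL or the stored length. */
size_t model_str_safe_length(const model_str *s)
{
    const char *nul = memchr(s->data, '\0', s->length);
    return nul != NULL ? (size_t)(nul - s->data) : s->length;
}
```

Taking the smaller of the two lengths is a conservative choice: it never reads past either boundary, at the cost of truncating a string that contains an embedded NUL.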

>> Are they delimited by the info in the SEXPHEADER macro in Rinternals.h?
>
> 
> You should not be touching or reading that.

I believe I should. I'd like the OCaml / R binding to be closely knit to 
R internals. One reason would be for speed, the other being that I'd 
like to make use of camlp4 to write syntax extensions to mix OCaml and R 
syntax. It's therefore important for me not to rely on the R interpreter
being active when building R values, or when marshaling R values via
OCaml. There are numerous other issues aside from this one.

I'm already using #define USE_RINTERNALS in my .c file to inspect R values.

>> Basically, what are the macros or functions to access the values of 
>> the vecsexps?
> 
> VECTOR_ELT and SET_VECTOR_ELT (assuming that you're referring to VECSXP 
> which are generic vectors).

No. I'm referring to INTSXP for now. But I see what you mean:

> #define INTEGER(x)      ((int *) DATAPTR(x))
> #define VECTOR_ELT(x,i) ((SEXP *) DATAPTR(x))[i]

VECTOR_ELT is not suitable for INTSXP arrays. I need to convert an
INTSXP array to an OCaml list / array.
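
A sketch of what that conversion involves on the C side. As far as can be told from the macros quoted above, the data of a vector sits right after its header and INTEGER(x) is a cast of that data pointer. The block models this with a hypothetical `model_vec` header (not R's real VECTOR_SEXPREC; all `model_`/`MODEL_` names are assumptions) and copies the payload into a fresh array, the step that would precede building the OCaml value.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical header; R's real header carries more fields. */
typedef struct {
    int type;     /* stand-in for the SEXP type tag */
    int length;   /* number of elements that follow */
} model_vec;

/* Same idea as DATAPTR/INTEGER: the payload starts right after the header. */
#define MODEL_DATAPTR(x) ((void *) (((model_vec *) (x)) + 1))
#define MODEL_INTEGER(x) ((int *) MODEL_DATAPTR(x))

/* Allocate header and payload in one block, zero-filling the payload. */
model_vec *model_alloc_int(int n)
{
    model_vec *v = malloc(sizeof(model_vec) + (size_t) n * sizeof(int));
    if (v == NULL) return NULL;
    v->type = 13;          /* 13 is the INTSXP tag in Rinternals.h */
    v->length = n;
    memset(MODEL_INTEGER(v), 0, (size_t) n * sizeof(int));
    return v;
}

/* Copy the payload out; the caller owns (and frees) the result. */
int *model_copy_ints(const model_vec *v)
{
    int *out = malloc((size_t) v->length * sizeof(int));
    if (out == NULL) return NULL;
    memcpy(out, MODEL_INTEGER(v), (size_t) v->length * sizeof(int));
    return out;
}
```

Copying rather than sharing the pointer keeps the result independent of the original vector's lifetime, which matters if the source is managed by a garbage collector.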

>> I'm thinking of CHARSXPs and INTSXPs for the moment...
> 
> Those are entirely different - CHARSXP are not vectors but strings (see 
> mkChar et al., CHAR, ...) and INTSXP are integer arrays (in C speak) 
> accessed using INTEGER.

OK. They're not vectors. They're VECTOR_SEXPRECs.

> Please read R-exts - it's better than guessing.

Funny, I have R-exts.pdf and R-ints.pdf open. They're fine when it
comes to writing R extensions, but not when writing bindings that embed R
into OCaml so that you can beta-reduce isomorphically in R and OCaml.

> Cheers,
> Simon

I'm already using heretical features in OCaml (namely Obj.magic) in order
to do this binding. I do not mind using heretical features of the R API.

I do not mean to be a pain, but I have to do what needs to be done. If I 
find on my way that #define USE_RINTERNALS is overkill, I'll gladly drop it.

For instance, here's one of my issues: I've extracted the R SEXP for the 
"str" symbol. It's a promise. Now, how do I map such a SEXP to an OCaml 
function? Haven't found that in R-ints.pdf or R-exts.pdf. There's talk 
about functions, but promises are somewhat overlooked. However, such a 
mapping is crucial to me.

I was not guessing when I was trying to look at the internal structure 
of R data. Simply trying to get a grip on how to execute promises, and 
therefore examining such a promise:

> # R.Internal.Pretty.t_of_sexp (R.Raw.sexp_of_t (R.symbol "str"));;
> - : R.Internal.Pretty.t =
> PROMISE
>  {value = SYMBOL None;
>   expr =
>    CALL (SYMBOL (Some ("lazyLoadDBfetch", BUILTIN)),
>     [INT [105; 153119]; Unknown; Unknown; Unknown]);
>   env = Unknown}

Or, following structures in Rinternals.h:

> # R.Internal.C.t_of_sexp (R.Raw.sexp_of_t (R.symbol "str"));;
> - : R.Internal.C.t =
> Val
>  {content =
>    PROMSXP
>     {prom_value =
>       Val
>        {content =
>          SYMSXP
>           {pname = Val {content = NILSXP};
>            sym_value = R.Internal.C.Recursive <lazy>;
>            internal = Val {content = NILSXP}}};
>      R.Internal.C.expr =
>       Val
>        {content =
>          LANGSXP
>           {carval =
>             Val
>              {content =
>                SYMSXP
>                 {pname = Val {content = CHARSXP "lazyLoadDBfetch"};
>                  sym_value = Val {content = BUILTINSXP 687};
>                  internal = Val {content = NILSXP}}};
>            cdrval =
>             Val
>              {content =
>                LISTSXP
>                 {carval = Val {content = INTSXP [105; 153119]};
>                  cdrval =
>                   Val
>                    {content =
>                      LISTSXP
>                       {carval =
>                         Val
>                          {content =
>                            SYMSXP
>                             {pname = Val {content = CHARSXP "datafile"};
>                              sym_value =
>                               Val
>                                {content =
>                                  SYMSXP
>                                   {pname = Val {content = NILSXP};
>                                    sym_value = R.Internal.C.Recursive <lazy>;
>                                    internal = Val {content = NILSXP}}};
>                              internal = Val {content = NILSXP}}};
>                        cdrval =
>                         Val
>                          {content =
>                            LISTSXP
>                             {carval =
>                               Val
>                                {content =
>                                  SYMSXP
>                                   {pname =
>                                     Val {content = CHARSXP "compressed"};
>                                    sym_value =
>                                     Val
>                                      {content =
>                                        SYMSXP
>                                         {pname = Val {content = NILSXP};
>                                          sym_value =
>                                           R.Internal.C.Recursive <lazy>;
>                                          internal = Val {content = NILSXP}}};
>                                    internal = Val {content = NILSXP}}};
>                              cdrval =
>                               Val
>                                {content =
>                                  LISTSXP
>                                   {carval =
>                                     Val
>                                      {content =
>                                        SYMSXP
>                                         {pname =
>                                           Val {content = CHARSXP "envhook"};
>                                          sym_value =
>                                           Val
>                                            {content =
>                                              SYMSXP
>                                               {pname = Val {content = NILSXP};
>                                                sym_value =
>                                                 R.Internal.C.Recursive <lazy>;
>                                                internal =
>                                                 Val {content = NILSXP}}};
>                                          internal = Val {content = NILSXP}}};
>                                    cdrval = Val {content = NILSXP};
>                                    tagval = Val {content = NILSXP}}};
>                              tagval = Val {content = NILSXP}}};
>                        tagval = Val {content = NILSXP}}};
>                  tagval = Val {content = NILSXP}}};
>            tagval = Val {content = NILSXP}}};
>      R.Internal.C.env = Val {content = ENVSXP}}}
> # 

For instance, an issue I'd like advice on is: what does such a symbol mean?

>                            SYMSXP
>                             {pname = Val {content = CHARSXP "datafile"};
>                              sym_value =
>                               Val
>                                {content =
>                                  SYMSXP
>                                   {pname = Val {content = NILSXP};
>                                    sym_value = R.Internal.C.Recursive <lazy>;
>                                    internal = Val {content = NILSXP}}};
>                              internal = Val {content = NILSXP}}};

And how is it treated when "str" is executed?
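
One plausible reading of the dump, offered as an assumption rather than a confirmed answer: the symbol with a NILSXP pname whose sym_value points back at itself looks like R's unbound-value marker (R_UnboundValue). Under that reading, "datafile" simply has no value bound in the symbol table, and a promise whose prom_value is that marker has not been forced yet. The sketch below models that logic in C with hypothetical types and names (`model_promise`, `model_force`, `model_unbound`); it is not R's evaluator.

```c
/* Hypothetical value type; a real SEXP is far richer. */
typedef struct model_value {
    int payload;
} model_value;

/* A unique sentinel object meaning "no value yet", compared by address. */
model_value model_unbound = { 0 };

typedef model_value *(*model_expr)(void *env);

/* Hypothetical promise: cached value, code to run, and its environment. */
typedef struct {
    model_value *value;  /* &model_unbound until forced */
    model_expr expr;     /* stand-in for the promise's expression */
    void *env;           /* stand-in for the promise's environment */
    int evaluations;     /* counts how often expr actually ran */
} model_promise;

/* Force the promise: evaluate once, cache the result, then reuse it. */
model_value *model_force(model_promise *p)
{
    if (p->value == &model_unbound) {
        p->value = p->expr(p->env);
        p->evaluations++;
    }
    return p->value;
}

/* Example expression: returns the value stored in its environment. */
model_value *model_fetch(void *env)
{
    return (model_value *) env;
}
```

With the real API, the corresponding step is, as far as I know, a call to eval() on the promise, which returns the cached PRVALUE when there is one and otherwise evaluates PRCODE in PRENV.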

All the best.

-- 
      Guillaume Yziquel
http://yziquel.homelinux.org/


