[Rd] Inconsistent behavior for the C API's R_ParseVector() ?

Tomas Kalibera tomas.kalibera using gmail.com
Mon Dec 9 15:57:53 CET 2019


On 12/9/19 2:54 PM, Laurent Gautier wrote:
>
>
> On Mon, Dec 9, 2019 at 05:43, Tomas Kalibera <tomas.kalibera using gmail.com 
> <mailto:tomas.kalibera using gmail.com>> wrote:
>
>     On 12/7/19 10:32 PM, Laurent Gautier wrote:
>>     Thanks for the quick response Tomas.
>>
>>     The same error is indeed happening when trying to have a
>>     zero-length variable name in an environment. The surprising bit
>>     is then "why is this happening during parsing" (that is why are
>>     variables assigned to an environment) ?
>
>     The emitted R error (in the R console) is not a parse (syntax)
>     error, but an error emitted during parsing when the parser tries
>     to intern a name - look it up in a symbol table. Empty string is
>     not allowed as a symbol name, and hence the error. In the call
>     "list(''=1)" , the empty name is what could eventually become a
>     name of a local variable inside list(), even though not yet during
>     parsing.
>
>
> Thanks Tomas.
>
> I guess this has to do with R expressions being lazily evaluated, and 
> names of arguments in a call are also part of the expression. Now the 
> puzzling part is why that is part of the parsing at all: I would have 
> expected R_ParseVector() to be restricted to parsing... Now it feels 
> like R_ParseVector() is performing parsing, and a first level of 
> evaluation for expressions that "should never work" (the empty name).
Think of it as an exception in, say, Python. Some failures during parsing 
result in an exception (called an error in R and implemented using a long 
jump). Any time you call into R you can get an error; running out of 
memory is also signalled as an R error.
>
>     There is probably some error in how the external code is handling
>     R errors  (Fatal error: unable to initialize the JIT, stack
>     smashing, etc) and possibly also how R is initialized before
>     calling ParseVector. Probably you would get the same problem when
>     running say "stop('myerror')". Please note R errors are
>     implemented as long-jumps, so care has to be taken when calling
>     into R; Writing R Extensions has more details (and section 8
>     specifically about embedding R). This is unlike parse (syntax)
>     errors signaled via return value to ParseVector().
>
>
> The issue is that the segfault (because of stack smashing, therefore 
> because of what is also suspected to be an uncontrolled jump) is 
> happening within the execution of R_ParseVector(). I would think that 
> an issue with the initialization of R is less likely because the 
> project is otherwise used a fair bit and is well covered by automated 
> continuous tests.
>
> After looking more into R's gram.c I suspect that an execution context 
> is required for R_ParseVector() to work properly (to know where 
> to jump in case of error) when the parsing code decides to fail 
> outside what it thinks is a syntax error. If that is the case, this 
> would make R_ParseVector() function well when called from, say, a 
> C-extension to an R package, but fail the way I am seeing it fail when 
> called from an embedded R.

Yes, contexts are used internally to handle errors. For external use 
please see Writing R Extensions, section 6.12.

Best
Tomas

> Best,
>
> Laurent
>
>     Best,
>     Tomas
>
>>
>>     We are otherwise aware that the error is not occurring in the R
>>     console, but can be traced to a call to R_ParseVector() in R's C
>>     API (https://github.com/rpy2/rpy2/blob/master/rpy2/rinterface_lib/_rinterface_capi.py#L509).
>>
>>     Our specific setup is calling an embedded R from Python, using
>>     the cffi library. An error on our end was the first possibility
>>     considered, but the puzzling specificity of the error (as shown
>>     below, other parsing errors are handled properly) and the
>>     difficulty tracing what is happening in R_ParseVector() made
>>     me ask whether someone on this list had a suggestion about the
>>     possible issue.
>>
>>     ```
>>     >>> import rpy2.rinterface as ri
>>     >>> ri.initr()
>>     >>> e = ri.parse("list(''=1+")
>>     ---------------------------------------------------------------------------
>>     RParsingError                              Traceback (most recent call last)
>>     >>> e = ri.parse("list(''=123")
>>     R[write to console]: Error: attempt to use zero-length variable name
>>     R[write to console]: Fatal error: unable to initialize the JIT
>>     *** stack smashing detected ***: <unknown> terminated
>>     ```
>>
>>     On Mon, Dec 2, 2019 at 06:37, Tomas Kalibera
>>     <tomas.kalibera using gmail.com <mailto:tomas.kalibera using gmail.com>>
>>     wrote:
>>
>>         Dear Laurent,
>>
>>         could you please provide a complete reproducible example where
>>         parsing results in a crash of R? Calling
>>         parse(text="list(''=123") from R works fine for me (gives
>>         Error: attempt to use zero-length variable name).
>>
>>         I don't think the problem you observed could be related to the
>>         memory leak. The leak is on the heap, not stack.
>>
>>         Zero-length names of elements in a list are allowed. They are
>>         not the same thing as zero-length variables in an environment.
>>         If you try to convert "lst" from your example to an
>>         environment, you would get the error (attempt to use
>>         zero-length variable name).
>>
>>         Best
>>         Tomas
>>
>>
>>         On 11/30/19 11:55 PM, Laurent Gautier wrote:
>>         > Hi again,
>>         >
>>         > Besides R_ParseVector()'s possible inconsistent behavior, R's
>>         > handling of zero-length named elements does not seem
>>         > consistent either:
>>         >
>>         > ```
>>         > > lst <- list()
>>         > > lst[[""]] <- 1
>>         > > names(lst)
>>         > [1] ""
>>         > > list("" = 1)
>>         > Error: attempt to use zero-length variable name
>>         > ```
>>         >
>>         > Should the parser be made to accept as valid what is
>>         > otherwise possible when using `[[<-` ?
>>         >
>>         >
>>         > Best,
>>         >
>>         > Laurent
>>         >
>>         >
>>         >
>>         > On Sat, Nov 30, 2019 at 17:33, Laurent Gautier
>>         > <lgautier using gmail.com <mailto:lgautier using gmail.com>> wrote:
>>         >
>>         >> I found the following code comment in `src/main/gram.c`:
>>         >>
>>         >> ```
>>         >>
>>         >> /* Memory leak
>>         >>
>>         >> yyparse(), as generated by bison, allocates extra space for the parser
>>         >> stack using malloc(). Unfortunately this means that there is a memory
>>         >> leak in case of an R error (long-jump). In principle, we could define
>>         >> yyoverflow() to relocate the parser stacks for bison and allocate say on
>>         >> the R heap, but yyoverflow() is undocumented and somewhat complicated
>>         >> (we would have to replicate some macros from the generated parser here).
>>         >> The same problem exists at least in the Rd and LaTeX parsers in tools.
>>         >> */
>>         >>
>>         >> ```
>>         >>
>>         >> Could this be related to the issue ?
>>         >>
>>         >> On Sat, Nov 30, 2019 at 14:04, Laurent Gautier
>>         >> <lgautier using gmail.com <mailto:lgautier using gmail.com>> wrote:
>>         >>
>>         >>> Hi,
>>         >>>
>>         >>> The behavior of
>>         >>> ```
>>         >>> SEXP R_ParseVector(SEXP, int, ParseStatus *, SEXP);
>>         >>> ```
>>         >>> defined in `src/include/R_ext/Parse.h` appears to be inconsistent
>>         >>> depending on the string to be parsed.
>>         >>>
>>         >>> Trying to parse a string such as `"list(''=1+"` sets the
>>         >>> `ParseStatus` to incomplete parsing error but trying to parse
>>         >>> `"list(''=123"` will result in R sending a message to the
>>         >>> console (followed by a crash):
>>         >>>
>>         >>> ```
>>         >>> R[write to console]: Error: attempt to use zero-length variable name
>>         >>> R[write to console]: Fatal error: unable to initialize the JIT
>>         >>> *** stack smashing detected ***: <unknown> terminated
>>         >>> ```
>>         >>>
>>         >>> Is there a reason for the difference in behavior, and is there
>>         >>> a workaround ?
>>         >>>
>>         >>> Thanks,
>>         >>>
>>         >>>
>>         >>> Laurent
>>         >>>
>>         >>>
>>
>>
>

