[Rd] Inconsistent behavior for the C API's R_ParseVector()?

Laurent Gautier |g@ut|er @end|ng |rom gm@||@com
Sat Dec 14 17:25:07 CET 2019


On Mon, Dec 9, 2019 at 09:57, Tomas Kalibera <tomas.kalibera using gmail.com> wrote:

> On 12/9/19 2:54 PM, Laurent Gautier wrote:
>
>
>
> On Mon, Dec 9, 2019 at 05:43, Tomas Kalibera <tomas.kalibera using gmail.com> wrote:
>
>> On 12/7/19 10:32 PM, Laurent Gautier wrote:
>>
>> Thanks for the quick response Tomas.
>>
>> The same error is indeed happening when trying to have a zero-length
>> variable name in an environment. The surprising bit is then why this is
>> happening during parsing (that is, why variables are being assigned to an
>> environment at that point).
>>
>> The emitted R error (in the R console) is not a parse (syntax) error, but
>> an error emitted during parsing when the parser tries to intern a name -
>> look it up in a symbol table. Empty string is not allowed as a symbol name,
>> and hence the error. In the call "list(''=1)" , the empty name is what
>> could eventually become a name of a local variable inside list(), even
>> though not yet during parsing.
>>
>
> Thanks Tomas.
>
> I guess this has to do with R expressions being lazily evaluated, and names
> of arguments in a call are also part of the expression. Now the puzzling
> part is why that is part of the parsing at all: I would have expected
> R_ParseVector() to be restricted to parsing. Now it feels like
> R_ParseVector() is performing parsing plus a first level of evaluation for
> expressions that "should never work" (the empty name).
>
> Think of it as an exception in say Python. Some failures during parsing
> result in an exception (called error in R and implemented using a long
> jump). Any time you are calling into R you can get an error; out of memory
> is also signalled as R error.
>


The surprising bit for me was that I had expected the function to solely
perform parsing. I did not expect an exception (and a jump smashing the stack)
when the function concerned is in the C API, is parsing a string, and is
using a parameter (pointer) to report whether parsing was a failure or a
success.

Since you are making a comparison with Python, the distinction I am making
between parsing and evaluation seems to apply there. For example:

```
>>> import parser
>>> parser.expr('1+')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<string>", line 1
    1+
     ^
SyntaxError: unexpected EOF while parsing
>>> p = parser.expr('list(""=1)')
>>> p
<parser.st at 0x7f360e5329f0>
>>> eval(p)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: eval() arg 1 must be a string, bytes or code object

>>> list(""=1)
  File "<stdin>", line 1
SyntaxError: keyword can't be an expression
```
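
The `parser` module used above was removed in Python 3.10. A minimal sketch of the same parse-versus-evaluate distinction with the standard `ast` module might look like the following; the helper name `parse_only` is made up for illustration and is not part of any library.

```python
import ast


def parse_only(source):
    """Parse `source` as an expression without evaluating it.

    Returns (tree, None) on success or (None, message) on a syntax error.
    `parse_only` is a hypothetical helper, not part of any library.
    """
    try:
        return ast.parse(source, mode="eval"), None
    except SyntaxError as exc:
        return None, exc.msg


# Incomplete input fails in the parser itself.
tree, err = parse_only("1+")
print("parsed:", tree is not None, "| syntax error reported:", err is not None)

# A call to an undefined name parses fine; no name is looked up yet.
tree, err = parse_only("undefined_function(1)")
print("parsed into:", type(tree).__name__)

# The failure only shows up once the parsed code is evaluated.
try:
    eval(compile(tree, "<string>", "eval"))
except NameError as exc:
    print("evaluation failed:", exc)
```

The point of the sketch is only that the syntax error is reported by the parsing step, while the lookup failure is deferred to evaluation.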


>> There is probably some error in how the external code is handling R
>> errors (Fatal error: unable to initialize the JIT, stack smashing, etc.)
>> and possibly also in how R is initialized before calling R_ParseVector().
>> You would probably get the same problem when running, say, "stop('myerror')".
>> Please note that R errors are implemented as long jumps, so care has to be
>> taken when calling into R; Writing R Extensions has more details (and
>> section 8 specifically about embedding R). This is unlike parse (syntax)
>> errors, which are reported through the ParseStatus argument of R_ParseVector().
>>
>
> The issue is that the segfault (because of stack smashing, and therefore
> because of what I also suspect to be an uncontrolled jump) is happening
> within the execution of R_ParseVector(). I would think that an issue with
> the initialization of R is less likely because the project is otherwise
> used a fair bit and is well covered by automated continuous tests.
>
> After looking more into R's gram.c, I suspect that an execution context is
> required for R_ParseVector() to work properly (to know where to jump
> in case of error) when the parsing code decides to fail outside of what it
> thinks is a syntax error. If that is the case, R_ParseVector() would
> function well when called from, say, a C extension to an R package, but fail
> the way I am seeing it fail when called from an embedded R.
>
> Yes, contexts are used internally to handle errors. For external use
> please see Writing R Extensions, section 6.12.
>

I have wrapped my call to R_ParseVector() in R_tryCatchError(), and this
seems to help me overcome the issue. Thanks for the pointer.
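
To illustrate why the wrapper helps, here is a toy Python model of a parser with two failure channels: a status value for syntax problems and an exception standing in for an R error (a long jump in the real C code). This is not R's API; `toy_parse_vector`, `guarded_parse`, `ToyRError` and the matching rules are all made up for the sketch.

```python
from enum import Enum


class ParseStatus(Enum):
    # Toy stand-in for the status reported through the pointer argument.
    OK = "ok"
    INCOMPLETE = "incomplete"
    ERROR = "error"


class ToyRError(Exception):
    """Toy stand-in for an R error (a long jump in the real C code)."""


def toy_parse_vector(text):
    """Hypothetical parser model: returns (result, status) or raises."""
    stripped = text.rstrip()
    # Channel 1: syntax problems are reported through the status value.
    if stripped.endswith(("+", "-", "*", "/")):
        return None, ParseStatus.INCOMPLETE
    # Channel 2: a non-syntax failure escapes as an exception instead.
    if "''=" in stripped or '""=' in stripped:
        raise ToyRError("attempt to use zero-length variable name")
    if stripped.count("(") != stripped.count(")"):
        return None, ParseStatus.INCOMPLETE
    return stripped, ParseStatus.OK


def guarded_parse(text):
    """Fold both failure channels into one (result, status, message) value."""
    try:
        result, status = toy_parse_vector(text)
    except ToyRError as exc:
        return None, ParseStatus.ERROR, str(exc)
    return result, status, None


for source in ("list(''=1+", "list(''=123", "list(a=1)"):
    print(repr(source), guarded_parse(source)[1].name)
```

In this model the `try`/`except` plays the role that the wrapper plays in C: it gives the non-syntax failure somewhere valid to land, so the caller sees an ordinary error value instead of an uncontrolled jump.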

Best,


Laurent


> Best
> Tomas
>
>
> Best,
>
> Laurent
>
>> Best,
>> Tomas
>>
>>
>> We are otherwise aware that the error is not occurring in the R console,
>> but can be traced to a call to R_ParseVector() in R's C API:(
>> https://github.com/rpy2/rpy2/blob/master/rpy2/rinterface_lib/_rinterface_capi.py#L509
>> ).
>>
>> Our specific setup is calling an embedded R from Python, using the cffi
>> library. An error on our end was the first possibility considered, but the
>> puzzling specificity of the error (as shown below, other parsing errors are
>> handled properly) and the difficulty of tracing what is happening in
>> R_ParseVector() made me ask whether someone on this list had a suggestion
>> about the possible issue.
>>
>> ```
>>
>> >>> import rpy2.rinterface as ri
>> >>> ri.initr()
>> >>> e = ri.parse("list(''=1+")
>> ---------------------------------------------------------------------------
>> RParsingError                             Traceback (most recent call last)
>> >>> e = ri.parse("list(''=123")
>> R[write to console]: Error: attempt to use zero-length variable name
>> R[write to console]: Fatal error: unable to initialize the JIT
>>
>> *** stack smashing detected ***: <unknown> terminated
>> ```
>>
>>
>> On Mon, Dec 2, 2019 at 06:37, Tomas Kalibera <tomas.kalibera using gmail.com> wrote:
>>
>>> Dear Laurent,
>>>
>>> could you please provide a complete reproducible example where parsing
>>> results in a crash of R? Calling parse(text="list(''=123") from R works
>>> fine for me (gives Error: attempt to use zero-length variable name).
>>>
>>> I don't think the problem you observed could be related to the memory
>>> leak. The leak is on the heap, not stack.
>>>
>>> Zero-length names of elements in a list are allowed. They are not the
>>> same thing as zero-length variables in an environment. If you try to
>>> convert "lst" from your example to an environment, you would get the
>>> error (attempt to use zero-length variable name).
>>>
>>> Best
>>> Tomas
>>>
>>>
>>> On 11/30/19 11:55 PM, Laurent Gautier wrote:
>>> > Hi again,
>>> >
>>> > Besides R_ParseVector()'s possibly inconsistent behavior, R's handling of
>>> > zero-length named elements does not seem consistent either:
>>> >
>>> > ```
>>> >> lst <- list()
>>> >> lst[[""]] <- 1
>>> >> names(lst)
>>> > [1] ""
>>> >> list("" = 1)
>>> > Error: attempt to use zero-length variable name
>>> > ```
>>> >
>>> > Should the parser be made to accept as valid what is otherwise possible
>>> > when using `[[<-`?
>>> >
>>> >
>>> > Best,
>>> >
>>> > Laurent
>>> >
>>> >
>>> >
>>> > On Sat, Nov 30, 2019 at 17:33, Laurent Gautier <lgautier using gmail.com> wrote:
>>> >
>>> >> I found the following code comment in `src/main/gram.c`:
>>> >>
>>> >> ```
>>> >>
>>> >> /* Memory leak
>>> >>
>>> >> yyparse(), as generated by bison, allocates extra space for the parser
>>> >> stack using malloc(). Unfortunately this means that there is a memory
>>> >> leak in case of an R error (long-jump). In principle, we could define
>>> >> yyoverflow() to relocate the parser stacks for bison and allocate say on
>>> >> the R heap, but yyoverflow() is undocumented and somewhat complicated
>>> >> (we would have to replicate some macros from the generated parser here).
>>> >> The same problem exists at least in the Rd and LaTeX parsers in tools.
>>> >> */
>>> >>
>>> >> ```
>>> >>
>>> >> Could this be related to the issue?
>>> >>
>>> >> On Sat, Nov 30, 2019 at 14:04, Laurent Gautier <lgautier using gmail.com> wrote:
>>> >>
>>> >>> Hi,
>>> >>>
>>> >>> The behavior of
>>> >>> ```
>>> >>> SEXP R_ParseVector(SEXP, int, ParseStatus *, SEXP);
>>> >>> ```
>>> >>> defined in `src/include/R_ext/Parse.h` appears to be inconsistent
>>> >>> depending on the string to be parsed.
>>> >>>
>>> >>> Trying to parse a string such as `"list(''=1+"` sets the
>>> >>> `ParseStatus` to an incomplete-parse status, but trying to parse
>>> >>> `"list(''=123"` results in R sending a message to the console
>>> >>> (followed by a crash):
>>> >>>
>>> >>> ```
>>> >>> R[write to console]: Error: attempt to use zero-length variable name
>>> >>> R[write to console]: Fatal error: unable to initialize the JIT
>>> >>> *** stack smashing detected ***: <unknown> terminated
>>> >>> ```
>>> >>>
>>> >>> Is there a reason for the difference in behavior, and is there a
>>> >>> workaround?
>>> >>>
>>> >>> Thanks,
>>> >>>
>>> >>>
>>> >>> Laurent
>>> >>>
>>> >>>
>>> >       [[alternative HTML version deleted]]
>>> >
>>> > ______________________________________________
>>> > R-devel using r-project.org mailing list
>>> > https://stat.ethz.ch/mailman/listinfo/r-devel
>>>
>>>
>>>
>>
>
