[Rd] Inconsistent behavior for the C API's R_ParseVector() ?

Simon Urbanek simon.urbanek using r-project.org
Sun Dec 15 01:56:53 CET 2019


Laurent,


> On Dec 14, 2019, at 5:29 PM, Laurent Gautier <lgautier using gmail.com> wrote:
> 
> Hi Simon,
> 
> Widespread errors would have caught my attention earlier, as the code
> that relies on a single initialization of the embedded R is used quite a
> bit and is covered by quite a few unit tests. This is the only situation I
> am aware of in which an error occurs.
> 

It may or may not be "widespread" - almost all R API functions can raise errors (e.g., unable to allocate). You'll only find out once they do and that's too late ;).


> What is a "correct context", or initial context, that the code should be called from?
> Searching for "context" in the R-exts manual does not return much.
> 

It depends on which embedding API you use; see R-exts 8.1. The two options are run_Rmainloop() and R_ReplDLLinit(), which both set up the top-level context with SETJMP. If you don't use either, then you have to use one of the advanced R APIs that do it, such as R_ToplevelExec() or R_UnwindProtect(); otherwise the point to abort to on error doesn't exist. Embedding R is much more complex than many think ...
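
To make the "point to abort to" concrete, here is a minimal sketch in plain C. It deliberately does not use R's API: toplevel, raise_error(), toy_parse(), parse_job and run_protected() are all hypothetical names invented for illustration, and the toy parser only mimics the two behaviors reported in this thread (a status value for an unfinished expression, a long jump for an empty name). It is not how R's parser or contexts are actually implemented.

```c
#include <assert.h>
#include <setjmp.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Everything here is a hypothetical stand-in; none of it is R's API. */

jmp_buf toplevel;            /* the "point to abort to" */
int toplevel_ready = 0;      /* non-zero while that point is valid */
char last_error[128] = "";   /* message of the most recent toy error */

/* Toy analogue of an R error: never returns to its caller. */
void raise_error(const char *msg)
{
    snprintf(last_error, sizeof last_error, "%s", msg);
    if (!toplevel_ready) {
        /* Jumping to a context that was never set up is undefined
           behavior, so this sketch stops instead of jumping. */
        fprintf(stderr, "error raised without a context: %s\n", msg);
        abort();
    }
    longjmp(toplevel, 1);
}

enum status { PARSE_OK, PARSE_INCOMPLETE };

/* Toy parser: an unfinished expression is reported through the return
   value, but an empty name is reported by raising an error. */
enum status toy_parse(const char *text)
{
    size_t n = strlen(text);
    if (n > 0 && text[n - 1] == '+')
        return PARSE_INCOMPLETE;
    if (strstr(text, "''=") != NULL)
        raise_error("attempt to use zero-length variable name");
    return PARSE_OK;
}

struct parse_job { const char *text; enum status status; };

/* Callback with the void (*)(void *) shape that run_protected() expects. */
void parse_job_run(void *data)
{
    struct parse_job *job = data;
    job->status = toy_parse(job->text);
}

/* Set up the context, run fun(data), and return 1 if it completed or 0 if
   it raised an error. Not reentrant: nested calls would clobber toplevel. */
int run_protected(void (*fun)(void *), void *data)
{
    int completed = 0;
    assert(fun != NULL);
    if (setjmp(toplevel) == 0) {
        toplevel_ready = 1;
        fun(data);
        completed = 1;
    }
    toplevel_ready = 0;
    return completed;
}
```

Calling parse_job_run() directly on a job whose text is "list(''=123" takes the abort() branch, which stands in for the crash seen from the embedded R. Passing the same job through run_protected() returns 0 and leaves the message in last_error. In real embedding code the analogous roles would presumably be played by R_ToplevelExec() and a callback that calls R_ParseVector(); that mapping is an assumption of this sketch and should be checked against Writing R Extensions.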

Cheers,
Simon



> Best,
> 
> Laurent
> 
> 
> On Sat, Dec 14, 2019 at 12:20, Simon Urbanek <simon.urbanek using r-project.org> wrote:
> 
>> Laurent,
>> 
>> the main point here is that ParseVector(), just like any other R API,
>> has to be called in a correct context since it can raise errors. So the
>> issue was that your C code has a bug: it does not set up R correctly (my
>> guess would be that you're not creating the initial context necessary in
>> embedded R). There are many different errors, and yours is just one of
>> many that can occur - any R API call that does allocation (and parsing
>> obviously does) can cause errors. Note that this is true for pretty much
>> all R API functions.
>> 
>> Cheers,
>> Simon
>> 
>> 
>> 
>>> On Dec 14, 2019, at 11:25 AM, Laurent Gautier <lgautier using gmail.com> wrote:
>>> 
>>> On Mon, Dec 9, 2019 at 09:57, Tomas Kalibera <tomas.kalibera using gmail.com> wrote:
>>> 
>>>> On 12/9/19 2:54 PM, Laurent Gautier wrote:
>>>> 
>>>> 
>>>> 
>>>> On Mon, Dec 9, 2019 at 05:43, Tomas Kalibera <tomas.kalibera using gmail.com> wrote:
>>>> 
>>>>> On 12/7/19 10:32 PM, Laurent Gautier wrote:
>>>>> 
>>>>> Thanks for the quick response Tomas.
>>>>> 
>>>>> The same error is indeed happening when trying to have a zero-length
>>>>> variable name in an environment. The surprising bit is then "why is this
>>>>> happening during parsing" (that is, why are variables assigned to an
>>>>> environment)?
>>>>> 
>>>>> The emitted R error (in the R console) is not a parse (syntax) error, but
>>>>> an error emitted during parsing when the parser tries to intern a name,
>>>>> that is, look it up in a symbol table. The empty string is not allowed as
>>>>> a symbol name, hence the error. In the call "list(''=1)", the empty name
>>>>> is what could eventually become the name of a local variable inside
>>>>> list(), even though not yet during parsing.
>>>>> 
>>>> 
>>>> Thanks Tomas.
>>>> 
>>>> I guess this has to do with R expressions being lazily evaluated, and names
>>>> of arguments in a call are also part of the expression. Now the puzzling
>>>> part is why that is at all part of the parsing: I would have expected
>>>> R_ParseVector() to be restricted to parsing... Now it feels like
>>>> R_ParseVector() is performing parsing, and a first level of evaluation for
>>>> expressions that "should never work" (the empty name).
>>>> 
>>>> Think of it as an exception in, say, Python. Some failures during parsing
>>>> result in an exception (called an error in R and implemented using a long
>>>> jump). Any time you are calling into R you can get an error; out of memory
>>>> is also signalled as an R error.
>>>> 
>>> 
>>> 
>>> The surprising bit for me was that I had expected the function to solely
>>> perform parsing. I did not expect an exception (and a jump smashing the
>>> stack) when the function concerned is in the C API, is parsing a string,
>>> and is using a parameter (pointer) to store whether parsing was a failure
>>> or a success.
>>> 
>>> Since you are making a comparison with Python, the distinction I am making
>>> between parsing and evaluation seems to apply there. For example:
>>> 
>>> ```
>>> >>> import parser
>>> >>> parser.expr('1+')
>>> Traceback (most recent call last):
>>>   File "<stdin>", line 1, in <module>
>>>   File "<string>", line 1
>>>     1+
>>>      ^
>>> SyntaxError: unexpected EOF while parsing
>>> >>> p = parser.expr('list(""=1)')
>>> >>> p
>>> <parser.st at 0x7f360e5329f0>
>>> >>> eval(p)
>>> Traceback (most recent call last):
>>>   File "<stdin>", line 1, in <module>
>>> TypeError: eval() arg 1 must be a string, bytes or code object
>>> 
>>> >>> list(""=1)
>>>   File "<stdin>", line 1
>>> SyntaxError: keyword can't be an expression
>>> ```
>>> 
>>> 
>>>>> There is probably some error in how the external code is handling R
>>>>> errors (Fatal error: unable to initialize the JIT, stack smashing, etc.)
>>>>> and possibly also in how R is initialized before calling ParseVector().
>>>>> Probably you would get the same problem when running, say,
>>>>> "stop('myerror')". Please note that R errors are implemented as
>>>>> long-jumps, so care has to be taken when calling into R; Writing R
>>>>> Extensions has more details (and section 8 is specifically about
>>>>> embedding R). This is unlike parse (syntax) errors, which are signaled
>>>>> via return value to ParseVector().
>>>>> 
>>>> 
>>>> The issue is that the segfault (because of stack smashing, and therefore
>>>> because of what I also suspect to be an uncontrolled jump) is happening
>>>> within the execution of R_ParseVector(). I would think that an issue with
>>>> the initialization of R is less likely because the project is otherwise
>>>> used a fair bit and is well covered by automated continuous tests.
>>>> 
>>>> After looking more into R's gram.c, I suspect that an execution context is
>>>> required for R_ParseVector() to work properly (to know where to jump
>>>> in case of error) when the parsing code decides to fail outside what it
>>>> thinks is a syntax error. If that is the case, this would make
>>>> R_ParseVector() function well when called from, say, a C extension to an
>>>> R package, but fail the way I am seeing it fail when called from an
>>>> embedded R.
>>>> 
>>>> Yes, contexts are used internally to handle errors. For external use
>>>> please see Writing R Extensions, section 6.12.
>>>> 
>>> 
>>> I have wrapped my call to R_ParseVector() in a R_tryCatchError(), and this
>>> seems to help me overcome the issue. Thanks for the pointer.
>>> 
>>> Best,
>>> 
>>> 
>>> Laurent
>>> 
>>> 
>>>> Best
>>>> Tomas
>>>> 
>>>> 
>>>> Best,
>>>> 
>>>> Laurent
>>>> 
>>>>> Best,
>>>>> Tomas
>>>>> 
>>>>> 
>>>>> We are otherwise aware that the error is not occurring in the R console,
>>>>> but can be traced to a call to R_ParseVector() in R's C API
>>>>> (https://github.com/rpy2/rpy2/blob/master/rpy2/rinterface_lib/_rinterface_capi.py#L509).
>>>>> 
>>>>> Our specific setup is calling an embedded R from Python, using the cffi
>>>>> library. An error on our end was the first possibility considered, but the
>>>>> puzzling specificity of the error (as shown below, other parsing errors are
>>>>> handled properly) and the difficulty of tracing what is happening in
>>>>> R_ParseVector() made me ask whether someone on this list had a suggestion
>>>>> about the possible issue:
>>>>> 
>>>>> ```
>>>>> 
>>>>> >>> import rpy2.rinterface as ri
>>>>> >>> ri.initr()
>>>>> >>> e = ri.parse("list(''=1+")
>>>>> ---------------------------------------------------------------------------
>>>>> RParsingError                             Traceback (most recent call last)
>>>>> >>> e = ri.parse("list(''=123")
>>>>> R[write to console]: Error: attempt to use zero-length variable name
>>>>> R[write to console]: Fatal error: unable to initialize the JIT
>>>>> 
>>>>> *** stack smashing detected ***: <unknown> terminated
>>>>> ```
>>>>> 
>>>>> 
>>>>> On Mon, Dec 2, 2019 at 06:37, Tomas Kalibera <tomas.kalibera using gmail.com> wrote:
>>>>> 
>>>>>> Dear Laurent,
>>>>>> 
>>>>>> could you please provide a complete reproducible example where parsing
>>>>>> results in a crash of R? Calling parse(text="list(''=123") from R works
>>>>>> fine for me (gives Error: attempt to use zero-length variable name).
>>>>>> 
>>>>>> I don't think the problem you observed could be related to the memory
>>>>>> leak. The leak is on the heap, not stack.
>>>>>> 
>>>>>> Zero-length names of elements in a list are allowed. They are not the
>>>>>> same thing as zero-length variables in an environment. If you try to
>>>>>> convert "lst" from your example to an environment, you would get the
>>>>>> error (attempt to use zero-length variable name).
>>>>>> 
>>>>>> Best
>>>>>> Tomas
>>>>>> 
>>>>>> 
>>>>>> On 11/30/19 11:55 PM, Laurent Gautier wrote:
>>>>>>> Hi again,
>>>>>>> 
>>>>>>> Besides R_ParseVector()'s possibly inconsistent behavior, R's handling of
>>>>>>> zero-length named elements does not seem consistent either:
>>>>>>> 
>>>>>>> ```
>>>>>>>> lst <- list()
>>>>>>>> lst[[""]] <- 1
>>>>>>>> names(lst)
>>>>>>> [1] ""
>>>>>>>> list("" = 1)
>>>>>>> Error: attempt to use zero-length variable name
>>>>>>> ```
>>>>>>> 
>>>>>>> Should the parser be made to accept as valid what is otherwise possible
>>>>>>> when using `[[<-` ?
>>>>>>> 
>>>>>>> 
>>>>>>> Best,
>>>>>>> 
>>>>>>> Laurent
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>> On Sat, Nov 30, 2019 at 17:33, Laurent Gautier <lgautier using gmail.com> wrote:
>>>>>>> 
>>>>>>>> I found the following code comment in `src/main/gram.c`:
>>>>>>>> 
>>>>>>>> ```
>>>>>>>> 
>>>>>>>> /* Memory leak
>>>>>>>> 
>>>>>>>> yyparse(), as generated by bison, allocates extra space for the parser
>>>>>>>> stack using malloc(). Unfortunately this means that there is a memory
>>>>>>>> leak in case of an R error (long-jump). In principle, we could define
>>>>>>>> yyoverflow() to relocate the parser stacks for bison and allocate say on
>>>>>>>> the R heap, but yyoverflow() is undocumented and somewhat complicated
>>>>>>>> (we would have to replicate some macros from the generated parser here).
>>>>>>>> The same problem exists at least in the Rd and LaTeX parsers in tools.
>>>>>>>> */
>>>>>>>> 
>>>>>>>> ```
>>>>>>>> 
>>>>>>>> Could this be related to the issue?
>>>>>>>> 
>>>>>>>> On Sat, Nov 30, 2019 at 14:04, Laurent Gautier <lgautier using gmail.com> wrote:
>>>>>>>> 
>>>>>>>>> Hi,
>>>>>>>>> 
>>>>>>>>> The behavior of
>>>>>>>>> ```
>>>>>>>>> SEXP R_ParseVector(SEXP, int, ParseStatus *, SEXP);
>>>>>>>>> ```
>>>>>>>>> defined in `src/include/R_ext/Parse.h` appears to be inconsistent
>>>>>>>>> depending on the string to be parsed.
>>>>>>>>> 
>>>>>>>>> Trying to parse a string such as `"list(''=1+"` sets the
>>>>>>>>> `ParseStatus` to an incomplete-parsing error, but trying to parse
>>>>>>>>> `"list(''=123"` will result in R sending a message to the console
>>>>>>>>> (followed by a crash):
>>>>>>>>> 
>>>>>>>>> ```
>>>>>>>>> R[write to console]: Error: attempt to use zero-length variable name
>>>>>>>>> R[write to console]: Fatal error: unable to initialize the JIT
>>>>>>>>> *** stack smashing detected ***: <unknown> terminated
>>>>>>>>> ```
>>>>>>>>> 
>>>>>>>>> Is there a reason for the difference in behavior, and is there a
>>>>>>>>> workaround?
>>>>>>>>> 
>>>>>>>>> Thanks,
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> Laurent
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>     [[alternative HTML version deleted]]
>>>>>>> 
>>>>>>> ______________________________________________
>>>>>>> R-devel using r-project.org mailing list
>>>>>>> https://stat.ethz.ch/mailman/listinfo/r-devel
>>>>>> 
>>>>>> 
>>>>>> 
>>>>> 
>>>> 
>>> 
>>> 
>> 
>> 
> 
> 


