[Rd] Inconsistent behavior for the C API's R_ParseVector()?

Laurent Gautier |g@ut|er @end|ng |rom gm@||@com
Mon Dec 9 14:54:47 CET 2019


On Mon, Dec 9, 2019 at 05:43, Tomas Kalibera <tomas.kalibera using gmail.com>
wrote:

> On 12/7/19 10:32 PM, Laurent Gautier wrote:
>
> Thanks for the quick response Tomas.
>
> The same error is indeed happening when trying to have a zero-length
> variable name in an environment. The surprising bit is then "why is this
> happening during parsing" (that is, why are variables being assigned to an
> environment)?
>
> The emitted R error (in the R console) is not a parse (syntax) error, but
> an error emitted during parsing when the parser tries to intern a name,
> that is, look it up in a symbol table. An empty string is not allowed as a
> symbol name, hence the error. In the call "list(''=1)", the empty name is
> what could eventually become the name of a local variable inside list(),
> even though not yet during parsing.
>
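
To make the interning step concrete, here is a minimal sketch in plain C of a symbol table that rejects the empty string. It is a toy model only: `toy_intern`, `toy_symbols` and `TOY_MAX_SYMBOLS` are hypothetical names, and this is not how R's symbol table is implemented. In particular, the toy reports the rejection with a NULL return, whereas R raises an error.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Toy symbol table (hypothetical, not R's implementation). */
#define TOY_MAX_SYMBOLS 64

static char *toy_symbols[TOY_MAX_SYMBOLS];
static size_t toy_symbol_count = 0;

/* Return the unique stored copy of `name`, creating it on first use.
 * Returns NULL for the empty string (zero-length names are rejected,
 * mirroring the rule described above), when the table is full, or
 * when memory allocation fails. */
const char *toy_intern(const char *name)
{
    assert(name != NULL);
    if (name[0] == '\0')
        return NULL;                       /* zero-length name rejected */
    for (size_t i = 0; i < toy_symbol_count; i++)
        if (strcmp(toy_symbols[i], name) == 0)
            return toy_symbols[i];         /* already interned */
    if (toy_symbol_count == TOY_MAX_SYMBOLS)
        return NULL;                       /* table full */
    size_t len = strlen(name) + 1;
    char *copy = malloc(len);
    if (copy == NULL)
        return NULL;                       /* out of memory */
    memcpy(copy, name, len);
    toy_symbols[toy_symbol_count++] = copy;
    return copy;
}
```

Interning the same non-empty name twice yields the same pointer, while `toy_intern("")` always fails. A plain array of name strings, such as the names of a list, has no such restriction.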

Thanks Tomas.

I guess this has to do with R expressions being lazily evaluated, and with
names of arguments in a call also being part of the expression. The puzzling
part is why that is part of the parsing at all: I would have expected
R_ParseVector() to be restricted to parsing. Now it feels like
R_ParseVector() is performing parsing plus a first level of evaluation for
expressions that "should never work" (the empty name).

> There is probably some error in how the external code is handling R errors
> (Fatal error: unable to initialize the JIT, stack smashing, etc.) and
> possibly also in how R is initialized before calling ParseVector. You would
> probably get the same problem when running, say, "stop('myerror')". Please
> note that R errors are implemented as long-jumps, so care has to be taken
> when calling into R; Writing R Extensions has more details (section 8
> specifically covers embedding R). This is unlike parse (syntax) errors,
> which are signaled via a return value of ParseVector().
>
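
The two error channels described above can be sketched with a toy parser in plain C. Everything here is hypothetical (`toy_parse`, `toy_parse_guarded`, `ToyParseStatus`, and the string checks standing in for real parsing). It only illustrates the difference between an error reported as a status value and one raised with `longjmp()`; it is not R's code.

```c
#include <assert.h>
#include <setjmp.h>
#include <string.h>

/* Hypothetical status codes, standing in for a parse status. */
typedef enum { TOY_PARSE_OK, TOY_PARSE_INCOMPLETE } ToyParseStatus;

static jmp_buf toy_error_target;   /* where toy_raise_error() jumps to */

/* Raise an error by long-jumping; never returns to its caller. */
static void toy_raise_error(void)
{
    longjmp(toy_error_target, 1);
}

/* Toy rules (assumptions): a trailing '+' means incomplete input,
 * and an empty quoted name before '=' is an error raised by jump. */
ToyParseStatus toy_parse(const char *text)
{
    assert(text != NULL);
    size_t len = strlen(text);
    if (len > 0 && text[len - 1] == '+')
        return TOY_PARSE_INCOMPLETE;      /* channel 1: status value */
    if (strstr(text, "''=") != NULL)
        toy_raise_error();                /* channel 2: long-jump */
    return TOY_PARSE_OK;
}

/* Establish a jump target before calling the parser.
 * Returns 0 and stores the status on a normal return,
 * or -1 if the parser long-jumped instead of returning. */
int toy_parse_guarded(const char *text, ToyParseStatus *status)
{
    if (setjmp(toy_error_target) != 0)
        return -1;                        /* arrived here via longjmp */
    *status = toy_parse(text);
    return 0;
}
```

With these toy rules, `"list(''=1+"` comes back with `TOY_PARSE_INCOMPLETE`, while `"list(''=123"` never returns a status at all: control reappears at the `setjmp()` call. A caller that invokes `toy_parse()` directly, without a valid jump target whose function is still active, has undefined behavior on the second input.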

The issue is that the segfault (because of stack smashing, and therefore
because of what I also suspect to be an uncontrolled jump) is happening
within the execution of R_ParseVector(). I would think that an issue with
the initialization of R is less likely because the project is otherwise
used a fair bit and is well covered by automated continuous tests.

After looking more into R's gram.c, I suspect that an execution context is
required for R_ParseVector() to work properly (that is, to know where to
jump in case of error) when the parsing code decides to fail on something
other than what it considers a syntax error. If that is the case, it would
make R_ParseVector() function well when called from, say, a C extension to
an R package, but fail the way I am seeing it fail when called from an
embedded R.
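
To illustrate what I mean by an execution context, here is a minimal sketch in plain C of a stack of jump targets with a "protected call" wrapper. All names (`ToyContext`, `toy_protected_call`, `toy_raise`, the `toy_work_*` callbacks) are hypothetical, and this is a toy model of the hypothesis rather than R's actual context machinery. The wrapper's signature loosely resembles that of R_ToplevelExec(), which is declared in R's headers, but nothing below is taken from R.

```c
#include <assert.h>
#include <setjmp.h>
#include <stddef.h>

/* Hypothetical execution context: one jump target per protected call. */
typedef struct ToyContext {
    jmp_buf target;
    struct ToyContext *outer;   /* enclosing context, or NULL */
} ToyContext;

static ToyContext *toy_current = NULL;   /* innermost live context */

int toy_has_context(void)
{
    return toy_current != NULL;
}

/* Raise an error: jump to the innermost live context. */
void toy_raise(void)
{
    /* Jumping without a live target would be undefined behavior. */
    assert(toy_current != NULL);
    longjmp(toy_current->target, 1);
}

/* Run fn(data) inside a fresh context.
 * Returns 1 if fn returned normally, 0 if it raised. */
int toy_protected_call(void (*fn)(void *), void *data)
{
    ToyContext ctx;
    ctx.outer = toy_current;       /* set before setjmp, so it stays valid */
    if (setjmp(ctx.target) != 0) {
        toy_current = ctx.outer;   /* unwind: restore enclosing context */
        return 0;
    }
    toy_current = &ctx;
    fn(data);
    toy_current = ctx.outer;
    return 1;
}

/* Example callbacks (hypothetical): each counts how far it got. */
void toy_work_ok(void *data)
{
    *(int *)data += 1;
}

void toy_work_fail(void *data)
{
    *(int *)data += 1;
    toy_raise();
    *(int *)data += 100;           /* never reached */
}

void toy_work_nested(void *data)
{
    /* The inner failure is caught by the inner context only. */
    (void)toy_protected_call(toy_work_fail, data);
}
```

In this model, code called from inside `toy_protected_call()` always has somewhere valid to jump to, which would correspond to the C-extension case. Code called with no live context (or with a stale one) does not, which is how I picture the embedded case.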

Best,

Laurent

> Best,
> Tomas
>
>
> We are otherwise aware that the error is not occurring in the R console,
> but can be traced to a call to R_ParseVector() in R's C API (
> https://github.com/rpy2/rpy2/blob/master/rpy2/rinterface_lib/_rinterface_capi.py#L509
> ).
>
> Our specific setup is calling an embedded R from Python, using the cffi
> library. An error on our end was the first possibility considered, but the
> puzzling specificity of the error (as shown below, other parsing errors are
> handled properly) and the difficulty of tracing what is happening in
> R_ParseVector() made me ask whether someone on this list had a suggestion
> about the possible issue.
>
> ```
>
> >>> import rpy2.rinterface as ri
> >>> ri.initr()
> >>> e = ri.parse("list(''=1+")
> ---------------------------------------------------------------------------
> RParsingError                             Traceback (most recent call last)
> >>> e = ri.parse("list(''=123")
> R[write to console]: Error: attempt to use zero-length variable name
> R[write to console]: Fatal error: unable to initialize the JIT
>
> *** stack smashing detected ***: <unknown> terminated
> ```
>
>
> On Mon, Dec 2, 2019 at 06:37, Tomas Kalibera <tomas.kalibera using gmail.com>
> wrote:
>
>> Dear Laurent,
>>
>> could you please provide a complete reproducible example where parsing
>> results in a crash of R? Calling parse(text="list(''=123") from R works
>> fine for me (gives Error: attempt to use zero-length variable name).
>>
>> I don't think the problem you observed could be related to the memory
>> leak. The leak is on the heap, not stack.
>>
>> Zero-length names of elements in a list are allowed. They are not the
>> same thing as zero-length variables in an environment. If you try to
>> convert "lst" from your example to an environment, you would get the
>> error (attempt to use zero-length variable name).
>>
>> Best
>> Tomas
>>
>>
>> On 11/30/19 11:55 PM, Laurent Gautier wrote:
>> > Hi again,
>> >
>> > Besides R_ParseVector()'s possible inconsistent behavior, R's handling of
>> > zero-length named elements does not seem consistent either:
>> >
>> > ```
>> > > lst <- list()
>> > > lst[[""]] <- 1
>> > > names(lst)
>> > [1] ""
>> > > list("" = 1)
>> > Error: attempt to use zero-length variable name
>> > ```
>> >
>> > Should the parser be made to accept as valid what is otherwise possible
>> > when using `[[<-`?
>> >
>> >
>> > Best,
>> >
>> > Laurent
>> >
>> >
>> >
>> > On Sat, Nov 30, 2019 at 17:33, Laurent Gautier <lgautier using gmail.com>
>> > wrote:
>> >
>> >> I found the following code comment in `src/main/gram.c`:
>> >>
>> >> ```
>> >>
>> >> /* Memory leak
>> >>
>> >> yyparse(), as generated by bison, allocates extra space for the parser
>> >> stack using malloc(). Unfortunately this means that there is a memory
>> >> leak in case of an R error (long-jump). In principle, we could define
>> >> yyoverflow() to relocate the parser stacks for bison and allocate say on
>> >> the R heap, but yyoverflow() is undocumented and somewhat complicated
>> >> (we would have to replicate some macros from the generated parser here).
>> >> The same problem exists at least in the Rd and LaTeX parsers in tools.
>> >> */
>> >>
>> >> ```
>> >>
>> >> Could this be related to the issue?
>> >>
>> >> On Sat, Nov 30, 2019 at 14:04, Laurent Gautier <lgautier using gmail.com>
>> >> wrote:
>> >>
>> >>> Hi,
>> >>>
>> >>> The behavior of
>> >>> ```
>> >>> SEXP R_ParseVector(SEXP, int, ParseStatus *, SEXP);
>> >>> ```
>> >>> defined in `src/include/R_ext/Parse.h` appears to be inconsistent
>> >>> depending on the string to be parsed.
>> >>>
>> >>> Trying to parse a string such as `"list(''=1+"` sets the
>> >>> `ParseStatus` to an incomplete-parse error, but trying to parse
>> >>> `"list(''=123"` results in R sending a message to the console
>> >>> (followed by a crash):
>> >>>
>> >>> ```
>> >>> R[write to console]: Error: attempt to use zero-length variable name
>> >>> R[write to console]: Fatal error: unable to initialize the JIT
>> >>> *** stack smashing detected ***: <unknown> terminated
>> >>> ```
>> >>>
>> >>> Is there a reason for the difference in behavior, and is there a
>> >>> workaround?
>> >>>
>> >>> Thanks,
>> >>>
>> >>>
>> >>> Laurent
>> >>>
>> >>>
>> >       [[alternative HTML version deleted]]
>> >
>> > ______________________________________________
>> > R-devel using r-project.org mailing list
>> > https://stat.ethz.ch/mailman/listinfo/r-devel
>>
>>
>>
>
