[Rd] Inconsistent Parse Behavior

brodie gaslam brodie.gaslam at yahoo.com
Wed Dec 24 15:00:26 CET 2014

Under some specific conditions, `parse` seems to produce inconsistent and potentially incorrect results the first time it is run in a fresh R session.  Consider this code, where we parse the same text twice in a row and get one mismatched value in the parse data:
```
Type 'demo()' for some demos, 'help()' for on-line help, or
'help.start()' for an HTML browser interface to help.
Type 'q()' to quit R.

> txt <- 'c("", {
+   c(integer(3L), 1:3)
+   c(integer(), 1:3, 1L)         # TRUE
+   c(integer(), c(1, 2, 3), 1L)  # TRUE
+ } )
+ c("", {
+   lst <- list(list( 1,  2), list( 3, list( 4, list( 5, list(6, 6.1, 6.2)))))
+ } )
+ c("", {
+   TRUE
+ } )'
> prs1 <- parse(text=txt, keep.source=TRUE)
> prs2 <- parse(text=txt, keep.source=TRUE)
> which(attr(prs1, "srcfile")$parseData != attr(prs2, "srcfile")$parseData)
[1] 1176
> sessionInfo()
R version 3.1.2 (2014-10-31)
Platform: x86_64-apple-darwin13.4.0 (64-bit)

[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base 
```
This discrepancy disappears if I simplify the code being parsed in any way.  The code shown is already a much simplified version of the code that first produced the error for me; I cannot reduce it further without also eliminating the error.
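The single index reported by `which()` does not say which token or field differs. A minimal sketch of one way to localize a mismatch follows, assuming both parses yield parse-data frames of identical shape; `locate_diffs` is a hypothetical helper name, not part of R. On a session that does not exhibit the problem it simply returns zero rows.

```r
# Hypothetical helper: report the token id and column name of every cell
# that differs between two getParseData() results of the same shape.
locate_diffs <- function(d1, d2) {
  stopifnot(identical(dim(d1), dim(d2)))
  # Comparing two data frames yields a logical matrix; arr.ind = TRUE
  # turns each TRUE cell into a (row, col) pair.
  hits <- which(d1 != d2, arr.ind = TRUE)
  data.frame(
    id     = d1$id[hits[, "row"]],
    column = names(d1)[hits[, "col"]],
    stringsAsFactors = FALSE,
    row.names = NULL
  )
}

# Small stand-in for the text in the report.
txt <- 'c("", {
  TRUE
} )'
prs1 <- parse(text = txt, keep.source = TRUE)
prs2 <- parse(text = txt, keep.source = TRUE)
print(locate_diffs(utils::getParseData(prs1), utils::getParseData(prs2)))
```

Working on the `getParseData` data frames rather than the raw `parseData` matrix keeps the column names attached, so a differing `parent` value is reported as such.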
Unfortunately, the discrepancy is meaningful.  The problem is in the first parse.  Looking at the `getParseData` output:
```
> subset(getParseData(prs1), id %in% c(226, 234))
    line1 col1 line2 col2  id parent token terminal text
226     6    1     8    3 226    234  expr    FALSE     
234     9    5     9    5 234    251   ','     TRUE    ,
```
Notice that item 226 has as its parent item 234, a terminal token (a comma) that starts on line 9, col 5, after item 226 has already ended on line 8.  I'm not sure how this is possible.
In the second parse, the parse data is as one would expect:
```
> subset(getParseData(prs2), id == 226)
    line1 col1 line2 col2  id parent token terminal text
226     6    1     8    3 226      0  expr    FALSE    
```
The parent here is the top level (0), as would be expected looking at the source code in `txt` (226 represents the second `c(...)` block).
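The inconsistency above can be phrased as a check over the whole parse data rather than a single row. The sketch below assumes two invariants that healthy parse data appears to satisfy: a positive `parent` refers to an existing non-terminal row, and that row's source span encloses the child's. `find_bad_parents` is a hypothetical name, and comment tokens are skipped because their parent ids may follow different rules.

```r
# Hypothetical check: return rows whose parent is missing, is a terminal
# token, or does not enclose the child's source span.
find_bad_parents <- function(pd) {
  idx <- match(pd$parent, pd$id)   # NA for top level (0) or unknown ids
  p   <- pd[idx, ]                 # parent row aligned with each child row
  starts_ok <- (p$line1 < pd$line1) |
    (p$line1 == pd$line1 & p$col1 <= pd$col1)
  ends_ok   <- (p$line2 > pd$line2) |
    (p$line2 == pd$line2 & p$col2 >= pd$col2)
  # Only rows with a positive parent are checked; comments are skipped.
  has_parent <- pd$parent > 0 & pd$token != "COMMENT"
  bad <- has_parent & (is.na(idx) | p$terminal | !starts_ok | !ends_ok)
  pd[bad, c("id", "parent", "token", "line1", "col1", "line2", "col2")]
}

# Small stand-in for the text in the report.
txt <- 'c("", {
  TRUE
} )'
pd <- utils::getParseData(parse(text = txt, keep.source = TRUE))
print(find_bad_parents(pd))
```

Applied to the first parse in the report, a check like this would be expected to flag item 226, since its recorded parent (234) is a terminal comma located after it.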
I suspect the problem is caused by the use of `{}` inside of a function call such as `c()`, but again, it is not that simple, since any further simplification of my code above seems to resolve the problem.  I also don't know why it would work fine the second time, though presumably some state inside the parser is initialized on the first call.
Any help appreciated.
