Peter Dalgaard BSA p.dalgaard@biostat.ku.dk
09 Jul 1999 16:18:37 +0200

```I promised R-core a write-up of my understanding of the comment
placement nastiness, but I might as well increase the scope to
R-devel:

(R moves, mangles, or deletes comments in mysterious ways -- see for
instance PR#118)

One basic issue is that any R parse tree can have 0, 1, or more
textual representations. Examples:

quote(x)                   # one, presumably...
substitute(+x,list(x=1:2)) # none
quote("+"(2,2))            # two

This means that parse and deparse are not inverses in any sense, and
since you cannot reliably reproduce the textual input, you cannot
place comments correctly either. Of course, one might just say that if
people do sneaky things, then they deserve whatever they get.
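
A quick way to see this, as a sketch one can run in a reasonably recent
R (the names e and back are just for illustration, and the exact text
that deparse() produces may differ between versions):

e <- substitute(+x, list(x = 1:2))   # tree: `+` applied to a vector constant
back <- parse(text = deparse(e), keep.source = FALSE)[[1]]
identical(e, back)                   # FALSE: the text does not parse back to e
identical(quote("+"(2,2)), quote(2+2))   # TRUE: two texts, one tree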

But there's another snag: parse trees can contain any R object and the
only way of placing comments in a parse tree is by attaching them to
one of the nodes. So we have fun with objects carrying the "blah"
comment attribute all over the place, as already happens with

> f<-function()1#blah
> f()
[1] 1
> f
function ()
structure(1, comment = "#blah")
> dput(f())
structure(1, comment = "#blah")
> substitute(+x,list(x=f()))
+structure(1, comment = "#blah")

On top of that there's figuring out which entity should own a comment.
Probably the most viable scheme is that a comment belongs to the token
preceding it. But not all tokens are stored in the parse tree.
Consider

if  #1
( #2
x #3
) #4
3 #5
else #6
4 #7
#8

The parse tree is, however, equivalent to "if"(x, 3, 4)
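
This can be checked with a small sketch, assuming a reasonably recent R.
The text is wrapped in braces only so that parse() accepts an "else" on
a line of its own; src and e are names made up for the illustration:

src <- "{\nif  #1\n( #2\nx #3\n) #4\n3 #5\nelse #6\n4 #7\n#8\n}"
e <- parse(text = src, keep.source = FALSE)[[1]][[2]]   # the "if" call
identical(e, quote("if"(x, 3, 4)))   # TRUE
length(e)                            # 4: `if`, x, 3, 4, and no slot for #1..#8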

To faithfully reproduce any "if" construct, one needs to store at
least four comment structures (for "if", "(", ")", and "else") with it.
General function calls have roughly three per actual argument (after
the ',' and before and after the '=').

As you may have gathered, if you set out to get it right this way,
things quickly get unwieldy.

An alternate suggestion (by Robert at the Vienna meeting) is to "store
the text with the function". I am more and more inclined to think that
this is the viable way to proceed. It takes up a bit of space, but so
what? There are several other questions coming up, though:

(1) can one be sure that the text and the parse tree are in sync?
(2) can we retain the automatic indentation when editing?
(3) what does one do with function definitions within functions?

(1) I don't think so. As long as there's a parse step involved in the
definition, it is easy enough, but body()<- and formals()<- are
bound to spoil the fun. One could adopt the convention that such
operations destroy the comment structure (if a function has no
stored text, just use the old deparsing technique). That would
leave only direct manipulations of parse trees as a source for
inconsistency, in which case it would be fair to say that the user
deserves what he gets.

(2) (I think this is useful for debugging, etc.) One thing that
one might do is to store the code in tokenized form more or less
as the lexical analyser generates it. Then just add the
indentation when printing.

(3) I think it is necessary to store the text *both* with the inner
and the outer function. It is not too hard to think of schemes
to keep track of which parts of a token stream are to be cut out
and stored with the "function" tokens (and, when the definition is
evaluated, with the resulting function object).
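
To make the "store the text with the function" idea and the convention
under (1) a bit more concrete, here is a minimal sketch at user level.
Nothing in it is existing R machinery: define_with_text(), set_body(),
show_function(), and the "source_text" attribute are hypothetical names
chosen for illustration, and a real implementation would presumably
live in the parser and the function object instead.

# Hypothetical: parse the text, evaluate the definition, keep the text.
define_with_text <- function(text, envir = parent.frame()) {
    f <- eval(parse(text = text, keep.source = FALSE)[[1]], envir)
    attr(f, "source_text") <- text
    f
}

# Hypothetical: changing the body invalidates the stored text, so drop it.
set_body <- function(f, value) {
    body(f) <- value
    attr(f, "source_text") <- NULL
    f
}

# Print the stored text if there is one, else fall back on deparsing.
show_function <- function(f) {
    txt <- attr(f, "source_text")
    if (is.null(txt)) txt <- deparse(f)
    cat(txt, sep = "\n")
}

f <- define_with_text("function() 1 #blah")
show_function(f)             # the stored text, comment included
g <- set_body(f, quote(2))
show_function(g)             # no stored text left, so the deparsed version

Direct surgery on the parse tree would still get past set_body(), which
is the remaining source of inconsistency mentioned under (1).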