[Rd] Source references from the parser

Deepayan Sarkar deepayan.sarkar at gmail.com
Sat Nov 25 21:12:31 CET 2006


On 11/25/06, Duncan Murdoch <murdoch at stats.uwo.ca> wrote:
> I have just committed some changes to R-devel (which will become R 2.5.0
> next spring) to add source references to parsed R code.  Here's a
> description of the scheme:
>
> The design is done through 2 old-style classes.
>
> "srcfile" corresponds to a source file: it contains a filename, the
> working directory in which that filename is to be interpreted, the last
> modified timestamp of the file at the time the object is created, plus
> some internal components.  It is implemented as an environment so that
> there can be multiple references to it.
>
> "srcref" is a reference to a particular range of characters (as the
> parser sees them; I think that really means bytes, but I haven't tested
> with MBCSs) in a source file.  It is implemented as a vector of 4
> integers (first line, first column, last line, last column), with the
> srcfile as an attribute.
>
> The parser attaches a srcref attribute to each complete statement as it
> gets parsed, if options("keep.source") is TRUE.  (I've left the old source
> attribute in place as well for functions; I think it won't be needed in
> the long run, but it is needed now.)
>
> When printing an object with a srcref attribute, print.default tries to
> read the srcfile to obtain the text.  If it fails, it falls back to an
> ugly display of the reference.  Using a new argument useSource=FALSE in
> printing will stop this attempt:  when printing language, it will
> deparse; when printing a srcref, it will print the ugly fallback.
>
> source(echo=TRUE) will echo all the lines of the file including comments
> and formatting.  demo() does the same, and I would guess Sweave will do
> this too, but I haven't tested that yet.  I think this will improve
> Sweave output, but will need changes to the input file:  people may have
> comments there that they don't want shown.  Some sort of
> "useSource=FALSE" option will need to be added.
>
> The browser used with debug() etc. will display statements as they were
> formatted in the original source.  It will not display leading or
> following comments, but will display embedded comments.
>
> Parsing errors display the name of the source file that was parsed, and
> display verbose error messages describing what's wrong.  This display
> could still be improved, e.g. by displaying the whole source line with a
> pointer to the error, instead of just the text up to the location of the
> error.
>
> I plan to add some sort of equivalent of C "#line" directives, so that
> preprocessed source files (e.g. the concatenated source that is
> installed) can include references back to the original source files, for
> syntax error reporting, and/or debugging.  This will require
> modification of the INSTALL process, but I haven't started on this yet.
>
> It would probably be a good idea to have some utility functions to play
> with the srcref records for debugging and other purposes, but I haven't
> written those yet.  For example, the current source record on a function
> could be replaced with a srcref, but only by expanding the srcref to
> include some of the surrounding comments.
>
> Comments and problem reports are welcome.

I haven't tested this, but the idea seems useful. Will this have any
effect on code parsed using parse(text = "...")? Can it be extended to
have some such effect? I ask because this is relevant in the context
of Sweave, where I have always wanted the ability to retain the
original formatting. I'm currently testing a patch that allows me to
do this specifically for Sweave, but a more general solution is
obviously preferable.

-Deepayan
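
As an illustration of the scheme described in the quoted message, and of the parse(text = "...") question, here is a minimal sketch. It assumes a current R release rather than the R-devel of the time: the keep.source argument to parse() and the srcfile behaviour shown are taken from present-day R, and the variable names are purely illustrative.

```r
# Assumes a current R release; keep.source is passed explicitly so the
# result does not depend on the session's options("keep.source").
code <- "f <- function(x) {\n  x + 1  # add one\n}\nz  <-  c(1,   2)"
kept    <- parse(text = code, keep.source = TRUE)
dropped <- parse(text = code, keep.source = FALSE)

# One srcref per complete top-level statement, stored as an attribute.
refs  <- attr(kept, "srcref")
first <- refs[[1]]
class(first)                      # "srcref"

# Leading four integers: first line, first column, last line, last column
# (current R appends further fields after these four).
first_four <- as.integer(first)[1:4]

# The srcfile travels with each srcref and is an environment.
sf     <- attr(first, "srcfile")
is_env <- is.environment(sf) && inherits(sf, "srcfile")

# as.character() recovers the statement as written, with the embedded
# comment and the original spacing.
f_text <- as.character(first)
z_text <- as.character(refs[[2]])

# Without source references only a deparsed version is available.
z_deparsed <- deparse(dropped[[2]])

print(first_four)
print(f_text)
print(z_text)
print(z_deparsed)
```

In this sketch the text passed to parse() keeps its formatting only when source references are retained; the deparsed form normalises the spacing.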


