[Rd] Source references from the parser
murdoch at stats.uwo.ca
Sat Nov 25 19:51:20 CET 2006
I have just committed some changes to R-devel (which will become R 2.5.0
next spring) to add source references to parsed R code. Here's a
description of the scheme:
The design uses two old-style (S3) classes.
"srcfile" corresponds to a source file: it contains a filename, the
working directory in which that filename is to be interpreted, the last
modified timestamp of the file at the time the object is created, plus
some internal components. It is implemented as an environment so that
there can be multiple references to it.
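
For illustration only, here is a minimal sketch of a srcfile built by hand. It assumes the srcfile() constructor and field names found in current R releases, which may not match the R-devel state described in this message exactly. The temporary file is a stand-in invented for the example.

```r
# Create a small throwaway source file to describe.
tmp <- tempfile(fileext = ".R")
writeLines(c("# a comment", "x <- 1"), tmp)

# Build the srcfile object for that file.
sf <- srcfile(tmp)

class(sf)            # the old-style class attribute
is.environment(sf)   # an environment, so references to it are shared
sf$filename          # the filename as given
sf$wd                # working directory in which to interpret the filename
sf$timestamp         # last-modified time of the file at creation
```

Because the object is an environment, every srcref that points at it sees the same object rather than a private copy.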
"srcref" is a reference to a particular range of characters (as the
parser sees them; I think that really means bytes, but I haven't tested
with MBCSs) in a source file. It is implemented as a vector of 4
integers (first line, first column, last line, last column), with the
srcfile as an attribute.
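
The following sketch shows how such references can be inspected. It assumes a current R release, where parse() takes a keep.source argument and a srcref carries additional integers beyond the four described here, so only the leading four are shown. The sample code string is made up for the example.

```r
# Parse two statements from text, asking the parser to keep source references.
code <- "x <- 1  # first\ny <- x +\n  2"
exprs <- parse(text = code, keep.source = TRUE)

refs <- attr(exprs, "srcref")     # one srcref per complete statement
first <- refs[[1]]
second <- refs[[2]]

class(first)                      # the srcref class
as.integer(first)[1:4]            # leading integers locating the first statement
as.integer(second)[1:4]           # the second statement spans two lines
is.environment(attr(first, "srcfile"))  # the srcfile travels as an attribute
as.character(first)               # the statement text recovered via the srcref
```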
The parser attaches a srcref attribute to each complete statement as it
gets parsed, if option("useSource") is TRUE. (I've left the old source
attribute in place as well for functions; I think it won't be needed in
the long run, but it is needed now.)
When printing an object with a srcref attribute, print.default tries to
read the srcfile to obtain the text. If it fails, it falls back to an
ugly display of the reference. Using a new argument useSource=FALSE in
printing will stop this attempt: when printing language, it will
deparse; when printing a srcref, it will print the ugly fallback.
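
A small sketch of the two printing modes, assuming a current R release in which print() for functions accepts useSource and parse() accepts keep.source. The function f is a made-up example.

```r
# Define a function from text so that its source reference is retained.
src <- "function(x) {\n  x + 1  # add one\n}"
f <- eval(parse(text = src, keep.source = TRUE))

# Default printing uses the source reference: formatting and comment survive.
with_source <- capture.output(print(f))

# useSource = FALSE forces a deparse: the comment is lost.
deparsed <- capture.output(print(f, useSource = FALSE))

cat(with_source, sep = "\n")
cat(deparsed, sep = "\n")
```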
source(echo=TRUE) will echo all the lines of the file including comments
and formatting. demo() does the same, and I would guess Sweave will do
this too, but I haven't tested that yet. I think this will improve
Sweave output, but will need changes to the input file: people may have
comments there that they don't want shown. Some sort of
"useSource=FALSE" option will need to be added.
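
To see the echo behaviour, something like the following can be tried. It is a sketch that assumes a current R release, where source() takes a keep.source argument. The file contents are invented for the example.

```r
# Write a tiny script with a leading comment and unusual spacing.
tmp <- tempfile(fileext = ".R")
writeLines(c("# a leading comment",
             "y <- 2 *   3",
             "y"), tmp)

# Echo the script as it is evaluated, keeping source references.
out <- capture.output(source(tmp, echo = TRUE, keep.source = TRUE))
cat(out, sep = "\n")
```

The captured output should contain the comment line and the original spacing rather than a deparsed version of the statements.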
The browser used with debug() etc. will display statements as they were
formatted in the original source. It will not display leading or
following comments, but will display embedded comments.
Parsing errors display the name of the source file that was parsed, and
display verbose error messages describing what's wrong. This display
could still be improved, e.g. by displaying the whole source line with a
pointer to the error, instead of just the text up to the location of the
error.
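
A quick way to look at such a message is to catch the parse error, as in the sketch below. The exact wording and layout of the message vary between R versions, so no particular output is assumed. The faulty input string is made up.

```r
# Deliberately malformed input: a stray '*' after '+'.
bad <- "x <- 1 +* 2"

# Catch the parse error instead of stopping.
err <- tryCatch(parse(text = bad), error = function(e) e)

# The message names the source and describes the problem.
cat(conditionMessage(err), "\n")
```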
I plan to add some sort of equivalent of C "#line" directives, so that
preprocessed source files (e.g. the concatenated source that is
installed) can include references back to the original source files, for
syntax error reporting, and/or debugging. This will require
modification of the INSTALL process, but I haven't started on this yet.
It would probably be a good idea to have some utility functions to play
with the srcref records for debugging and other purposes, but I haven't
written those yet. For example, the current source record on a function
could be replaced with a srcref, but only by expanding the srcref to
include some of the surrounding comments.
Comments and problem reports are welcome.