[Rd] Why is srcref of length 6 and not 4 ?

Duncan Murdoch murdoch at stats.uwo.ca
Thu Feb 12 14:25:53 CET 2009


On 12/02/2009 7:01 AM, Romain Francois wrote:
> Hello,
> 
> Consider this file (/tmp/test.R) :
> 
> <file>
> f <- function( x, y = 2 ){
>    z <- x + y
>    print( z )
> }
> </file>
> 
> I get this in R 2.7.2 :
> 
>  > p <- parse( "/tmp/test.R" )
>  > str( attr( p, "srcref" ) )
> List of 1
> $ :Class 'srcref'  atomic [1:4] 1 1 4 1
>  .. ..- attr(*, "srcfile")=Class 'srcfile' length 4 <environment>
> 
> and this in R-devel :
> 
>  > p <- parse( "/tmp/test.R" )
>  > str( attr(p, "srcref") )
> List of 1
> $ :Class 'srcref'  atomic [1:6] 1 1 4 1 1 1
>  .. ..- attr(*, "srcfile")=Class 'srcfile' <environment: 0x946b944>
> 
> What are the two last numbers ?

The original design for srcref gave 4 entries: start line, start byte, 
stop line, stop byte. However, in multibyte strings, bytes don't 
correspond to columns, so error messages could often report the wrong 
location according to what a user sees in an editor.  To support the 
more useful error messages in R-devel, I added two more values: start 
column and stop column.  With pure ASCII text these will be the same as 
start byte and stop byte; with UTF-8 text and non-ASCII characters they 
will be different.  Other multibyte encodings are only supported if 
the platform can convert them to UTF-8 (and are not well tested; error 
reports would be welcome, if there's a way to improve the performance).
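
As a rough illustration of the layout described above, the following sketch 
parses a one-line expression and labels the first six entries.  The labels 
(start_line, start_byte, and so on) are my own descriptive names, not 
anything R defines, and the example assumes a session where keep.source can 
be passed to parse().  Later R versions may append further entries, so the 
sketch reads only the first six.  In a UTF-8 session I would expect stop_byte 
to exceed stop_col because of the accented character; in other locales the 
two may coincide.

```r
# Sketch only: the labels below are descriptive, not official R names.
src <- "x <- 'h\u00e9llo'"   # contains one non-ASCII character (e-acute)
p   <- parse(text = src, keep.source = TRUE)

# Drop the class and attributes to get the plain integer positions.
ref <- as.integer(attr(p, "srcref")[[1]])

pos <- c(start_line = ref[1], start_byte = ref[2],
         stop_line  = ref[3], stop_byte  = ref[4],
         start_col  = ref[5], stop_col   = ref[6])
print(pos)
```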

If you are using these for error reports, I recommend using the two new 
values.  If you are trying to retrieve the text from the source file, 
use the originals.
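
A minimal sketch of that split, under some assumptions: the source line is 
pure ASCII (so the result does not depend on the locale), the expression 
sits on a single line, and the variable names are only for illustration.  
The column entry goes into the message a user would read, while the byte 
entries index into the raw bytes of the line.

```r
# Two expressions on one line; we look at the second one.
src <- "a <- 1; total <- a + 2"
p   <- parse(text = src, keep.source = TRUE)
ref <- as.integer(attr(p, "srcref")[[2]])

# For a message shown to a user: line plus *column* (entries 1 and 5).
location <- sprintf("line %d, column %d", ref[1], ref[5])

# For pulling text back out of the source: *byte* offsets (entries 2 and 4)
# applied to the raw bytes of the line.
line_bytes <- charToRaw(src)
text <- rawToChar(line_bytes[ref[2]:ref[4]])

print(location)
print(text)
```

With non-ASCII UTF-8 input the byte offsets and columns would differ, which 
is exactly where mixing them up would give a wrong location or a wrong 
substring.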

Duncan Murdoch
