[Rd] `*tmp*`

luke-tierney at uiowa.edu
Tue Aug 26 18:53:03 CEST 2014


On Thu, 14 Aug 2014, Michael Haupt wrote:

> Hi Luke,
>
> On 14.08.2014 at 12:08, luke-tierney at uiowa.edu wrote:
>> This is a consequence of the tricks the interpreter implementation
>> currently plays to do complex assignments. Compiled code works
>> differently:
>>
>>> library(compiler)
>>> cmpfun(function() {
>> +            x<-c(1,2)
>> +    x[1]<-42
>> +    `*tmp*`[1]<-7 # I would expect this one to fail
>> + })()
>> Error in cmpfun(function() { : object '*tmp*' not found
>
> aha, thank you very much! So the AST and bytecode interpreters behave differently. Which one is authoritative? Can I cherry-pick? (I'll pick the bytecode interpreter's version if I may.)
>
> Is there actually any code out there that *uses* `*tmp*` and would hence break if the bytecode interpreter were used? Is direct access to `*tmp*` discouraged?

There should not be. I believe the language manual has wording
suggesting that this is an implementation detail that should not be
relied upon.

>
> I'm asking all these questions because, in FastR, we're currently quite closely mirroring the AST interpreter's behaviour for complex assignments; if this is not an absolute must-have, I'd be very happy to apply a much leaner implementation instead.

The compiler tries to stay very close to the interpreter. The main
departures are (a) assuming certain bindings are not changeable, based
on optimization level, and (b) cleaning up semantics in cases where
the interpreter does things to either improve performance or simplify
implementation that are not really desirable. These departures should
be described in the noweb file in the compiler package.

(b) applies to the complex assignment mechanism. The use of *tmp* was
convenient but unclean. The byte code interpreter has a stack that it
can use for storing the intermediate values that *tmp* is used for in
the interpreter. In principle it would be possible to make the
interpreter match the compiler's behavior. I don't know how hard it
would be to do so without a performance hit (or maybe with a gain); I
just haven't looked.
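
For readers unfamiliar with the mechanism: the R Language Definition
(section on subset assignment) describes a call like x[1] <- 42 as if
it were expanded into explicit calls through a temporary variable
named `*tmp*`. The sketch below writes that documented expansion out
by hand and checks it against the ordinary assignment. It imitates the
manual's description only, not the interpreter's actual C code.

```r
# Sketch of the expansion the R Language Definition describes for x[1] <- 42.
# This imitates the documented semantics; the interpreter does it internally.
x <- c(1, 2)
y <- c(1, 2)

# Explicit form: copy into `*tmp*`, call the replacement function `[<-`,
# assign the result back, then remove the temporary.
`*tmp*` <- y
y <- `[<-`(`*tmp*`, 1, value = 42)
rm(`*tmp*`)

# Ordinary complex assignment on an identical vector.
x[1] <- 42

# Both routes should give the same result.
stopifnot(identical(x, y))
print(x)
```

Because `*tmp*` is an ordinary-looking binding in the evaluation
environment under this scheme, user code can (but should not) refer to
it, which is exactly the interpreter/compiler difference discussed above.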

The complex assignment mechanism is a very tricky bit of code, and
there are a lot of hidden pitfalls and a number of hidden assumptions
needed for it to work reliably, especially with respect to avoiding
duplication of LHS values while still duplicating when necessary.
Another known issue is that in nested complex assignments the index
expressions will get evaluated twice, which means expressions with
side effects (e.g. ones that generate a random number and update a
seed) don't do what users expect (they _do_ do what the language
manual says, since it implies the multiple evaluation). Addressing
this is hard given the combination of lazy evaluation and nonstandard
evaluation, but it would be nice to see if we can do better.
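
One way to observe how often an index expression in a nested complex
assignment is evaluated is to give it a visible side effect. The sketch
below uses a hypothetical helper idx() that counts its own calls. By
analogy with the manual's expansion of names(x)[3] <- "Three", the
index appears both in the getter call on `*tmp*` and in the setter
call `[[<-`, so a count above one is the double evaluation described
above. The exact count may differ between R versions and between
interpreted and byte-compiled code, so the sketch prints it rather
than asserting a particular value.

```r
# Hypothetical helper: returns index 1 and counts how often it is called.
calls <- 0
idx <- function() {
  calls <<- calls + 1
  1
}

x <- list(c(10, 20))

# Nested complex assignment. By analogy with the manual, this behaves like
#   `*tmp*` <- x
#   x <- `[[<-`(`*tmp*`, idx(), value = `[<-`(`*tmp*`[[idx()]], 2, value = 99))
# so idx() appears in both the getter and the setter call.
x[[idx()]][2] <- 99

# The assignment itself is correct whatever the call count.
stopifnot(identical(x, list(c(10, 99))))
cat("idx() was called", calls, "time(s)\n")
```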

I keep meaning to write up notes on my current understanding of the
assignment mechanism. I may get to it in a month or two; if I do, it
will appear in the R-dev-web tree.

Best,

luke

>
> Best,
>
> Michael
>
>

-- 
Luke Tierney
Chair, Statistics and Actuarial Science
Ralph E. Wareham Professor of Mathematical Sciences
University of Iowa                  Phone:             319-335-3386
Department of Statistics and        Fax:               319-335-3017
    Actuarial Science
241 Schaeffer Hall                  email:   luke-tierney at uiowa.edu
Iowa City, IA 52242                 WWW:  http://www.stat.uiowa.edu


