[Rd] Small changes to big objects (2): Local Reference Classes

John Chambers jmc at r-project.org
Sat Jan 5 19:55:51 CET 2013


Back to the scenario in my email of Jan. 3:  We have objects with some 
large (or very large) components and some other components as well.  We 
need to modify the smaller stuff but are not changing the big data.  How 
can we avoid copying the big data?

(A use case might be some modeling of large data where we want to save 
various versions, all including the same original data but differing in 
some stored parameters, estimates, etc.)

A new kind of class, "local reference classes", has been added to r-devel 
(rev. 61562).  The idea is that using these classes to represent data 
can avoid unneeded copying while retaining the standard R 
functional semantics, or something close to them.  For a quick look, see 
?LocalReferenceClasses.

Here is the idea.

We imagine that our object has components/slots/attributes/fields 
"BigData", say, and "twiddle".  With normal R evaluation, replacing 
"twiddle" in the object will cause internal duplication of the whole 
thing in the very likely case that we have passed some object, myX say, 
as argument x to a function.

As soon as the evaluator sees a replacement function, "@<-", "$<-" or 
"attr<-" for an ordinary object, the EnsureLocal routine calls 
duplicate() if the object has more than one reference, as it will in 
this scenario. And BigData gets copied.  I think it's important to 
understand that this follows from the "replacement function" concept in 
S and R:  A replacement function takes an object from the frame, does 
whatever it does, and returns a replacement for this object.  The 
evaluator doesn't know what the replacement function does, so the 
EnsureLocal strategy is inevitable.
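
A minimal sketch of the semantics that this strategy protects, using an ordinary list as a stand-in for the object.  The names myX, BigData and twiddle follow the scenario above; the function tweak() is hypothetical.  The sketch only shows the visible behavior (the caller's object is unchanged); it does not show what gets copied internally.

```r
# An ordinary object with a "big" component and a small one.
myX <- list(BigData = rnorm(1000), twiddle = 0)

# Hypothetical function that replaces the small component of its argument.
tweak <- function(x) {
  x$twiddle <- x$twiddle + 1   # a call to the replacement function `$<-`
  x
}

newX <- tweak(myX)

# The replacement acted on a local version of x, so myX is untouched.
stopifnot(myX$twiddle == 0, newX$twiddle == 1)
stopifnot(identical(myX$BigData, newX$BigData))
```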

There is one trapdoor, however.  duplicate() does essentially nothing 
for data types that are references, most importantly for environments. 
That's the basis for reference classes.
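
As a rough illustration of that reference behavior, here is a sketch with a plain environment.  The names e and setTwiddle() are hypothetical, chosen only for this example.

```r
# An environment is a reference: passing it to a function does not copy it.
e <- new.env()
e$twiddle <- 0

# Hypothetical function that replaces a value inside its argument.
setTwiddle <- function(x, value) {
  x$twiddle <- value   # modifies the one shared environment
  invisible(x)
}

setTwiddle(e, 1)

# The change made inside the function is visible to the caller.
stopifnot(e$twiddle == 1)
```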

But a reference class is not exactly what we want here.  Our different 
models share the BigData but should not share the same other fields.  If 
I twiddle parameters in one model, it had better not change another model. 
So it's R's standard "functional" semantics we want.
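
To make the problem concrete, here is a sketch with an ordinary reference class.  The class name SharedModel and the function retune() are hypothetical; only the field names come from the scenario above.

```r
library(methods)

# An ordinary reference class: every field lives in one shared environment.
SharedModel <- setRefClass("SharedModel",
  fields = list(BigData = "numeric", twiddle = "numeric"))

m1 <- SharedModel$new(BigData = c(1, 2, 3), twiddle = 0)

# Hypothetical function that is meant to return a modified version of a model.
retune <- function(x) {
  x$twiddle <- 99
  x
}

m2 <- retune(m1)

# Not what we want here: the "original" model has changed as well.
stopifnot(m1$twiddle == 99, m2$twiddle == 99)
```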

In fact, R is not strictly a functional language.  Rather it has the 
idea of "local references":  ordinary assignments change the references 
in the local frame but have no external effect.

Local reference classes implement essentially this using reference class 
fields.  Specifically, calling a method $ensureLocal() on an object, 
directly or via replacing a field, causes a *shallow* copy of the object 
to be created and remembered locally.  Subsequent replacements have no 
effect on the object passed in to the function.
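
A small sketch along the lines of the example in ?LocalReferenceClasses.  The class name LocalModel and the function twiddler() are illustrative, and the data are of course not really big; the assumption is an R version whose methods package provides "localRefClass".

```r
library(methods)

# A local reference class: declare it by containing "localRefClass".
LocalModel <- setRefClass("LocalModel", contains = "localRefClass",
  fields = list(BigData = "numeric", twiddle = "numeric"))

tw <- c(0.1, 0.2, 0.3)
x1 <- LocalModel$new(BigData = as.numeric(1:1000), twiddle = tw)

twiddler <- function(x, n) {
  x$ensureLocal()   # explicit here; replacing a field would also trigger it
  for (i in seq_len(n)) {
    x$twiddle <- x$twiddle + 1
  }
  x
}

x2 <- twiddler(x1, 10)

# x1 keeps its own twiddle, while both objects hold the same BigData values.
stopifnot(identical(x1$twiddle, tw))
stopifnot(identical(x2$twiddle, tw + 10))
stopifnot(identical(x1$BigData, x2$BigData))
```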

The implementation is fairly simple, but the programmer does have to be 
aware of what's happening, to some extent.  Please look it over and play 
with it if it seems interesting.

John


