[R] S4 vs Reference Classes

Steve Lianoglou mailinglist.honeypot at gmail.com
Thu Sep 15 15:57:20 CEST 2011


Hi Joseph (and Martin),

Don't mean to beat a dead horse, but I wanted to add one last comment
to this thread in case someone (or you) stumbles upon this via
Google/Gmane and gives it a shot.

I neglected to mention a very important step that you have to take in
order to avoid shooting yourself in the foot.

Martin, off list, thankfully pointed out to me that you still need to
define an "initialize" method for your class so that each @cache slot
for every new object defined gets *its own* environment. If you don't,
they all share the same environment when you create new objects
through a call to `new("Element")`.

Here's what happens and how to fix it. It's intentionally a bit
verbose for pedagogical purposes, so please bear with me:

R> setClass("Element",
 representation=representation(x='numeric', cache='environment'),
 prototype=prototype(x=numeric(), cache=new.env()))

R> a <- new("Element")
R> b <- new("Element")

If you look at the cache slot in both `a` and `b`, you'll see that
they actually hold *the same* environment:

R> a@cache
<environment: 0x100a23788>

R> b@cache
<environment: 0x100a23788>

See -- those two environments share the same address. So, if you do:

R> a@cache$some.var <- 42
R> a@cache$some.var
[1] 42

R> b@cache$some.var
[1] 42

¡Yikes!
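
The addresses printed above will differ from session to session, so
`identical()` is a more portable way to check for the sharing. Here is a
minimal sketch, assuming a fresh session and the same class definition as
above. The variable names `shared` and `leaked` are only there for
illustration:

```r
library(methods)  # already attached interactively; harmless otherwise

setClass("Element",
  representation = representation(x = "numeric", cache = "environment"),
  prototype = prototype(x = numeric(), cache = new.env()))

a <- new("Element")
b <- new("Element")

# new.env() in the prototype was evaluated once, when setClass() ran,
# so every object built from the prototype carries that one environment.
shared <- identical(a@cache, b@cache)
print(shared)

# Writing through one object is therefore visible through the other.
assign("some.var", 42, envir = a@cache)
leaked <- get("some.var", envir = b@cache, inherits = FALSE)
print(leaked)
```

Environments are reference objects, so copying the prototype into each
new object copies the pointer, not the contents.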

If you explicitly set the cache slot to a `new.env()` you can avoid this:

R> a <- new("Element", cache=new.env())
R> b <- new("Element", cache=new.env())
R> a@cache
<environment: 0x10214d5b8>
R> b@cache
<environment: 0x100eff908>

You see the two environments are different, so setting a var into one
@cache won't affect the other:

R> a@cache$some.var <- 42
R> b@cache$some.var
NULL

So that's what you want, but who wants to keep typing new("Element",
cache=new.env())? Not me, and that's what initialize methods are for.
Here is what the ones in my libs look like:

setMethod("initialize", "Element",
  function(.Object, ..., x=numeric(), cache=new.env()) {
    callNextMethod(.Object, x=x, cache=cache, ...)
})

Now, with those loaded up:

R> aa <- new("Element")
R> bb <- new("Element")
R> aa@cache
<environment: 0x10312e3f8>

R> bb@cache
<environment: 0x103251ae0>

Problem solved.
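
To tie this back to the caching accessor from my earlier message (quoted
below), here is a rough end-to-end sketch, not a definitive
implementation. The `setGeneric` call is my guess at what was left out,
and the body of `calc.gradient.from.element` plus the `n.calls` counter
are hypothetical stand-ins so that the caching is observable:

```r
library(methods)

setClass("Element",
  representation = representation(x = "numeric", cache = "environment"),
  prototype = prototype(x = numeric(), cache = new.env()))

# Give every new object its own cache environment.
setMethod("initialize", "Element",
  function(.Object, ..., x = numeric(), cache = new.env()) {
    callNextMethod(.Object, x = x, cache = cache, ...)
  })

# Hypothetical stand-in for an expensive computation; counts its calls.
n.calls <- 0
calc.gradient.from.element <- function(el) {
  n.calls <<- n.calls + 1
  diff(el@x)
}

# Assumed generic; the earlier message left the setGenerics out.
setGeneric("gradient", function(x, ...) standardGeneric("gradient"))

setMethod("gradient", "Element", function(x, ...) {
  if (!exists("gradient", envir = x@cache, inherits = FALSE)) {
    # Assigning into the environment mutates it in place; x is not copied.
    assign("gradient", calc.gradient.from.element(x), envir = x@cache)
  }
  get("gradient", envir = x@cache, inherits = FALSE)
})

e1 <- new("Element", x = c(1, 4, 9))
e2 <- new("Element", x = c(2, 3))

g1 <- gradient(e1)        # computed and stored in e1's cache
g1.again <- gradient(e1)  # served from e1's cache, no recomputation
g2 <- gradient(e2)        # e2 has its own cache, so computed separately
```

I used `exists`/`assign`/`get` with explicit `inherits = FALSE` here
instead of `%in% ls(...)` and `$`; that is just a choice for this sketch,
and either style should behave the same for this purpose.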

Martin suggested a slightly different version of "initialize", like so:

setMethod(initialize, "Element", function(.Object, ...) {
   callNextMethod(.Object, ..., cache=new.env(parent=emptyenv()))
})

Where he mentions "... with parent=emptyenv() to avoid searching
outside the cache during symbol look-up".

I actually never used that, and don't think I ran into problems (I
always set `inherits=FALSE` if I'm `get`-ing something out of an
environment), but I'd go with his advice over mine any day.
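
For anyone wondering what the difference looks like in practice, here is
a small sketch, assuming nothing beyond base R. The names `leaky` and
`sealed` are made up for illustration:

```r
# A variable that lives outside any cache (wherever this code runs).
gradient <- "not a cached value"

leaky  <- new.env()                     # parent defaults to the calling frame
sealed <- new.env(parent = emptyenv())  # nothing to fall back on

# With the default inherits = TRUE, lookup walks up the parent chain.
found.in.leaky  <- exists("gradient", envir = leaky)
found.in.sealed <- exists("gradient", envir = sealed)

# inherits = FALSE restricts lookup to the environment itself.
found.strict <- exists("gradient", envir = leaky, inherits = FALSE)

# `$` on an environment does not consult the parent either.
dollar <- leaky$gradient

print(c(found.in.leaky, found.in.sealed, found.strict))
print(is.null(dollar))
```

So an empty-parent cache makes the safe behaviour the default, instead of
relying on every caller remembering `inherits = FALSE`.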

So ...

(i) thanks to Martin for pointing that out; and
(ii) thanks for bearing with me here,

I'll stop now :-)

-steve

On Wed, Sep 14, 2011 at 4:24 PM, Joseph Park <jpark.us at att.net> wrote:
> Thanks Steve.
>
> I'll take a closer look at this.
>
> all the best...
>
>
> On 9/14/2011 4:18 PM, Steve Lianoglou wrote:
>
> Hi,
>
> Just wanted to say that embedding a slot in your class that's an
> environment (as I showed earlier) will still solve your problem w/o you
> having to switch to Ref classes (since you've already done lots of
> work for your app in S4).
>
> Let's assume you have a slot `cache` that is an environment. Using
> your latest examples, let's say it's like this:
>
> setClass("Element",
>  representation=representation(x='numeric', cache='environment'),
>  prototype=prototype(x=numeric(), cache=new.env()))
>
> Let's say "gradient" is something you want to be accessed by reference;
> you can have something like this (setGenerics left out for lack of
> time):
>
> setMethod("gradient", "Element", function(x, ...) {
>   if (!'gradient' %in% ls(x@cache)) {
>     x@cache$gradient <- calc.gradient.from.element(x)
>   }
>   x@cache$gradient
> })
>
> Then a call to `gradient(my.obj)` will return the gradient if it
> is already calculated, or it will calc it on the fly and set it into your
> object (w/o copying your object) and return it when it's done.
>
> > which is my issue. Without the reference-based approach an object
> > in a slot which is then included in another object slot is a copy.
> > An update to the original object slot then requires 'extra' code
> > to update/synchronize the copy.
>
> Again, this "semi-s4-semi-ref-class" approach would get around this
> issue .. but life might get confusing for you (or your users) depending
> on what one expects as "normal" behavioR.
>
> Just wanted to try to clear up my original intention (if it wasn't
> clear before).
>
> -steve
>
>



-- 
Steve Lianoglou
Graduate Student: Computational Systems Biology
 | Memorial Sloan-Kettering Cancer Center
 | Weill Medical College of Cornell University
Contact Info: http://cbio.mskcc.org/~lianos/contact
