[R] Google's R Style Guide (has become S3 vs S4, in part)

Martin Maechler maechler at stat.math.ethz.ch
Tue Sep 8 11:59:15 CEST 2009


>>>>> Martin Morgan <mtmorgan at fhcrc.org>
>>>>>     on Tue, 01 Sep 2009 09:07:05 -0700 writes:

    > spencerg wrote:
    >> Bryan Hanson wrote:
    >>> Looks like the discussion is no longer about R Style, but S3 vs S4?

    > yes nice topic rename!

    >>> 
    >>> To that end, I asked more or less the same question a few weeks ago,
    >>> arising from much the same motivations.  The discussion was helpful,
    >>> here's the link:
    >>> http://www.nabble.com/Need-Advice%3A-Considering-Converting-a-Package-from-S3-to-S4-tc24901482.html#a24904049
    >>> 
    >>> For what it's worth, I decided, but with some ambivalence, to stay
    >>> with S3
    >>> for now and possibly move to S4 later.  In the spirit of S4, I did
    >>> write a
    >>> function that is nearly the equivalent of validObject for my S3 object of
    >>> interest.
    >>> 
    >>> Overall, it looked like I would have to spend a lot of time moving to S4,
    >>> while staying with S3 would allow me to get the project done and get
    >>> results
    >>> going much faster (see Frank Harrell's comment in the thread above).

    > Bryan's original post started me thinking about this, but I didn't
    > respond. I'd classify myself as an 'S4' 'expert', with my ignorance of
    > S3 obvious from Duncan's corrections to my earlier post. It's hard for
    > me to make a comparative statement about S3 vs. S4, and hard really to
    > know what is 'hard' for someone new to S4, to R, to programming, ... I
    > would have classified most of the responses in that thread as coming
    > from 'S3' 'experts'.

    >>> As a concrete example (concrete for us non-programmers,
    >>> non-statisticians),
    >>> I recently decided that I wanted to add a descriptive piece of text to a
    >>> number of my plots, and it made sense to include the text with the
    >>> object.
    >>> So I just added a list element to the existing S3 object, e.g.
    >>> Myobject$descrip  No further work was necessary, I could use it right
    >>> away.
    >>> If instead I had made Myobject an S4 object, then I would have to go
    >>> back, redefine the object, update validObject, and possibly write some
    >>> new
    >>> accessor and definitely constructor functions.  At least, that's how I
    >>> understand the way one uses S4 classes.

    > This is a variant of Gabor's comment, I guess, that it's easy to modify
    > S3 on an as-needed basis. In S3, forgoing any pretext of 'best
    > practices', one might

    > s3 <- structure(list(x=1:10, y=10:1), class="MyS3Object")
    > ## some lines of code...
    > if (aTest)
    >     s3$descraption <- "A description"

    > (either 'description' or 'descraption' is a typo, uncaught by S3).

    > In S4 I'd have to change my class definition from

    > setClass("MyS4Object", representation(x="numeric", y="numeric"))

    > to

    > setClass("MyS4Object", representation(x="numeric", y="numeric",
    > description="character"))

    > but the body of the code would look surprisingly similar

    > s4 <- new("MyS4Object", x=1:10, y=10:1)
    > ## some lines of code...
    > if (aTest)
    >     s4@description <- "A description"

    > (no typo, because I'd have been told that the slot 'descraption' didn't
    > exist). In the S3 case the (implicit) class definition is a single line,
    > perhaps nested deep inside a function. In S4 the class definition is in
    > a single location.
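
A minimal sketch of the contrast described above, reusing the class names
from the quoted code. It is only an illustration, and the exact wording of
the S4 error message may differ between R versions.

```r
library(methods)

# S3: the class attribute is only a label, so a misspelled element name
# is accepted and silently creates a new list element.
s3 <- structure(list(x = 1:10, y = 10:1), class = "MyS3Object")
s3$descraption <- "A description"
s3_has_description <- !is.null(s3$description)   # the intended element is absent

# S4: slots are declared once, so assigning to an undeclared slot fails.
setClass("MyS4Object",
         representation(x = "numeric", y = "numeric",
                        description = "character"))
s4 <- new("MyS4Object", x = 1:10, y = 10:1)
s4_typo <- tryCatch({
    s4@descraption <- "A description"
    "accepted"
}, error = function(e) conditionMessage(e))

# The correctly spelled slot can be assigned as usual.
s4@description <- "A description"
```

After running this, s3_has_description is FALSE and s4_typo holds the error
message rather than the string "accepted".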

    > Best practices might make me want to have a validity method (x and y the
    > same dimensions? 'description' of length 1?), to use a constructor and
    > accessors (to provide an abstraction to separate the interface from its
    > implementation), etc., but those issues are about best practices.
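
The best practices mentioned above might look roughly like the following
sketch. The constructor MyS4Object() and the accessor description() are
hypothetical names chosen for illustration, and the two validity rules are
the ones suggested in the quoted paragraph.

```r
library(methods)

setClass("MyS4Object",
         representation(x = "numeric", y = "numeric",
                        description = "character"),
         validity = function(object) {
             msg <- character()
             if (length(object@x) != length(object@y))
                 msg <- c(msg, "'x' and 'y' must have the same length")
             if (length(object@description) != 1L)
                 msg <- c(msg, "'description' must have length 1")
             # a validity function returns TRUE or a character vector of problems
             if (length(msg)) msg else TRUE
         })

# Hypothetical constructor: the only place that calls new() directly.
MyS4Object <- function(x, y, description = "") {
    new("MyS4Object", x = x, y = y, description = description)
}

# Hypothetical accessor: callers never touch the slot with '@'.
setGeneric("description", function(object) standardGeneric("description"))
setMethod("description", "MyS4Object", function(object) object@description)

ok <- MyS4Object(1:10, 10:1, "A description")
bad <- tryCatch(MyS4Object(1:10, 1:3),
                error = function(e) conditionMessage(e))
```

Here ok is a valid object, while bad holds the validity error raised when
'x' and 'y' differ in length.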

    > A downstream consequence is that s4 always has a 'description' slot
    > (perhaps initialized with an appropriate default in the 'prototype'
    > argument of setClass, but that's more advanced), whereas s3 only
    > sometimes has 'description'. So I'm forced to check
    > is.null(s3$description) whenever I'm expecting a character vector.
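
As a sketch of the 'prototype' remark above: the default value NA_character_
is an assumption made for illustration, not something from the original
post.

```r
library(methods)

# S4: the prototype supplies a default, so the slot always exists
# and always holds a character vector.
setClass("MyS4Object",
         representation(x = "numeric", y = "numeric",
                        description = "character"),
         prototype = prototype(description = NA_character_))
s4 <- new("MyS4Object", x = 1:10, y = 10:1)
s4_default <- s4@description

# S3: the element may or may not be there, so every use needs a guard.
s3 <- structure(list(x = 1:10, y = 10:1), class = "MyS3Object")
s3_description <- if (is.null(s3$description)) NA_character_ else s3$description
```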

    >> It doesn't stop there:  If you keep the same name for your
    >> redefined S4 class, I don't know what happens when you try to access
    >> stored objects of that class created before the change, but it might not
    >> be pretty.  If you give your redefined S4 class a different name, then

    > Actually, the old object is loaded in R. It is not valid
    > (validObject(originalS4) would complain about 'slots in class definition
    > not in object'). One might write an 'updateObject' generic and method
    > that detects and corrects this. This contrasts with S3, where there is
    > no knowing whether the object is consistent with the current (implicit)
    > class definition.
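
The situation can be imitated within one session by redefining the class
after an object has been created. The sketch below assumes that this
behaves like loading a stored object. The 'updateObject' generic and method
shown are hypothetical and written only for this example.

```r
library(methods)

# Original definition, and an object created under it.
setClass("MyS4Object", representation(x = "numeric", y = "numeric"))
originalS4 <- new("MyS4Object", x = 1:10, y = 10:1)

# Later, the class gains a slot.
setClass("MyS4Object",
         representation(x = "numeric", y = "numeric",
                        description = "character"))

# The old object can still be used, but it is no longer valid.
old_check <- tryCatch({ validObject(originalS4); "valid" },
                      error = function(e) conditionMessage(e))

# Hypothetical generic and method that detect and repair the missing slot.
setGeneric("updateObject", function(object) standardGeneric("updateObject"))
setMethod("updateObject", "MyS4Object", function(object) {
    if (.hasSlot(object, "description"))
        return(object)
    new("MyS4Object", x = object@x, y = object@y, description = character())
})

updated <- updateObject(originalS4)
new_check <- tryCatch({ validObject(updated); "valid" },
                      error = function(e) conditionMessage(e))
```

old_check holds the complaint about the missing slot, and new_check is
"valid" once the object has been updated.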

    >> you have a lot more code to change before you can use the redefined
    >> class like you want.

    > For slot addition, this is not true -- old code works fine. For slot
    > removal / renaming, this is analogous to S3 -- code needs reworking; use
    > of accessors might help isolate code using the class from the
    > implementation of the class.

    > A couple of comments on Duncan's

    > S3Foo <- function(x=numeric(), y=numeric()) {
    >     structure(list(x=as.numeric(x), y=as.numeric(y)), class="S3Foo")
    > }

    > I used makeS3Foo to emphasize that it was a constructor, but in my own
    > code I use S3Foo(). Realizing that, as Henrik has now also pointed out,
    > I'm far from perfect, the use of as.numeric() combines validity checking
    > and coercion, which I think is not usually a good thing (even when
    > efficient). In particular this

    > as.numeric(factor(c("one", "two", "three")))

    > might unintentionally propagate earlier mistakes, e.g., after read.table
    > converts characters to factors behind the unexpecting user's back.
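
To make the factor example concrete: as.numeric() on a factor returns the
internal level codes, and levels are sorted alphabetically by default. The
stricter constructor below is only one possible alternative, sketched under
the assumption that rejecting non-numeric input is preferable to coercing
it.

```r
f <- factor(c("one", "two", "three"))
levels_f <- levels(f)        # "one" "three" "two" (alphabetical order)
codes <- as.numeric(f)       # 1 3 2: level codes, not the values 1, 2, 3

# A variant of S3Foo() that checks its input instead of coercing it.
S3Foo <- function(x = numeric(), y = numeric()) {
    stopifnot(is.numeric(x), is.numeric(y))
    structure(list(x = x, y = y), class = "S3Foo")
}

# A factor is rejected rather than silently turned into level codes.
rejected <- tryCatch({ S3Foo(f, 1:3); FALSE },
                     error = function(e) TRUE)
```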

    > Martin

Very, very well put, Martin!

As another S4 lover and expert (who still uses S3 for older or very simple
projects), I do wholeheartedly agree with Martin's statements,
notably his points about the partially implied consistency of S4
classes, and the point that adding a slot to an S4 class is very
comparable in work to adding an informal element to an (always only
informal) S3 class object.

Martin Maechler, ETH Zurich.



    >> By contrast, with S3, if you have any code that tests the number of
    >> components in a list, that will have to be changed.
    >> 
    >> Spencer

    >>> Back to trying to get something done!  Bryan
    >>> *************
    >>> Bryan Hanson
    >>> Professor of Chemistry & Biochemistry
    >>> DePauw University, Greencastle IN USA
    >>> 
    >>> 
    >>> 
    >>> 
    >>> 
    >>> On 9/1/09 6:16 AM, "Duncan Murdoch" <murdoch at stats.uwo.ca> wrote:
    >>> 
    >>> 
    >>>> Corrado wrote:
    >>>> 
    >>>>> Thanks Duncan, Spencer,
    >>>>> 
    >>>>> To clarify, the situation is:
    >>>>> 
    >>>>> 1) I have no reason to choose S3 over S4 or vice versa, or any other
    >>>>> coding convention
    >>>>> 2) Our group has not done any OO developing in R and I would be the
    >>>>> first, so
    >>>>> I can set up the standards
    >>>>> 3) I am starting from scratch with a new package, so I do not have
    >>>>> any code I
    >>>>> need to re-use.
    >>>>> 4) I am an R OO newbie, so I can learn from the beginning whatever
    >>>>> is better and good for me.
    >>>>> 
    >>>>> So the questions would be two:
    >>>>> 
    >>>>> 1) What coding style guide should we / I follow? Is the google style
    >>>>> guide
    >>>>> good, or is there something better / more prescriptive which makes our
    >>>>> research group life easier?
    >>>>> 
    >>>> I don't think I can answer that.  I'd recommend planning to spend some
    >>>> serious time on the decision, and then go by your personal impression.
    >>>> S4 is definitely harder to learn but richer, so don't make the decision
    >>>> too quickly.  Take a look at John Chambers's new book, try small projects
    >>>> in each style, etc.
    >>>> 
    >>>> 
    >>>>> 2) What class type should I use? From what you two say, I should use S3
    >>>>> because it is easier to use .... what are the disadvantages? Is there an
    >>>>> advantages / disadvantages table for S3 and S4 classes?
    >>>>> 
    >>>> S3 is much more limited than S4.  It dispatches on just one argument, S4
    >>>> can dispatch on several.  S3 allows you to declare things to be of a
    >>>> certain class with no checks that anything will actually work; S4 makes
    >>>> it easier to be sure that if you say something is of a certain class, it
    >>>> really is.  S4 hides more under the hood: if you understand how regular
    >>>> R functions work, learning S3 is easy, but there's still a lot to learn
    >>>> before you'll be able to use S4 properly.
    >>>> 
    >>>> Duncan Murdoch
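
The dispatch difference can be illustrated with a small sketch. The generics
describe() and combine2() are hypothetical names invented for this example.

```r
library(methods)

# S3: UseMethod() dispatches on the class of the first argument only.
describe <- function(x, ...) UseMethod("describe")
describe.default <- function(x, ...) "something else"
describe.S3Foo <- function(x, ...) "an S3Foo"

# Nothing checks that this object really has the structure of an S3Foo.
obj <- structure(list(), class = "S3Foo")
s3_result <- describe(obj)

# S4: a method can be selected by the classes of several arguments.
setGeneric("combine2", function(a, b) standardGeneric("combine2"))
setMethod("combine2", signature("numeric", "character"),
          function(a, b) "numeric, then character")
setMethod("combine2", signature("character", "numeric"),
          function(a, b) "character, then numeric")

s4_result_1 <- combine2(1, "a")
s4_result_2 <- combine2("a", 1)
```

The two calls to combine2() select different methods even though the same
two values are passed, because the order of their classes differs.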



