[R] inheritance in S4

Thu Mar 27 17:26:48 CET 2008

cgenolin at u-paris10.fr wrote:
> Sorry to come back on callNextMethod, I am still not very confident 
> about it.
> 
> Consider the following (there is a lot of code, but it is very simple,
> consisting almost entirely of cat() calls):
> 
> ------------------
> setClass("A",representation(a="numeric"))
> setValidity("A",function(object){cat(" ***** Valid A *****\n");TRUE})
> setMethod("initialize","A",function(.Object){
>    cat("****** Init  A ******\n")
>    .Object <- callNextMethod()
>    return(.Object)
> })
> 
> setClass("B",representation(b="numeric"),contains="A")
> setValidity("B",function(object){cat("   *** Valid B ***\n");TRUE})
> setMethod("initialize","B",function(.Object){
>    cat("  **** Init  B ****\n")
>    .Object <- callNextMethod()
>    return(.Object)
> })
> new("B",a=3,b=2)
> ######## Result ########
> #   **** Init  B ****
> # ****** Init  A ******
> #  ***** Valid A *****
> #    *** Valid B ***
> # An object of class "B"
> # Slot "b":
> # [1] 2
> #
> # Slot "a":
> # [1] 3
> 
> --------------------
> 
> new("B") will go through
> - initialize B, which calls the next method, namely
> - initialize A, which calls the next method, namely
> - initialize ANY, which calls validObject A.
> This would be perfect... But there is also a call to validObject B.
> Where does it come from?

You're creating a "B" object, so validObject is called on B. This
probably makes you wonder why, then, validObject is being called on A.
The answer is that the validity method of 'B' is responsible only for
the aspects of the object that are unique to B. The validity method of
A has to be called so that the parts of B that are inherited from A can
be checked. A single call to validObject on a B object therefore runs
A's validity method first and then B's.
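
To make the order visible without relying on cat(), here is a small sketch. It assumes the default initialize method is used (no initialize methods are defined), and 'trace_env' is a hypothetical helper of mine that records which validity methods run; the length check in A's validity method is also invented for illustration.

```r
library(methods)

# 'trace_env' is a hypothetical helper that records which validity
# methods run, and in what order.
trace_env <- new.env()
assign("calls", character(), envir = trace_env)

setClass("A", representation(a = "numeric"))
setValidity("A", function(object) {
    assign("calls", c(get("calls", envir = trace_env), "A"), envir = trace_env)
    # Illustrative rule only: A checks the slot that A owns.
    if (length(object@a) > 1) "slot 'a' must have length 0 or 1" else TRUE
})

setClass("B", representation(b = "numeric"), contains = "A")
setValidity("B", function(object) {
    assign("calls", c(get("calls", envir = trace_env), "B"), envir = trace_env)
    TRUE  # B has nothing of its own to check in this sketch
})

# Slots are supplied, so the default initialize method calls validObject once.
obj <- new("B", a = 3, b = 2)
valid_order <- get("calls", envir = trace_env)
print(valid_order)

# A's rule is enforced on a B object even though B's validity method
# never looks at slot 'a'.
res <- tryCatch(new("B", a = c(1, 2), b = 2),
                error = function(e) conditionMessage(e))
print(res)
```

The point is that B's validity method can stay ignorant of slot 'a': the inherited check is run on its behalf.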

> 
> This is annoying because:
>> In an object-oriented sense, initialize,B-method should really just 
>> deal with its own slots; it shouldn't have to 'know' about either 
>> classes that it extends (A) or classes that extend it.
> 
> I completely agree with that. But suppose the author of A changes their code:
> 
> ---------------------
> setClass("A",representation(a="numeric"))
> setValidity("A",function(object){cat(" ***** Valid A *****\n");TRUE})
> setMethod("initialize","A",function(.Object){
>    cat("****** Init  A ******\n")
>    .Object@a <- 10
>    return(.Object)
> })
> 
> setClass("B",representation(b="numeric"),contains="A")
> setValidity("B",function(object){cat("   *** Valid B ***\n");TRUE})
> setMethod("initialize","B",function(.Object){
>    cat("  **** Init  B ****\n")
>    .Object <- callNextMethod()
>    return(.Object)
> })
> new("B",a=3,b=2)
> ######## Result ########
> #   **** Init  B ****
> # ****** Init  A ******
> # An object of class "B"
> # Slot "b":
> # numeric(0)
> #
> # Slot "a":
> # [1] 10
> 
> ---------------------
> 
> Then validObject of B is no longer called, and B is no longer correctly
> set...

Yes this would be a bad thing for the author of A to do (in my opinion).

> So if A is changed by its author, the behaviour of B changes as
> well...
> 
> Annoying, isn't it?

Yes, but in any object oriented system you're relying on inherited 
methods to fulfill a contract. If they change the contract (e.g., no 
longer guaranteeing that slots will be populated and validity checked), 
then downstream classes have to change.

> But I agree with the
>> "initialize,B-method should really just deal with its own slots;"
> So maybe something like:
> ---------------------
> setClass("A",representation(a="numeric"))
> setValidity("A",function(object){cat(" ***** Valid A *****\n");TRUE})
> setMethod("initialize","A",function(.Object){
>    cat("****** Init  A ******\n")
> #    .Object@a <- 10
>    .Object <- callNextMethod()
>    return(.Object)
> })

Here you're implementing part of the functionality of the default method 
(populating slots) so this code duplication is not very good practice 
(in my opinion).

> setClass("B",representation(b="numeric"),contains="A")
> setValidity("B",function(object){cat("   *** Valid B ***\n");TRUE})
> setMethod("initialize","B",function(.Object,a,b){
>    cat("  **** Init  B ****\n")
>    as(.Object,"A") <- new("A",a=a)
>    .Object@b <- b
>    return(.Object)
> })
> new("B",a=3,b=2)
> ######## Result ########
> #   **** Init  B ****
> # ****** Init  A ******
> # An object of class "B"
> # Slot "b":
> # [1] 2
> #
> # Slot "a":
> # [1] 10
> 
> ---------------------
> The call to validObject of B no longer depends on the code of A.

You're free to do what you like, of course. This replicates 
functionality of the default method (slot assignment) and does it in an 
inefficient way (making unnecessary copies of .Object; this matters when 
real-world objects are large). There is no validity checking, and to 
ensure that you'd have to add to your paradigm that all initialize 
methods call validObject. Because of the way validObject is implemented, 
you'll end up evaluating it multiple times for each construction of B. 
Lack of ... in the argument list means that derived classes must use 
your convention for object initialization, so you've replaced one 
(semi-established) convention with another. A close reading of 
initialize shows that the contract is more complicated than what we've 
talked about, with unnamed arguments used to initialize classes that the 
object extends and with a kind of copy-constructor functionality. You'll 
have to modify your paradigm further to accommodate these, or change the 
contract of your initialize method relative to those documented for S4. 
Again, you can adopt these conventions if you find them useful.
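
For comparison, here is a minimal sketch of the convention I have been describing, where the method keeps ... and leaves slot assignment and validity checking to the default method. The argument name 'bValue' is made up for illustration (it is deliberately not a slot name), and the missing() test is just one way to avoid overwriting slot b when the argument is not supplied.

```r
library(methods)

setClass("A", representation(a = "numeric"))
setClass("B", representation(b = "numeric"), contains = "A")

# Hypothetical constructor argument 'bValue': the method translates it into
# slot 'b' and hands everything else to the next method through '...'.
setMethod("initialize", "B", function(.Object, ..., bValue) {
    if (missing(bValue)) {
        callNextMethod(.Object, ...)
    } else {
        callNextMethod(.Object, ..., b = bValue)
    }
})

# Named arguments in '...' reach the default method and fill inherited slots.
x <- new("B", a = 3, bValue = 2)

# An unnamed argument from a class that B extends initializes the A part.
y <- new("B", new("A", a = 5), bValue = 1)

# Copy-constructor style: start from an existing object, change one slot.
z <- initialize(x, bValue = 7)

print(c(x@a, x@b, y@a, y@b, z@a, z@b))
```

Because the method never touches slot 'a' itself, it keeps working if A gains slots or an initialize method of its own, provided A's method honours the same contract.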

I know the above classes are just examples. But it's worth pointing out 
that the basic operation of initializing classes from named slots 
actually requires NO initialize method for the class -- this is all 
performed by the default method. As I've gained experience, I've 
actually found that my real classes tend NOT to have initialize methods, 
or to have initialize methods that are much simpler than they were at an 
earlier point in my understanding. It's letting the existing software do 
the work for you.
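
As a sketch of what I mean, assuming all you want is slots filled from named arguments plus a default value: the classes below have no initialize methods at all, and the default of 10 for slot 'a' (borrowed from your example) comes from a prototype instead.

```r
library(methods)

# No initialize methods anywhere; the default method does the work.
setClass("A", representation(a = "numeric"),
         prototype = prototype(a = 10))
setClass("B", representation(b = "numeric"), contains = "A")

x <- new("B", b = 2)          # 'a' comes from A's prototype
y <- new("B", a = 3, b = 2)   # named arguments override the prototype

print(c(x@a, x@b, y@a, y@b))
```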

Martin

> Christophe
> 
> 
>> cgenolin at u-paris10.fr wrote:
>>> Hi Martin
>>>
>>> I am rereading all the mail we exchanged with new eyes because of all 
>>> the things I have learned in the past few weeks. It is very interesting, 
>>> and some new questions have come up...
>>>
>>> ***********************************
>>> You once mentioned callGeneric:
>>>
>>> setClass("A", representation(x="numeric"))
>>> setClass("C", contains=c("A"))
>>>
>>> setMethod("show", "A", function(object) cat("A\n"))
>>> setMethod("show", "C", function(object) {
>>>   callGeneric(as(object, "A"))
>>>   cat("C\n")
>>> })
>>>
>>> new("C")
>>>
>>> Consider the following definition (which you more or less taught me 
>>> with your remarks yesterday...):
>>>
>>> setMethod("show", "C", function(object) {
>>>   callNextMethod()
>>>   cat("C\n")
>>> })
>>>
>>> In this case, is there any difference between the former and the 
>>> latter ?
>>> Which one would you use ?
>>
>> callNextMethod is the right thing to do for this case. callGeneric is 
>> useful in a very specific case -- when dispatching from within a 
>> so-called 'group generic' function. But this is an advanced topic.
>>
>>> (I gather that in a more complicated case, for example
>>> setClass("C", contains=c("A","B")), it might be more complicated to 
>>> use the latter, right?)
>>
>> The right thing to do in this case is to sit down with the rules of 
>> method dispatch, and figure out what the 'next' method will be. A 
>> common alternative paradigm is to have a plain function (not visible 
>> to the user, e.g., not exported in a package name space) that several 
>> different methods all invoke, after mapping their arguments 
>> appropriately. The methods provide a kind of structured interface to 
>> the function, making sure arguments are of the appropriate type, etc. 
>> The function does the work, confident that the arguments are appropriate.
>>
>>
>>> *************************
>>> This works :
>>>
>>> setMethod("initialize","B",
>>>          function(.Object,..., yValue){
>>>              callNextMethod(.Object, ..., y=yValue)
>>>              return(.Object)
>>>          })
>>> new("B",yValue=3)
>>>
>>> but this does not :
>>>
>>> setMethod("initialize","B",
>>>          function(.Object, yValue){
>>>              callNextMethod(.Object, y=yValue)
>>>              return(.Object)
>>>          })
>>> new("B",yValue=3)
>>>
>>> Why ?
>>> Is there any help page about ... ?
>>
>> Both 'work' in the sense that an object is returned (by the way, no 
>> need to use 'return' explicitly). And actually the examples on some of 
>> the man pages do not include '...', so this is really my opinion 
>> rather than the 'right' way to do things.
>>
>> In an object-oriented sense, initialize,B-method should really just 
>> deal with its own slots; it shouldn't have to 'know' about either 
>> classes that it extends (A) or classes that extend it. And it 
>> shouldn't do work that inherited methods (i.e., initialize,ANY-method) 
>> do. In the second form above, without the ..., there is no way for the 
>> initialize,A-method to see arguments that might be relevant to it 
>> (e.g., values to be used to initialize its slots). So 
>> initialize,B-method would have to do all the work of initializing A. 
>> This is not good design.
>>
>>
>>> **************************
>>> showMethods lists all the methods. Is there a way to see 
>>> all the methods for a specific signature IN THE ORDER in which they 
>>> will be called by callNextMethod?
>>> If ANY <- D <- E, something that would give:
>>>
>>> Function "initialize":
>>> .Object = "E"
>>> .Object = "D"
>>> .Object = "ANY"
>>
>> There is, but I have never been able to figure it out in detail or to 
>> feel confident that I was using functions that were meant to be used 
>> for this purpose by the user (as opposed to by the methods package).
>>
>> John Chambers posted recently to the R-devel mailing list about 
>> changes to the internal representation of methods and classes.
>>
>> https://stat.ethz.ch/pipermail/r-devel/2008-March/048729.html
>>
>> I have not explored the new functions Dr. Chambers mentions; to use 
>> them requires the 'devel' version of R, not 2.6.2. Any questions they 
>> generate should definitely be addressed to the R-devel mailing list.
>>
>> Best,
>>
>> Martin
>>
>>> Thanks for your help
>>> And happy easter eggs !
>>>
>>> Christophe
>>>
>>>
>>> ----------------------------------------------------------------
>>> This message was sent using IMP, courtesy of Universite Paris 10 Nanterre
>>>
>>>
>>>
>>
>>
> 
> 
> 
> 
> 
> 

-- 
Martin Morgan
Computational Biology / Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N.
PO Box 19024 Seattle, WA 98109

Location: Arnold Building M2 B169
Phone: (206) 667-2793