[Rd] cbind() & rbind() for S4 objects -- 'Matrix' package changes

Martin Maechler maechler at stat.math.ethz.ch
Fri Mar 23 18:31:35 CET 2007


Thank you, Luke, for your feedback,

   (more inline below)

>>>>> "Luke" == Luke Tierney <luke at stat.uiowa.edu>
>>>>>     on Tue, 20 Mar 2007 20:27:18 -0500 (CDT) writes:

    Luke> On Tue, 20 Mar 2007, Martin Maechler wrote:

    >> As some of you may have seen / heard in the past,
    >> it is not possible to make cbind() and rbind() into proper S4
    >> generic functions, since their first formal argument is '...'.
    >> [ BTW: S3-methods for these of course only dispatch on the first
    >> argument which is also not really satisfactory in the context
    >> of many possible matrix classes.]
    >> 
    >> For this reason, after quite some discussion on R-core (and
    >> maybe a bit on R-devel) about the options, since R-2.2.0 we have
    >> had S4 generic functions cbind2() and rbind2() (and default methods)
    >> in R's "methods" which are a version of cbind() and
    >> rbind() respectively for two arguments (x,y)
    >> {and fixed 'deparse.level = 0' : the argument names are 'x' and 'y' and
    >> hence don't make sense to be used to construct column-names or
    >> row-names for rbind(), respectively.}
    >> 
    >> We have been defining methods for cbind2() and rbind2()
    >> for the 'Matrix' classes in late summer 2005 as well.  So far so
    >> good.
    >> 
    >> In addition, [see also  help(cbind2) ],
    >> we have defined cbind() and rbind() functions which recursively call
    >> cbind2() and rbind2(), more or less following John Chambers
    >> proposal of dealing with such "(...)" argument functions.
    >> These new recursively defined cbind() / rbind() functions
    >> however have typically remained invisible in the methods package
    >> [you can see them via  methods:::cbind  or  methods:::rbind ]
    >> and have been ``activated'' --- replacing  base::cbind / rbind ---
    >> only via an explicit or implicit call to
    >> methods:::bind_activation(TRUE)
    >> 
    >> One reason I didn't dare to make them the default was that I
    >> noticed they didn't behave identically to cbind() / rbind() in
    >> all cases, though IIRC the rare difference was only in the dimnames
    >> returned; further, being entirely written in R, and recursive,
    >> they were slower than the mostly C-based fast  cbind() / rbind()
    >> functions.
    >> 
    >> As some Bioconductor developers have recently found,
    >> these versions of cbind() and rbind() that have been
    >> automagically activated by loading the  Matrix package
    >> can have a detrimental effect in some extreme cases,
    >> e.g. when using
    >> do.call(cbind, list_of_length_1000)
    >> because of the recursion and the many many calls to the S4
    >> generic, each time searching for method dispatch ...
    >> For the bioconductor applications and potentially for others using cbind() /
    >> rbind() extensively, this can lead to unacceptable performance
    >> loss just because loading 'Matrix' currently calls
    >> methods:::bind_activation(TRUE)

    Luke> The recursion part is potentially problematic because stack space
    Luke> limitations will cause this to fail for "relatively" short
    Luke> list_of_length_1000, but that should be easily curable by rewriting
    Luke> methods:::cbind and methods:::rbind to use iteration rather than
    Luke> recursion.

Yes, thank you, Luke,  something I should have at least tried
doing but didn't get to.

    Luke> This might also help a little with efficiency by avoiding
    Luke> call overhead.  It would be interesting to know how much of the
    Luke> performance hit is dispatch overhead and how much closure call
    Luke> overhead.  If it's dispatch overhead then it may be worth figuring out
    Luke> some way of handling this with internal dispatch at the C level (at
    Luke> the cost of maintaining the C level stuff).

    Luke> My initial reaction to scanning the methods:::cbind code is that it is
    Luke> doing too much, at least too much R-level work, but I haven't thought
    Luke> it through carefully.

I can well understand that reaction..
The code started up quite small and easily understandable ...
until I started trying to emulate the "standard"  cbind() /
rbind() behavior, notably about automatic colnames / rownames
creation. When delving into that the code got the current
partial messyness....

    >> For this reason, we plan to refrain from doing this activation
    >> on loading of Matrix, but propose to
    >> 
    >> 1)  define and export
    >> cBind <- methods:::cbind
    >> rBind <- methods:::cbind
    >> 
    >> also do this for R-2.5.0 so that other useRs / packages
    >> can start cBind() / rBind() in their code when they want to
    >> have something that can become properly object-oriented

    Luke> In mackage methods?

yes, inside methods, such that in fact there  cBind <- cbind
(plus namespace export) would be sufficient.

In the mean time I'm less sure if this is desirable; 
but at least this would ``expose'' the code and have it used
and consequently hopefully made more efficient (and hopefully
even simplified).

    >> Possibly --- and this is the big  RFC (request for comments) ---
    >> 
    >> 2) __ for 'Matrix' only __ also
    >> define and export
    >> cbind <- methods:::cbind
    >> rbind <- methods:::cbind
    >> 
    >> I currently see the possibilities of doing
    >> either '1)'
    >> or     '1) and 2)'
    >> or less likely  '2) alone'

    >> and like to get your feedback on this.

    >> "1)" alone would have the considerable drawback for current
    >> Matrix useRs that their code / scripts which has been using
    >> cbind() and rbind() for "Matrix" (and "matrix" and "numeric")
    >> objects no longer works, but needs to be changed to use
    >> rBind() and cBind()  *instead*
    >> 
    >> As soon as "2)" is done (in conjunction with "1)" or not),
    >> those who need a very fast but non-OO version of cbind() / rbind()
    >> need to call  base::cbind() or  base::rbind()  explicitly.
    >> This however would not be necessary for packages with a NAMESPACE
    >> since these import 'base' automagically and hence would use
    >> base::cbind() automagically {unless they also import(Matrix)}.
    >> 
    >> We are quite interested in your feedback!

    Luke> Either one seems cleaner to me than having loading of one package
    Luke> result in mucking about in the internals of another.

I agree {{the reason I had chosen the unclean approach was the
	  hope that methods:::cbind could be improved to soon replace
	  base::cbind -- and so "Matrix" would just do something
	  that would happen more universally in the future anyway}}

    Luke> If we are thinking of these as long term solutions then I think having
    Luke> different names is cleaner, so 1) but not 2).  If we are thinking of
    Luke> this as a transition towards making base::cbind and base::rbind
    Luke> support S4 dispatch via cbind2/rbind2 (assuming this can be done
    Luke> efficiently) then there may be some merit to 2) to minimize the need
    Luke> for code rewriting.

Yes, exactly.  
My originally intent was strongly in the direction of making
base::cbind support S4 via cbind2() and rbind2() -- and one will
continue to be able to experiment with that after calling
methods:::bind_activation(TRUE).
The problem I had underestimated is 
    Luke> assuming this can be done efficiently

    Luke> It might be worth experimenting with having .Internal(cbind(...))
    Luke> check its arguments and call methods:::cbind if (Methods is loaded
    Luke> and) any of the arguments are S4 -- as the S4 property is now cheap to
    Luke> determine that may be very low cost especially if done after the
    Luke> object bits have been checked with positive result.

Very nice idea ... currently I don't have the time to do it, so,
unless another R-core member (or an avid R-devel reader) jumps
in, I don't see this possible for R 2.5.0 (and hence the 2.5.x
series).

Too bad that nobody else (on R-devel) seems interested enough.
In the mean time, I'm proposing to implement '2)'
which will give

>>    > library(Matrix)
>>    Loading required package: lattice
>> 
>>    Attaching package: 'Matrix'
>> 
>> 
>> 	   The following object(s) are masked from package:base :
>> 
>> 	    cbind,
>> 	    rbind 

every time Matrix is loaded, but that seems appropriate to me.

The question remains if it wasn't worth to do '1)' as well
{not in Matrix, but the 'methods' package} such that people can
more easily get experience with such a  cbind2/rbind2 - based
version of cbind/rbind.

Martin


    Luke> Best,

    Luke> luke

    >> Martin Maechler and Doug Bates <Matrix-authors at R-project.org>

    Luke> -- 
    Luke> Luke Tierney
    Luke> Chair, Statistics and Actuarial Science
    Luke> Ralph E. Wareham Professor of Mathematical Sciences
    Luke> University of Iowa                  Phone:             319-335-3386
    Luke> Department of Statistics and        Fax:               319-335-3017
    Luke> Actuarial Science
    Luke> 241 Schaeffer Hall                  email:      luke at stat.uiowa.edu
    Luke> Iowa City, IA 52242                 WWW:  http://www.stat.uiowa.edu



More information about the R-devel mailing list