[Rd] cbind() & rbind() for S4 objects -- 'Matrix' package changes

Luke Tierney luke at stat.uiowa.edu
Wed Mar 21 02:27:18 CET 2007


On Tue, 20 Mar 2007, Martin Maechler wrote:

> As some of you may have seen / heard in the past,
> it is not possible to make cbind() and rbind() into proper S4
> generic functions, since their first formal argument is '...'.
> [ BTW: S3-methods for these of course only dispatch on the first
>  argument which is also not really satisfactory in the context
>  of many possible matrix classes.]
>
> For this reason, after quite some discussion on R-core (and
> maybe a bit on R-devel) about the options, since R-2.2.0 we have
> had S4 generic functions cbind2() and rbind2() (and default methods)
> in R's "methods" which are a version of cbind() and
> rbind() respectively for two arguments (x,y)
> {and fixed 'deparse.level = 0' : the argument names are 'x' and 'y' and
>  hence don't make sense to be used to construct column-names or
>  row-names for rbind(), respectively.}
>
> We have been defining methods for cbind2() and rbind2()
> for the 'Matrix' classes in late summer 2005 as well.  So far so
> good.
>
> In addition, [see also  help(cbind2) ],
> we have defined cbind() and rbind() functions which recursively call
> cbind2() and rbind2(), more or less following John Chambers
> proposal of dealing with such "(...)" argument functions.
> These new recursively defined cbind() / rbind() functions
> however have typically remained invisible in the methods package
> [you can see them via  methods:::cbind  or  methods:::rbind ]
> and have been ``activated'' --- replacing  base::cbind / rbind ---
> only via an explicit or implicit call to
>     methods:::bind_activation(TRUE)
>
> One reason I didn't dare to make them the default was that I
> noticed they didn't behave identically to cbind() / rbind() in
> all cases, though IIRC the rare difference was only in the dimnames
> returned; further, being entirely written in R, and recursive,
> they were slower than the mostly C-based fast  cbind() / rbind()
> functions.
>
> As some Bioconductor developers have recently found,
> these versions of cbind() and rbind() that have been
> automagically activated by loading the  Matrix package
> can have a detrimental effect in some extreme cases,
> e.g. when using
>     do.call(cbind, list_of_length_1000)
> because of the recursion and the many many calls to the S4
> generic, each time searching for method dispatch ...
> For the bioconductor applications and potentially for others using cbind() /
> rbind() extensively, this can lead to unacceptable performance
> loss just because loading 'Matrix' currently calls
>     methods:::bind_activation(TRUE)

The recursion part is potentially problematic because stack space
limitations will cause this to fail for "relatively" short
list_of_length_1000, but that should be easily curable by rewriting
methods:::cbind and methods:::rbind to use iteration rather than
recursion. This might also help a little with efficiency by avoiding
call overhead.  It would be interesting to know how much of the
performance hit is dispatch overhead and how much closure call
overhead.  If it's dispatch overhead then it may be worth figuring out
some way of handling this with internal dispatch at the C level (at
the cost of maintaining the C level stuff).

My initial reaction to scanning the methods:::cbind code is that it is
doing too much, at least too much R-level work, but I haven't thought
it through carefully.

> For this reason, we plan to refrain from doing this activation
> on loading of Matrix, but propose to
>
> 1)  define and export
> 	    cBind <- methods:::cbind
> 	    rBind <- methods:::cbind
>
>    also do this for R-2.5.0 so that other useRs / packages
>    can start cBind() / rBind() in their code when they want to
>    have something that can become properly object-oriented

In mackage methods?

> Possibly --- and this is the big  RFC (request for comments) ---
>
> 2) __ for 'Matrix' only __ also
>   define and export
> 	    cbind <- methods:::cbind
> 	    rbind <- methods:::cbind
>
> I currently see the possibilities of doing
> either '1)'
> or     '1) and 2)'
> or less likely  '2) alone'
>
> and like to get your feedback on this.
>
> "1)" alone would have the considerable drawback for current
>  Matrix useRs that their code / scripts which has been using
>  cbind() and rbind() for "Matrix" (and "matrix" and "numeric")
>  objects no longer works, but needs to be changed to use
> 	rBind() and cBind()  *instead*
>
> As soon as "2)" is done (in conjunction with "1)" or not),
> those who need a very fast but non-OO version of cbind() / rbind()
> need to call  base::cbind() or  base::rbind()  explicitly.
> This however would not be necessary for packages with a NAMESPACE
> since these import 'base' automagically and hence would use
> base::cbind() automagically {unless they also import(Matrix)}.
>
> We are quite interested in your feedback!

Either one seems cleaner to me than having loading of one package
result in mucking about in the internals of another.

If we are thinking of these as long term solutions then I think having
different names is cleaner, so 1) but not 2).  If we are thinking of
this as a transition towards making base::cbind and base::rbind
support S4 dispatch via cbind2/rbind2 (assuming this can be done
efficiently) then there may be some merit to 2) to minimize the need
for code rewriting.

It might be worth experimenting with having .Internal(cbind(...))
check its arguments and call methods:::cbind if (Methods is loaded
and) any of the arguments are S4 -- as the S4 property is now cheap to
determine that may be very low cost especially if done after the
object bits have been checked with positive result.

Best,

luke

> Martin Maechler and Doug Bates <Matrix-authors at R-project.org>
>
> ______________________________________________
> R-devel at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel
>

-- 
Luke Tierney
Chair, Statistics and Actuarial Science
Ralph E. Wareham Professor of Mathematical Sciences
University of Iowa                  Phone:             319-335-3386
Department of Statistics and        Fax:               319-335-3017
    Actuarial Science
241 Schaeffer Hall                  email:      luke at stat.uiowa.edu
Iowa City, IA 52242                 WWW:  http://www.stat.uiowa.edu



More information about the R-devel mailing list