[R] Re: [S] scalability

Prof Brian Ripley ripley at stats.ox.ac.uk
Sat Mar 27 22:32:35 CET 2004


On Sat, 27 Mar 2004, Patrick Burns wrote:

> I think this is an interesting discussion -- I've learned from both
> Steve's and Brian's comments, and I'm broadening it to R-help
> since I think others will be interested as well.
> 
> The problem up for comment is:
> 
> result <- apply(array.3D, 1:2, sum)
> 
> Where array.3D is 3000 by 300 by 3.

...

> Prof Brian Ripley wrote:
> 
> BR> There are almost always pros and cons with these issues.  S's sum() is an 
> BR> S4 generic whereas R's is internal *unless* you define an S4 method for 
> BR> it (which S-PLUS has already done).  S needs to create several frames for 
> BR> what is a nested set of function calls -- 1280b looks modest for that.
> BR> 
> BR> Also, S has an ability to back out calculations that R does not, and that 
> BR> costs memory (and can have benefits).
> BR> 
> BR> We know there are overheads in making functions generic, especially 
> BR> S4-generic, but then there are benefits too.  I am not sure designers who 
> BR> add features take enough account of the costs.
> 
> Using R 1.8.1 (precompiled) on SuSE Linux with a Xeon 2.4GHz and 1GB of 
> memory:
> 
> set.seed(2)
> jja <- array(rnorm(3000*300*3), c(3000, 300, 3))
> gc()
> system.time(jjsa <- apply(jja, 1:2, sum)) # takes 30 seconds
> 
> sumS3 <- function(x, ...) UseMethod("sumS3")
> sumS3.default <- function(x, ...) sum(x, ...)
> gc()
> system.time(jjsa3 <- apply(jja, 1:2, sumS3)) # takes 65 seconds

sum is already S3-generic in R, at C level.  So a simple wrapper would be
a better test.  BTW, repeating this speeds things up quite a bit as the gc
limits get tuned.  I get (Athlon 2600)  23-23 secs basic, 23-25 secs for a
simple wrapper and 49 secs for sumS4.
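A minimal sketch of the simple-wrapper test, in case anyone wants to repeat it. It assumes a smaller 300 x 300 x 3 array (not the 3000 x 300 x 3 one above) so that it runs quickly, and the names sumWrap and time_apply are hypothetical. Each variant is timed a few times because, as noted, later runs can be faster once the gc limits have been tuned. Timings on a current R will of course not match the 2004 figures.

```r
## Sketch: sum() versus a plain R-level wrapper that adds one extra
## closure call but no method dispatch. Array reduced to 300 x 300 x 3.
set.seed(2)
jja <- array(rnorm(300 * 300 * 3), c(300, 300, 3))

## Hypothetical name: a wrapper that only forwards to sum().
sumWrap <- function(x, ...) sum(x, ...)

## Elapsed seconds for apply(jja, 1:2, FUN), repeated 'reps' times,
## since later runs may be faster once the gc thresholds have grown.
time_apply <- function(FUN, reps = 3) {
  vapply(seq_len(reps),
         function(i) system.time(apply(jja, 1:2, FUN))[["elapsed"]],
         numeric(1))
}

t_basic   <- time_apply(sum)
t_wrapper <- time_apply(sumWrap)
print(rbind(basic = t_basic, wrapper = t_wrapper))

## The results are the same either way; only the calling overhead differs.
res_basic   <- apply(jja, 1:2, sum)
res_wrapper <- apply(jja, 1:2, sumWrap)
stopifnot(identical(res_basic, res_wrapper))
```

The wrapper isolates the cost of one extra R function call, which is the baseline a dispatched version such as sumS4 should be compared against.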

> sumS4 <- function(x, ...) standardGeneric("sumS4")
> setMethod("sumS4", signature(x="numeric"), function(x, ...) sum(x, ...))
> gc()
> system.time(jjsa4 <- apply(jja, 1:2, sumS4)) # takes 58 seconds
> 
> Questions:
> 
> It looks to me like the penalty for making the functions generic is
> similar to one extra function call.  Does the penalty grow as there
> are more methods?  

Yes, probably quite a lot.  AFAIK there is no caching of selected methods 
going on, although it is hard to be sure.
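One way to check whether the dispatch penalty grows with the number of methods is to time a generic with a single method against one with several extra methods that are never selected. This is a sketch under stated assumptions: the generics sumOne and sumMany and their extra methods are hypothetical, the array is reduced to 300 x 300 x 3, and later versions of the methods package may cache selected methods, so a current R need not show the growth expected here for R 1.8.1.

```r
library(methods)

set.seed(2)
jja <- array(rnorm(300 * 300 * 3), c(300, 300, 3))

## Hypothetical generic with a single method, like sumS4 in the post.
setGeneric("sumOne", function(x, ...) standardGeneric("sumOne"))
setMethod("sumOne", "numeric", function(x, ...) sum(x, ...))

## Hypothetical generic with extra methods; only the "numeric" one is
## selected for the double vectors that apply() passes in.
setGeneric("sumMany", function(x, ...) standardGeneric("sumMany"))
setMethod("sumMany", "numeric",   function(x, ...) sum(x, ...))
setMethod("sumMany", "logical",   function(x, ...) sum(x, ...))
setMethod("sumMany", "complex",   function(x, ...) sum(x, ...))
setMethod("sumMany", "list",      function(x, ...) sum(unlist(x), ...))
setMethod("sumMany", "character", function(x, ...) stop("not numeric"))

## Time both; assignment inside system.time() keeps the results.
t_one  <- system.time(r1 <- apply(jja, 1:2, sumOne))[["elapsed"]]
t_many <- system.time(r2 <- apply(jja, 1:2, sumMany))[["elapsed"]]
print(c(one_method = t_one, many_methods = t_many))

## Dispatch changes the cost, not the answer.
stopifnot(identical(r1, r2))
```

Because only the "numeric" method is ever selected, any difference between the two timings reflects method selection rather than the work done inside the method.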

> Are there other types of penalties for making
> a function generic?

Memory usage.  If you set gcinfo(TRUE) you will see the cons cell usage 
growing during the run.
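A minimal sketch of watching that growth, assuming the sumS4 generic from the post and a reduced 300 x 300 x 3 array. gcinfo(TRUE) makes R report each garbage collection as it happens, and gc() then summarises cons cells (Ncells) and vector heap (Vcells) in use; the exact messages and numbers depend on the R version.

```r
library(methods)

set.seed(2)
jja <- array(rnorm(300 * 300 * 3), c(300, 300, 3))

## The S4 generic from the post, with its single "numeric" method.
setGeneric("sumS4", function(x, ...) standardGeneric("sumS4"))
setMethod("sumS4", "numeric", function(x, ...) sum(x, ...))

invisible(gc())              # start from a fresh collection
old <- gcinfo(TRUE)          # report every garbage collection during the run
res <- apply(jja, 1:2, sumS4)
gcinfo(old)                  # restore the previous setting
print(gc())                  # Ncells (cons cells) and Vcells now in use
```

Running the same lines with sum in place of sumS4 gives a baseline for how much of the reported cons cell usage comes from dispatch.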

> Is the test with sumS4 still an unfair comparison with S-PLUS?

Yes, somewhat.  You only have one method.

> Are things better with S-PLUS 6.2?

Apparently not.  Even calling the default method directly seems very slow.

-- 
Brian D. Ripley,                  ripley at stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595
