[R] performance gap between R 1.7.1 and 1.8.0

Thu Dec 4 19:20:06 CET 2003

> From: Martin Maechler [mailto:maechler at stat.math.ethz.ch] 
[...]
> A very first step of diagnosis might be to activate
>   trace(read.dcf)
>   trace(library)
>   options(verbose = TRUE)
> 
> A step further might be to patch read.dcf such that it prints
> info 
>        if(getOption("verbose")) { <<print what I am trying to read>> }
> 
> Martin

To refresh people's memory, the problem is that the identical code took more
than twice as long in R-1.8.x than in R-1.7.x.  I've managed to strip the
code down to the following and still see the same problem.  It basically
bootstraps mixture models to groups of data.  It uses boot() from the boot
package and flexmix() with intercept only, in the flexmix package.

=============================================
fitmix <- function(x, verbosefit=FALSE) {
  bpunlist <- function(x) {
    unlisted <- numeric()
    for(i in 1:length(x at components)) {
      unlisted <- c(unlisted, unlist(parameters(x, component=i)))
    }
    unlisted
  }
  unifit <-  flexmix(x ~ 1, k = 1, control=list(verbose=0))
  unlist(unlist(list(G = 1, proportion = unifit at prior,
                     param=matrix(bpunlist(unifit), nrow=2),
                     bic=BIC(unifit), loglik=unifit at logLik)))
}

bootmix <- function(data, i) {
  d <- data
  d.grps <- split.data.frame(d[i,], d[i, "id"])
  comps <- vector("list", length=length(d.grps))
  for (j in seq(along=comps)) {
    thefit <- try(fitmix(d.grps[[j]][,1], verbosefit=FALSE))
    comps[[j]] <- if (inherits(thefit, "try-error")) rep(NA, 48) else thefit
  }
  return(unlist(comps))
}

## The actual commands:
require(flexmix)
require(boot)
set.seed(76421)
x <- rnorm(3e3)
id <- gl(10, 300)
mydf <- data.frame(x, id)
Rprof(filename=paste("Rprof.out", system("hostname -s", int=TRUE), sep="."))
system.time(res <- boot(mydf, bootmix, strata=mydf[,"id"], R=5))
Rprof(NULL)
summaryRprof(filename=paste("Rprof.out", system("hostname -s", int=TRUE),
               sep="."))$by.self[1:20,]
=============================================

I ran this on a Xeon 2.4GHz running Mandrake 9.0, R compiled from source.

This took 2 seconds in R-1.7.1, with the output:

                  self.time self.pct total.time total.pct
"FUN"                  0.18        9       0.24        12
"names"                0.16        8       0.16         8
"initialize"           0.10        5       0.80        40
"apply"                0.08        4       0.30        15
"[.data.frame"         0.06        3       0.22        11
"inherits"             0.06        3       0.18         9
"is.null"              0.06        3       0.06         3
"match"                0.06        3       0.22        11
"seq.default"          0.06        3       0.06         3
"structure"            0.06        3       0.08         4
"any"                  0.04        2       0.14         7
"el"                   0.04        2       0.04         2
"FLXfit"               0.04        2       1.18        59
"lapply"               0.04        2       0.22        11
"lm.wfit"              0.04        2       0.16         8
"names<-.default"      0.04        2       0.04         2
"slot<-"               0.04        2       0.18         9
"<"                    0.02        1       0.02         1
"|"                    0.02        1       0.02         1
":"                    0.02        1       0.02         1

... and took over 7 seconds in R-1.8.1, with the output:

                self.time self.pct total.time total.pct
"paste"              0.52      6.7       1.10      14.1
"read.dcf"           0.38      4.9       0.96      12.3
"exists"             0.36      4.6       0.50       6.4
"lapply"             0.36      4.6       1.36      17.4
"names"              0.34      4.4       0.40       5.1
"names<-"            0.32      4.1       0.44       5.6
"inherits"           0.22      2.8       0.34       4.4
"=="                 0.20      2.6       0.20       2.6
".Call"              0.20      2.6       0.20       2.6
"seq"                0.18      2.3       0.50       6.4
"seq.default"        0.18      2.3       0.28       3.6
"dynGet"             0.16      2.1       0.46       5.9
"topenv"             0.16      2.1       0.18       2.3
"unique"             0.16      2.1       1.00      12.8
"apply"              0.14      1.8       0.20       2.6
"as.list"            0.14      1.8       0.18       2.3
".find.package"      0.14      1.8       1.88      24.1
"sapply"             0.14      1.8       1.54      19.7
"any"                0.12      1.5       1.68      21.5
"initialize"         0.12      1.5       4.98      63.8

I then did the following:

> trace(read.dcf, recover)
[1] "read.dcf"
> res <- boot(mydf, bootmix, strata=mydf[,"id"], R=1)
Tracing read.dcf(file.path(package.lib, package, "DESCRIPTION"), fields =
"Namespace") on entry 

Enter a frame number, or 0 to exit   
1:boot(mydf, bootmix, strata = mydf[, "id"], R = 1) 
2:statistic(data, original, ...) 
3:try(fitmix(d.grps[[j]][, 1], verbosefit = FALSE)) 
4:fitmix(d.grps[[j]][, 1], verbosefit = FALSE) 
5:flexmix(x ~ 1, k = 1, control = list(verbose = 0)) 
6:flexmix(x ~ 1, k = 1, control = list(verbose = 0)) 
7:flexmix(formula = formula, data = data, k = k, cluster = cluster, model =
list(F 
8:FLXglm() 
9:new("FLXmodel", weighted = TRUE, formula = formula, name = paste("FLXglm",
famil 
10:initialize(value, ...) 
11:initialize(value, ...) 
12:getClass(Class) 
13:.requirePackage(package) 
14:trySilent(loadNamespace(package)) 
15:eval.parent(call) 
16:eval(expr, p) 
17:eval(expr, envir, enclos) 
18:try(loadNamespace(package)) 
19:loadNamespace(package) 
20:packageHasNamespace(package, package.lib) 
21:read.dcf(file.path(package.lib, package, "DESCRIPTION"), fields =
"Namespace") 

I'm really out of ideas at this point.  Can anyone see the problem given the
above, or at least tell me what to try next?

Best,
Andy