[Rd] Extreme slowdown with named vectors. A bug?

Henrik Bengtsson hb at stat.berkeley.edu
Sat Oct 7 00:20:15 CEST 2006


I tried the following with R --vanilla on the R v2.4.0 release (see
details at the end).  I think the script and its comments speak for
themselves, but the outcome is certainly not wanted.

for (n in 58950:58970) {
  cat("n=", n, "\n", sep="");

  # Clean up first
  rm(names, x, y); gc();

  # Create a named vector of length n
  # Try with format "%5d" and it works
  names <- sprintf("%05d", 1:n);
  x <- seq(along=names);
  names(x) <- names;

  # Extract the first k elements
  k <- 36422;
  t0 <- system.time({
    y <- x[names[1:k]];
  })
  str(y);

  # But with one more element it takes
  # forever when n >= 58960
  k <- k + 1;
  t1 <- system.time({
    y <- x[names[1:k]];
  })
  # ...then t1/t0 ~= 300-500 and growing!
  print(t1/t0);
  str(y);
}


The interesting thing is that if you replace

 y <- x[names[1:k]];

with

 idxs <- match(names[1:k], names(x));
 y <- x[idxs];

everything is fine.
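
As a quick sanity check that the two forms are interchangeable, here is a
minimal sketch.  It assumes a much smaller n and k than in the script above,
so that it finishes quickly regardless of the slowdown, and the names nms,
y1 and y2 are only illustrative.  It shows only that both forms return the
same named vector when the names are unique; it does not reproduce the
timing problem.

```r
# Small sizes chosen for illustration only (an assumption, not the
# sizes that trigger the slowdown reported above).
n <- 1000
k <- 500

# Build a named integer vector, as in the script above
nms <- sprintf("%05d", 1:n)
x <- seq(along = nms)
names(x) <- nms

# Form 1: subset directly by a character vector of names
y1 <- x[nms[1:k]]

# Form 2: look up the positions with match(), then subset by index
idxs <- match(nms[1:k], names(x))
y2 <- x[idxs]

# Both forms should give the same values and the same names
stopifnot(identical(y1, y2))
```

With unique names, both forms pick the same elements, so swapping one for
the other should not change the result, only the running time.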

(For those working with the Affy 100K SNP chips, the freaky thing is
that the problem occurs at n = 58960 which is exactly the number of
SNPs on the Xba array; that's how I found out about the bug/feature in
the first place).

Tried this on two different systems:

> sessionInfo()
R version 2.4.0 (2006-10-03)
i386-pc-mingw32
locale:
LC_COLLATE=English_United States.1252;LC_CTYPE=English_United
States.1252;LC_MONETARY=English_United
States.1252;LC_NUMERIC=C;LC_TIME=English_United States.1252
attached base packages:
[1] "methods"   "stats"     "graphics"  "grDevices" "utils"     "datasets"
[7] "base"

> sessionInfo()
R version 2.4.0 (2006-10-03)
x86_64-unknown-linux-gnu
locale:
C
attached base packages:
[1] "methods"   "stats"     "graphics"  "grDevices" "utils"     "datasets"
[7] "base"

Cheers

/Henrik



