[Rd] Extreme slowdown with named vectors. A bug? (PR#9280)

murdoch at stats.uwo.ca
Sat Oct 7 03:10:54 CEST 2006


On 10/6/2006 6:20 PM, Henrik Bengtsson wrote:
> Tried the following with R --vanilla on the R v2.4.0 release (see
> details at the end).  I think the script and its comments speak for
> themselves, but the outcome is certainly not desirable.

I see a similar effect to what you're describing.  I also see it in 
2.3.1, so it's not a new bug.

I tracked it down to an overflow occurring in the stringSubscript 
function in src/main/subscript.c:  at the beginning there's a test

ns * nx > 1000

When ns and nx are both large, their product overflows the range of a C 
int and becomes negative, so the test fails.  I'll see if I can fix it.
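The boundary in the script can be checked with 64-bit arithmetic.  Below is a minimal C sketch.  It assumes that ns is the number of subscripts (k in the script), that nx is the length of x (n), and that int is 32 bits, as on both reported platforms.  The helper names are hypothetical, and the double-precision comparison is only one way to make the test overflow-proof, not necessarily the fix that goes into R.

```c
#include <limits.h>
#include <stdint.h>

/* Hypothetical helper: does ns * nx exceed the range of a signed int?
   The product is formed in 64 bits, so the check itself cannot overflow. */
int product_overflows_int(int ns, int nx)
{
    int64_t p = (int64_t) ns * (int64_t) nx;
    return p > INT_MAX;
}

/* Hypothetical helper: smallest nx for which ns * nx exceeds INT_MAX
   (ns must be positive).  Computed in 64 bits so ns == 1 is also safe. */
int64_t first_overflowing_nx(int ns)
{
    return (int64_t) INT_MAX / ns + 1;
}

/* Hypothetical overflow-proof form of the "ns * nx > 1000" test:
   multiplying in double avoids the wrap-around to a negative value. */
int safe_threshold(int ns, int nx)
{
    return (double) ns * (double) nx > 1000.0;
}
```

For k = 36423, first_overflowing_nx() gives INT_MAX / 36423 + 1 = 58960.  36423 * 58959 = 2,147,463,657 still fits below INT_MAX = 2,147,483,647, but 36423 * 58960 = 2,147,500,080 does not.  This matches the n >= 58960 boundary in the report.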

Duncan Murdoch


> 
> for (n in 58950:58970) {
>   cat("n=", n, "\n", sep="");
> 
>   # Clean up first
>   rm(names, x, y); gc();
> 
>   # Create a named vector of length n
>   # Try with format "%5d" and it works
>   names <- sprintf("%05d", 1:n);
>   x <- seq(along=names);
>   names(x) <- names;
> 
>   # Extract the first k elements
>   k <- 36422;
>   t0 <- system.time({
>     y <- x[names[1:k]];
>   })
>   str(y);
> 
>   # But with one more it takes
>   # for ever when n >= 58960
>   k <- k + 1;
>   t1 <- system.time({
>     y <- x[names[1:k]];
>   })
>   # ...then t1/t0 ~= 300-500 and growing!
>   print(t1/t0);
>   str(y);
> }
> 
> 
> The interesting thing is that if you replace
> 
>  y <- x[names[1:k]];
> 
> with
> 
>  idxs <- match(names[1:k], names(x));
>  y <- x[idxs];
> 
> everything is fine.
> 
> (For those working with the Affy 100K SNP chips, the freaky thing is
> that the problem occurs at n = 58960 which is exactly the number of
> SNPs on the Xba array; that's how I found out about the bug/feature in
> the first place).
> 
> Tried this on two different systems:
> 
>> sessionInfo()
> R version 2.4.0 (2006-10-03)
> i386-pc-mingw32
> locale:
> LC_COLLATE=English_United States.1252;LC_CTYPE=English_United States.1252;LC_MONETARY=English_United States.1252;LC_NUMERIC=C;LC_TIME=English_United States.1252
> attached base packages:
> [1] "methods"   "stats"     "graphics"  "grDevices" "utils"     "datasets"
> [7] "base"
> 
>> sessionInfo()
> R version 2.4.0 (2006-10-03)
> x86_64-unknown-linux-gnu
> locale:
> C
> attached base packages:
> [1] "methods"   "stats"     "graphics"  "grDevices" "utils"     "datasets"
> [7] "base"
> 
> Cheers
> 
> /Henrik
> 
> ______________________________________________
> R-devel at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel
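The following C sketch suggests why the match() workaround in the quoted report avoids the slowdown.  It is an illustration built on assumptions, not R's code.  It presumes that the `ns * nx > 1000` test in stringSubscript chooses between a hash-table lookup and a nested linear scan, and that match() takes a hashed route.  Under those assumptions, an overflowed, negative product sends a large lookup down the scan, whose cost grows with ns * nx, while match() is unaffected.  All function names are hypothetical, and the djb2-style hash is an arbitrary choice.

```c
#include <stdlib.h>
#include <string.h>

/* Naive strategy: for each of the ns subscripts, scan all nx names.
   Fills idx with 1-based positions (0 = no match), like match(). */
void lookup_linear(const char **s, int ns, const char **names, int nx, int *idx)
{
    for (int i = 0; i < ns; i++) {
        idx[i] = 0;
        for (int j = 0; j < nx; j++)
            if (strcmp(s[i], names[j]) == 0) { idx[i] = j + 1; break; }
    }
}

/* djb2-style string hash (arbitrary choice for this sketch). */
static size_t hash_str(const char *p)
{
    size_t h = 5381;
    while (*p)
        h = h * 33 + (unsigned char) *p++;
    return h;
}

/* Hashing strategy: build an open-addressing table over names once,
   then probe it once per subscript.  The first occurrence of a
   duplicated name wins.  Returns 0 on success, -1 if allocation fails. */
int lookup_hashed(const char **s, int ns, const char **names, int nx, int *idx)
{
    size_t size = 1;
    while (size < 2 * (size_t) nx + 1)   /* power of two, load <= 1/2 */
        size <<= 1;
    int *table = calloc(size, sizeof *table);   /* stores j + 1; 0 = empty */
    if (table == NULL)
        return -1;
    for (int j = 0; j < nx; j++) {
        size_t h = hash_str(names[j]) & (size - 1);
        while (table[h] != 0 && strcmp(names[table[h] - 1], names[j]) != 0)
            h = (h + 1) & (size - 1);           /* linear probing */
        if (table[h] == 0)
            table[h] = j + 1;                   /* keep first occurrence */
    }
    for (int i = 0; i < ns; i++) {
        size_t h = hash_str(s[i]) & (size - 1);
        idx[i] = 0;
        while (table[h] != 0) {
            if (strcmp(names[table[h] - 1], s[i]) == 0) {
                idx[i] = table[h];
                break;
            }
            h = (h + 1) & (size - 1);
        }
    }
    free(table);
    return 0;
}

/* Hypothetical dispatcher modelled on the threshold test, with the
   product taken in double so large inputs cannot wrap negative. */
int lookup(const char **s, int ns, const char **names, int nx, int *idx)
{
    if ((double) ns * nx > 1000.0)
        return lookup_hashed(s, ns, names, nx, idx);
    lookup_linear(s, ns, names, nx, idx);
    return 0;
}
```

Both strategies return the same indices.  Only the cost differs: the scan is proportional to ns * nx comparisons in the worst case, while the hashed version is roughly proportional to ns + nx.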



