[R] Simple Lookup... why so slow

Fri Aug 6 16:45:03 CEST 2004

The first two solutions are vastly slower than the last three simply
because they use a for() loop. The vectorised versions are far faster.

# Solution 1 : list extraction operator ($) with for loop
aa <- rep(NA, n); bb <- rep(NA, n)

system.time( for (i in 1:n) {
  aa[i] <- PatDay$Day[i] - StartDay[PatDay$Treat[i], PatDay$Pat[i]] } )
[1] 0.33 0.00 0.33 0.00 0.00

# Solution 2 : numeric index with for loop
system.time( for (i in 1:n){ 
   bb[i] <-  PatDay[i,1]-StartDay[PatDay[i,3],PatDay[i,2]] } )
[1] 15.43  0.12 17.76  0.00  0.00

# Solution 3 : Vectorised operation with numeric index
system.time( cc <- PatDay[ , 1] - StartDay[ as.matrix(PatDay[, 3:2]) ] )
[1] 0.01 0.00 0.01 0.00 0.00

# Solution 4 : Vectorised operation with named index
system.time( dd <- PatDay[ , "Day"] -
             StartDay[ as.matrix(PatDay[ , c("Treat", "Pat")]) ] )
[1] 0.01 0.00 0.01 0.00 0.00

# Solution 5 : Vectorised operation with list extractor
system.time( ee <- PatDay$Day - StartDay[ cbind(PatDay$Treat, PatDay$Pat) ] )
[1] 0 0 0 0 0
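
Solutions 3 to 5 all rely on the same trick: indexing a matrix with a
two-column matrix, where each row is one (row, column) pair. A minimal
sketch on a made-up 2 x 3 matrix (m, idx, picked and block are
illustrative names, not part of the original problem):

```r
# Toy matrix, filled column-wise:
#      [,1] [,2] [,3]
# [1,]    1    3    5
# [2,]    2    4    6
m <- matrix(1:6, nrow = 2)

# Each row of idx is one (row, column) pair to look up.
idx <- cbind(c(1, 2, 1), c(3, 1, 2))

# Matrix indexing: one element per row of idx,
# i.e. m[1, 3], m[2, 1], m[1, 2].
picked <- m[idx]

# For contrast: two separate index vectors select a 3 x 3 block
# (every listed row crossed with every listed column).
block <- m[c(1, 2, 1), c(3, 1, 2)]
```

This is why StartDay[ cbind(PatDay$Treat, PatDay$Pat) ] returns one
start day per patient record rather than an n x n block.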

The timer resolution is too coarse to say which of the vectorised
operations is fastest, so I repeated the test with n=400,000. The
last three solutions gave the following timings:

Solution 3 : [1] 1.67 0.21 1.89 0.00 0.00
Solution 4 : [1] 2.55 0.21 2.77 0.00 0.00
Solution 5 : [1] 0.25 0.03 0.28 0.00 0.00

However, when I redefined PatDay as a matrix, the timings for n=400,000 were:

Solution 3 : [1] 0.48 0.04 0.51 0.00 0.00
Solution 4 : [1] 0.26 0.04 0.31 0.00 0.00
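
How PatDay was converted is not shown above. A minimal sketch, assuming
a single as.matrix() call up front (PatMat, dd_df and dd_mat are
illustrative names, and n is kept small here):

```r
# Same simulated data as in the original post, with a smaller n.
set.seed(42)   # arbitrary seed, for reproducibility only
n <- 1000
StartDay <- matrix(as.integer(runif(80) * 20), nrow = 4)
PatDay <- data.frame(Day   = as.integer(runif(n) * 20) + 50,
                     Pat   = as.integer(runif(n) * 20) + 1,
                     Treat = as.integer(runif(n) * 4) + 1)

# Assumed conversion: turn the data frame into a numeric matrix once.
PatMat <- as.matrix(PatDay)

# Solution 4 on the data frame: as.matrix() runs inside the expression.
dd_df  <- PatDay[ , "Day"] -
          StartDay[ as.matrix(PatDay[ , c("Treat", "Pat")]) ]

# Solution 4 on the matrix: the column subset is already a matrix.
dd_mat <- PatMat[ , "Day"] - StartDay[ PatMat[ , c("Treat", "Pat")] ]
```

Both forms give the same offsets; the matrix form just avoids going
through the data frame methods on every subset.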

Just to make sure all the answers are the same, try this:

cor( cbind(aa, bb, cc, dd) )
   aa bb cc dd
aa  1  1  1  1
bb  1  1  1  1
cc  1  1  1  1
dd  1  1  1  1

or the slow way : all.equal(aa, bb); all.equal(aa, cc); ...
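
Note that a correlation of 1 only shows the vectors are linearly
related, so all.equal() is the stricter check. A minimal sketch that
compares several results against the loop version in one go, on a
small simulated data set (the seed, n = 500, and the names results and
same are arbitrary choices for illustration):

```r
# Simulated data as in the original post, with a small n.
set.seed(1)    # arbitrary seed, for reproducibility only
n <- 500
StartDay <- matrix(as.integer(runif(80) * 20), nrow = 4)
PatDay <- data.frame(Day   = as.integer(runif(n) * 20) + 50,
                     Pat   = as.integer(runif(n) * 20) + 1,
                     Treat = as.integer(runif(n) * 4) + 1)

# Solution 1: for loop with the $ operator.
aa <- rep(NA, n)
for (i in 1:n) {
  aa[i] <- PatDay$Day[i] - StartDay[PatDay$Treat[i], PatDay$Pat[i]]
}

# Solutions 3 to 5: vectorised matrix indexing.
cc <- PatDay[ , 1] - StartDay[ as.matrix(PatDay[ , 3:2]) ]
dd <- PatDay[ , "Day"] -
      StartDay[ as.matrix(PatDay[ , c("Treat", "Pat")]) ]
ee <- PatDay$Day - StartDay[ cbind(PatDay$Treat, PatDay$Pat) ]

# Compare every vectorised result against the loop result.
results <- list(cc = cc, dd = dd, ee = ee)
same <- sapply(results, function(x) isTRUE(all.equal(aa, x)))
```

Here same is a named logical vector with one entry per solution, so
all(same) tells you whether every result matches the loop version.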

Regards, Adai

On Fri, 2004-08-06 at 13:42, Dieter Menne wrote:
> Dear List,
> 
> At 32 degrees Celsius in the office, I was too lazy to figure out
> the correct xapplytion for a simple lookup problem
> and regressed to well-known C-style, only to see my
> computer hang forever doing 10000 indexed offset calculations.
> Boiled down, the problem is shown below; it needs a few milliseconds
> in C. Looking at the timing results of n=2000 and n=4000,
> this is not linear in time, so something I don't understand
> must go on.
> 
> And, just as an aside: why is $-indexing so much faster (!)
> than numeric indexing?
> 
> Dieter
> 
> (all on Windows, latest R-Version)
> ----
> 
> # Generate Data set
> StartDay = matrix(as.integer(runif(80)*20),nrow=4)
> n=4000
> PatDay = data.frame(Day = as.integer(runif(n)*20)+50,
>                        Pat= as.integer(runif(n)*20)+1,
>                        Treat = as.integer(runif(n)*4)+1,
>                        DayOff=NA) # reserve output space
> # Correct for days offset
> ti= system.time(
>   for (i in 1:n)
>     PatDay$DayOff[i] = PatDay$Day[i]-StartDay[PatDay$Treat[i],PatDay$Pat[i]]
>   )
> cat("$Style index",n,ti[3],"\n");
> # n= 2000 3 seconds
> # n= 4000 15 seconds
> 
> # I first believed using numeric indexes could be faster...
> ti= system.time(
>   for (i in 1:n)
>     PatDay[i,4] = PatDay[i,1]-StartDay[PatDay[i,3],PatDay[i,2]]
>   )
> cat("Numeric index", n,ti[3],"\n");
> # n=2000 12 seconds
> # n=4000 53 seconds