[R] in continuation with the earlier R puzzle

Mon Jul 12 20:40:11 CEST 2010

I wanted to point out one thing that Ted said, about initializing the
vectors ('s' in your example) at their full length before the loop.
This can make a dramatic speed difference if you are using a for loop
(the difference is negligible with vectorized computations).
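
As a minimal sketch of what initializing means here (the names n,
grow, and prealloc are mine, for illustration only, and n is kept
small so this runs quickly):

```r
n <- 1e4  # small illustrative size, not the 10^6 used in the benchmarks

# Growing: start from NULL, so the vector is extended every time the
# loop assigns one element past its current end
grow <- NULL
for (i in seq_len(n)) grow[i] <- sqrt(i)

# Preallocating: reserve the full length once, then fill it in place
prealloc <- numeric(n)
for (i in seq_len(n)) prealloc[i] <- sqrt(i)

# Both approaches should produce the same values
identical(grow, prealloc)
```

Both loops do the same work per element; the only difference is
whether the result vector already has room for element i.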

Also, a lot of benchmarks have been flying around, each from a
different system and using random numbers without identical seeds.  So,
to provide an overall comparison of all the methods I saw here, and to
demonstrate the speed difference from initializing a vector (if you
know its desired length in advance), I ran these benchmarks.

Notes:
- I did not want to interfere with your objects, so I used different
  names.  The equivalencies are: news1o = x; s2o = y; s = z (z.1
  through z.6 below, one per benchmark).
- system.time() automatically calculates the time difference from
  proc.time() between start and finish.
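
To illustrate that second note, here is a rough sketch of the two ways
of timing the same expression (manual, auto, tmp, and tmp2 are names I
made up for this example, and the expression being timed is arbitrary):

```r
# Manual timing: record proc.time() before, run the code, subtract after
start  <- proc.time()
tmp    <- sum(sqrt(seq_len(1e5)))
manual <- proc.time() - start

# system.time() does the same bookkeeping around a single expression
auto <- system.time(tmp2 <- sum(sqrt(seq_len(1e5))))

manual  # user, system, and elapsed seconds
auto
```

The two timings will not match exactly from run to run, but they
measure the same thing.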

> ##R version info
> sessionInfo()
R version 2.11.1 (2010-05-31)
x86_64-pc-mingw32
#snipped
>
> ##Some Sample Data
> set.seed(10)
> x <- rnorm(10^6)
> set.seed(15)
> y <- rnorm(10^6)
>
> ##Benchmark 1
> z.1 <- NULL
> system.time(for(i in 1:length(x)) {
+   if(x[i] > y[i]) {
+     z.1[i] <- 1
+   } else {
+     z.1[i] <- -1}
+ }
+             )
   user  system elapsed
1303.83  174.24 1483.74
>
> ##Benchmark 2
> #initialize 'z' at length
> z.2 <- vector("numeric", length = 10^6)
> system.time(for(i in 1:length(x)) {
+   if(x[i] > y[i]) {
+     z.2[i] <- 1
+   } else {
+     z.2[i] <- -1}
+ }
+             )
   user  system elapsed
   3.77    0.00    3.77
>
> ##Benchmark 3
>
> z.3 <- NULL
> system.time(z.3 <- ifelse(x > y, 1, -1))
   user  system elapsed
   0.38    0.00    0.38
>
> ##Benchmark 4
>
> z.4 <- vector("numeric", length = 10^6)
> system.time(z.4 <- ifelse(x > y, 1, -1))
   user  system elapsed
   0.31    0.00    0.31
>
> ##Benchmark 5
>
> system.time(z.5 <- 2*(x > y) - 1)
   user  system elapsed
   0.01    0.00    0.01
>
> ##Benchmark 6
>
> system.time(z.6 <- numeric(length(x))-1)
   user  system elapsed
      0       0       0
> system.time(z.6[x > y] <- 1)
   user  system elapsed
   0.03    0.00    0.03
>
> ##Show that all results are identical
>
> identical(z.1, z.2)
[1] TRUE
> identical(z.1, z.3)
[1] TRUE
> identical(z.1, z.4)
[1] TRUE
> identical(z.1, z.5)
[1] TRUE
> identical(z.1, z.6)
[1] TRUE

I have not replicated these on other systems, but tentatively, it
appears that loops are significantly slower than ifelse(), which in
turn is slower than options 5 and 6.  However, when using the same
test data and the same system, I did not find an appreciable
difference in speed between options 5 and 6.
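
If anyone wants to check this on their own system, something along
these lines should work.  It is only a sketch: compare_methods() and
its arguments are names I made up, n defaults to a much smaller size
than 10^6 so the loop finishes quickly, and x and y are drawn from a
single seed rather than the two seeds used above.

```r
compare_methods <- function(n = 1e5, seed = 10) {
  set.seed(seed)
  x <- rnorm(n)
  y <- rnorm(n)

  # Preallocated for loop (as in benchmark 2)
  z.loop <- numeric(n)
  t.loop <- system.time(
    for (i in seq_len(n)) z.loop[i] <- if (x[i] > y[i]) 1 else -1
  )

  # ifelse() (as in benchmarks 3 and 4)
  t.ifelse <- system.time(z.ifelse <- ifelse(x > y, 1, -1))

  # Logical arithmetic (as in benchmark 5)
  t.arith <- system.time(z.arith <- 2 * (x > y) - 1)

  # Fill with -1, then overwrite by logical index (as in benchmark 6)
  t.index <- system.time({
    z.index <- numeric(n) - 1
    z.index[x > y] <- 1
  })

  # Stop with an error if any method disagrees with the loop
  stopifnot(identical(z.loop, z.ifelse),
            identical(z.loop, z.arith),
            identical(z.loop, z.index))

  # One row per method; keep the user, system, and elapsed columns
  rbind(loop = t.loop, ifelse = t.ifelse,
        arith = t.arith, index = t.index)[, 1:3]
}

res <- compare_methods()
res
```

Timings near zero are too small to compare reliably, so increase n if
the vectorized rows all come back as 0.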

Cheers,

Josh

On Mon, Jul 12, 2010 at 7:09 AM, Raghu <r.raghuraman at gmail.com> wrote:
> When I just run a for loop it works. But if I am going to run a for loop
> every time for large vectors I might as well use C or any other language.
> The reason R is powerful is because it can handle large vectors without each
> element being manipulated? Please let me know where I am wrong.
>
> for(i in 1:length(news1o)){
> + if(news1o[i]>s2o[i])
> + s[i]<-1
> + else
> + s[i]<--1
> + }
>
> --
> 'Raghu'
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>

-- 
Joshua Wiley
Ph.D. Student, Health Psychology
University of California, Los Angeles
http://www.joshuawiley.com/