[R] case study of efficient R coding

Tue Oct 15 22:35:43 CEST 2002

This is an interesting case study. Method 3 and 4 only differ in how the
indexes are obtained. Method 3 uses seq, which uses the primitive function
c. Method 4 evaluates p^2 logical expression in R. When p is small, method
4 can beat method 3. But as p gets bigger, the time usage by method 3 only
slightly increases, while the time usage by method 4 increasely
dramatically.

test <- function(nsim, p) {
  z1<- system.time(for (i in 1:nsim) seq(1, by=(p+1), length=p))
  mat <- matrix(1, p, p)
  z2 <- system.time(for (i in 1:nsim) row(mat) == col(mat))
  rbind(seq=z1, rowcol=z2)
}

> test(1000, 200)
       [,1] [,2] [,3] [,4] [,5]
seq    0.21    0 0.21    0    0
rowcol 9.16    0 9.16    0    0
> test(1000, 100)
       [,1] [,2] [,3] [,4] [,5]
seq    0.17 0.01 0.19    0    0
rowcol 2.06 0.00 2.05    0    0
> test(1000, 10)
       [,1] [,2] [,3] [,4] [,5]
seq    0.16 0.00 0.16    0    0
rowcol 0.04 0.01 0.05    0    0

On Tue, 15 Oct 2002, Jason Liao wrote:

> The following are four ways I coded for the variance of multinomial
> distribution and their CPU time. It is a bit surprising that the 3rd
> method beats the 4th one substantially.
> 
> 
>  var.multinomial1 = function(n, pi)
> +   {
> +      p = length(pi);
> +      var = array(0, c(p,p));
> +      
> +      for(i in 1:p)
> +        for(j in 1:p)
> +        {
> +           if(i==j) var[i,j] = n*pi[i]*(1-pi[1])
> +   else var[i,j] = - n*pi[i]*pi[j];
> +        }
> +    }
> > 
> >    var.multinomial2 = function(n, pi)
> +    {
> +      n*( -pi%*%t(pi) + diag(pi) )
> +    }
> >    
> >            var.multinomial3 = function(n, pi)
> +            {
> +               var = -pi%*%t(pi);
> +               var[row(var)==col(var)] = pi*(1-pi); 
> +               n*var;
> +            }
> >    
> >    var.multinomial4 = function(n, pi)
> +            {
> +               var = -pi%*%t(pi);
> +               var[seq(1, by=(p+1), length=p)] = pi*(1-pi); 
> +               n*var;
> +            }
> >    
> >    n = 100;
> >    pi = exp( rnorm(10));
> >    pi = pi/sum(pi);
> >    n.simu = 1000;
> >    
> >    system.time(for(i in 1:n.simu) var = var.multinomial1(n, pi))
> [1]  4.50  0.04 99.73    NA    NA
> >     system.time(for(i in 1:n.simu) var = var.multinomial2(n, pi))
> [1]  0.33  0.00 10.74    NA    NA
> >      system.time(for(i in 1:n.simu) var = var.multinomial3(n, pi))
> [1] 0.14 0.00 4.17   NA   NA
> >       system.time(for(i in 1:n.simu) var = var.multinomial4(n, pi))
> [1] 0.21 0.00 6.47   NA   NA
> >    
> >      
> >    
> But first, does anyone know a vectorized way of coding (summing the
> outter product of rows)
> 
> p = 10;
> x = rnorm(p*p);
> dim(x) = c(p.p);
> 
> var = array(0, c(p,p));
> for(i in 1:p) var = var + x[i,]%*% t(x[i,]);
> 
> =====
> Jason G. Liao, Ph.D.
> Division of Biometrics
> University of Medicine and Dentistry of New Jersey
> 335 George Street, Suite 2200
> New Brunswick, NJ 08903-2688
> phone (732) 235-8611, fax (732) 235-9777
> http://www.geocities.com/jg_liao
> 
> __________________________________________________
> 
> Faith Hill - Exclusive Performances, Videos & More
> 
> -.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-
> r-help mailing list -- Read http://www.ci.tuwien.ac.at/~hornik/R/R-FAQ.html
> Send "info", "help", or "[un]subscribe"
> (in the "body", not the subject !)  To: r-help-request at stat.math.ethz.ch
> _._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._

-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-
r-help mailing list -- Read http://www.ci.tuwien.ac.at/~hornik/R/R-FAQ.html
Send "info", "help", or "[un]subscribe"
(in the "body", not the subject !)  To: r-help-request at stat.math.ethz.ch
_._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._