[R] Vectorizing and R speed, follow-up

Jason Liao jg_liao at yahoo.com
Thu Aug 8 18:55:09 CEST 2002


Thanks to Profs. Ripley, Lumley and Camann for replying to my
"complaint" about R speed. I have a few follow-up points:

1. They all pointed to the need to vectorizing the code for efficiency.
As Michael just said, however, this can require a substantial paradigms
shift and rethinking and may probe difficult to those of us who have
extensive experience in other languages and who may even consider
themselves good at programming. More importantly, some code can not be
vectorized. In my "real" program, I need to generate a large number of
truncated normal devaites.This has to done in a sequential manner since
the the mean and variance of the later ones depend on the values of the
previous ones.

2. I did another comparison between R and Java. This time is to compute
the covriance matrix of multinomial distributions. The R function is
simple and fully vectorized:
 
var.multinomial = function(n, pi)
{
   n * ( -pi%*%t(pi) + diag(pi*(1-pi)+pi*pi) )
}

For n=20, the time for one operation is 57/100000 seconds based on
  system.time(for (i in 1:100000) var = var.multinomial(20,
rep(1/20,20)) )

Hand coding this into Java and one opertaion takes 4/100000 seconds,
still 15 times faster.

3. I use Java for comparison because Java is also interpreted. Java
started also very slow but has improved tremendously with introduction
of just-in-time compiler and Hot spot. I am hoping that R will catch up
so that we statisticians will no longer need to struggle with C,
Fortran or Java.

Appendix:
My java code for variance of multinomial distribution:

 static double[][] var_multinomial(int n, double[] pi)
       {
           int p = pi.length;
           double[][] var = new double[p][p];
           for(int i=0; i<p; i++)
               for(int j=0; j<p; j++)
               {
                   if(i==j) var[i][j] = n*pi[i]*(1-pi[i]);
                   else var[i][j] = - n*pi[i]*pi[j];
               }
           return var;
       }












 static double[][] var_multinomial(int n, double[] pi)
       {
           int p = pi.length;
           double[][] var = new double[p][p];
           for(int i=0; i<p; i++)
               for(int j=0; j<p; j++)
               {
                   if(i==j) var[i][j] = n*pi[i]*(1-pi[i]);
                   else var[i][j] = - n*pi[i]*pi[j];
               }
           return var;
       }



=====
Jason G. Liao, Ph.D.
Division of Biometrics
University of Medicine and Dentistry of New Jersey
335 George Street, Suite 2200
New Brunswick, NJ 08903-2688
phone (732) 235-8611, fax (732) 235-9777
http://www.geocities.com/jg_liao

__________________________________________________

HotJobs - Search Thousands of New Jobs
http://www.hotjobs.com
-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-
r-help mailing list -- Read http://www.ci.tuwien.ac.at/~hornik/R/R-FAQ.html
Send "info", "help", or "[un]subscribe"
(in the "body", not the subject !)  To: r-help-request at stat.math.ethz.ch
_._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._



More information about the R-help mailing list