[R] fastest R platform: follow-up and summary

Huntsinger, Reid reid_huntsinger at merck.com
Tue Apr 17 16:11:13 CEST 2001


The following runs in an eyeblink on my 700 MHz Thinkpad T-20 (256 MB RAM)
running Windows NT:

var(matrix(rnorm(4000000),ncol=4,nrow=1000000))

This also has the virtue of being quite readable. Allowing an arbitrary
covariance matrix and mean vector increases the run time slightly, but it
still takes only about 5 seconds.
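
A minimal sketch of that generalization, under stated assumptions: the mean
vector `mu` and covariance matrix `Sigma` below are hypothetical values chosen
for illustration (neither appears in this thread), and N is reduced to keep
the demo quick. The idea is to draw standard normals, multiply by the Cholesky
factor, shift by the mean, and call var() once.

```r
# Hypothetical target parameters, chosen only for illustration.
mu    <- c(1, 2, 3, 4)
Sigma <- matrix(0.5, 4, 4)
diag(Sigma) <- 2

set.seed(1)                       # reproducible draws
N <- 100000                       # smaller than one million to keep the demo quick
R <- chol(Sigma)                  # upper-triangular factor, t(R) %*% R == Sigma

Z <- matrix(rnorm(N * 4), nrow = N, ncol = 4)   # N x 4 standard normals
X <- sweep(Z %*% R, 2, mu, "+")                 # each row is one N(mu, Sigma) draw

est.mean <- colMeans(X)           # should be close to mu
est.var  <- var(X)                # should be close to Sigma
print(est.var)
```

Multiplying on the right by the upper-triangular factor gives rows with
covariance t(R) %*% R, which is why no explicit loop over draws is needed.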

Regarding performance, having plenty of RAM is crucial. Windows NT and the
few Office applications I always have running claim at least 100 MB of RAM
according to the Performance Monitor; that doesn't leave much for the large
objects you tend to create when you vectorize, unless you have at least
256 MB.
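
To put rough numbers on this: one million 4-dimensional draws stored as
doubles take 32,000,000 bytes (about 30 MB) before any intermediate copies.
One way to bound memory, sketched here as an assumption rather than something
proposed in the thread, is to accumulate the cross-product over chunks; the
`chunk.size` value is hypothetical.

```r
# One million draws in four dimensions, stored as doubles (8 bytes each).
N <- 1000000
bytes.full <- N * 4 * 8
print(bytes.full / 2^20)          # size in MB of the full N x 4 matrix

# Accumulate the cross-product over chunks so only chunk.size rows exist at once.
chunk.size <- 50000               # hypothetical value; tune to available RAM
acc <- matrix(0, 4, 4)
set.seed(1)
for (start in seq(1, N, by = chunk.size)) {
  n <- min(chunk.size, N - start + 1)           # rows in this chunk
  Z <- matrix(rnorm(n * 4), nrow = n, ncol = 4)
  acc <- acc + crossprod(Z)       # t(Z) %*% Z, the sum of the n outer products
}
est <- acc / N                    # second-moment estimate, as in the looped programs
print(est)
```

This keeps the loop, but it runs 20 times instead of a million, so most of the
work still happens inside vectorized calls.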

Reid Huntsinger

-----Original Message-----
From: Jason Liao [mailto:jg_liao at yahoo.com]
Sent: Monday, April 16, 2001 11:14 PM
To: r-help at stat.math.ethz.ch; jg_liao at yahoo.com
Cc: Jason Liao
Subject: [R] fastest R platform: follow-up and summary


First, I would like to thank everyone who responded to my post, and I
apologize for replying late.

1. How slow can R be?

I compared a simple simulation experiment in both R and OX: draw one
million samples from a 4-dimensional normal distribution and estimate the
variance matrix of that distribution. Both the R and OX programs are at
the end of this post. For this problem, OX is 168 times as fast as R.
Hard to believe!

OX: took 8"

Ox version 2.20 (Windows) (C) J.A. Doornik, 1994-2000
This version may be used for academic research and teac
22:31:07
      0.99700  -0.00069643  -0.00077984   0.00091080
  -0.00069643       1.0020   0.00048600   0.00079022
  -0.00077984   0.00048600      0.99935  -0.00064050
   0.00091080   0.00079022  -0.00064050       1.0003
22:31:15

R-result: took 22'24".

[1] "Mon Apr 16 22:00:25 2001"
              [,1]          [,2]          [,3]          [,4]
[1,]  9.971903e-01  0.0016107735 -3.486704e-05 -0.0007814097
[2,]  1.610774e-03  1.0024532860  3.845797e-04 -0.0012628987
[3,] -3.486704e-05  0.0003845797  1.000458e+00 -0.0014200821
[4,] -7.814097e-04 -0.0012628987 -1.420082e-03  1.0014095508
[1] "Mon Apr 16 22:22:49 2001"
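
The factor of 168 can be checked from the timestamps printed by the two runs.
This is only an arithmetic sketch; the OX output shows times without a date,
so the same calendar day is assumed for both of its timestamps.

```r
# Timestamps copied from the two runs quoted above (date for OX is assumed).
ox.secs <- as.numeric(difftime(as.POSIXct("2001-04-16 22:31:15", tz = "UTC"),
                               as.POSIXct("2001-04-16 22:31:07", tz = "UTC"),
                               units = "secs"))
r.secs  <- as.numeric(difftime(as.POSIXct("2001-04-16 22:22:49", tz = "UTC"),
                               as.POSIXct("2001-04-16 22:00:25", tz = "UTC"),
                               units = "secs"))
ratio <- r.secs / ox.secs         # how many times longer the R run took
print(c(ox = ox.secs, r = r.secs, ratio = ratio))
```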

2. I spent a full day aggressively vectorizing the program. It now takes
about one third of the original time, which is much more bearable. This,
however, makes the program less readable.
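
The vectorized program itself is not included in this post. As a hedged
illustration of the general idea only, the sketch below shows that summing one
outer product per draw gives the same matrix as a single crossprod() call on
the matrix of draws. N is kept small so the loop finishes quickly, and the
timings will vary by machine.

```r
set.seed(1)
N <- 2000                          # small N so the loop finishes quickly
Z <- matrix(rnorm(N * 4), nrow = N, ncol = 4)

# Loop form: add one 4 x 4 outer product per draw.
loop.time <- system.time({
  acc <- matrix(0, 4, 4)
  for (i in 1:N) acc <- acc + outer(Z[i, ], Z[i, ])
  est.loop <- acc / N
})

# Vectorized form: a single matrix cross-product, t(Z) %*% Z.
vec.time <- system.time(est.vec <- crossprod(Z) / N)

print(all.equal(est.loop, est.vec))   # agreement up to floating-point tolerance
print(rbind(loop = loop.time[1:3], vectorized = vec.time[1:3]))
```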

3. R on different platforms:

Thomas Lumley said: On platforms: R performs at similar speed between
Windows and Linux, and my limited comparisons between Intel and SPARC
machines suggest that the SPECint rating will give a reasonable
ballpark estimate of speed across platforms.  This means you probably
don't want to shift off Intel for speed reasons unless you have a LOT
of money.

Prof. Ripley said: R runs its tests and my scripts between 5% and 20%
slower on Windows NT/2000 than on linux RH6.2, usually nearer 5%.  

M. Edward (Ed) Borasky also mentioned Win NT and Linux but did not give
a comparison or recommendation.

Andy Perrin mentioned that a Perl program runs 3.5 times faster on Linux
than on Windows. But I do not see a reason for this to hold for R.

4. Profiling of the program.

This was suggested by Profs. Ripley, Bates, Lumley and Andy Liaw. But
according to Prof. Ripley, it can only be done on Unix systems. For my
program I knew the bottleneck. It was just hard to break it.
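
For readers who do have profiling available, a minimal sketch of the workflow
follows. It assumes a current R build with profiling support, where Rprof()
and summaryRprof() exist; `slow.sim` is a hypothetical stand-in for the
simulation loop, not the program from this post.

```r
# Hypothetical slow function standing in for the simulation loop.
slow.sim <- function(N) {
  acc <- matrix(0, 4, 4)
  for (i in 1:N) {
    ran <- rnorm(4)
    acc <- acc + outer(ran, ran)
  }
  acc / N
}

prof.file <- tempfile(fileext = ".out")   # file that receives the samples
Rprof(prof.file)                          # start the sampling profiler
res <- slow.sim(200000)
Rprof(NULL)                               # stop profiling
prof <- summaryRprof(prof.file)
print(head(prof$by.self))                 # functions ranked by time spent in them
```

The by.self table ranks functions by the time spent in their own code, which
is usually the quickest way to confirm a suspected bottleneck.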

5. Tricks for coding

M. Edward (Ed) Borasky mentioned some low-level tricks for coding.
Thanks. I have used some in rewriting the program.

6. Style

Patrick Connolly asked why I put ; after each statement. Well, I also
have to write SAS and Java. The semicolon is one small thing I can use in
all three languages.

To summarize: the forthcoming 1.7 GHz P4 will be my next machine for
running R. Thanks again.

Jason Liao


ox program
   #include <oxstd.h>
   #include <oxprob.h>
   #import <maximize>

   // Draw one 4-dimensional normal vector: mu + var_root * z, z ~ N(0, I).
   multi_normal(const mu, const var_root)
   {
      decl ran;
      ran = rann(4, 1);
      ran = var_root * ran;
      ran = ran + mu;
      return ran;
   }

   main()
   {
      decl N = 1000000;
      decl sum = zeros(4, 4);
      decl i, ran;

      print(time());
      decl mu = zeros(4, 1);
      decl var_root = unit(4);

      // Accumulate the outer product of each draw.
      for (i = 0; i < N; i++)
      {
         ran = multi_normal(mu, var_root);
         sum = sum + ran * ran';
      }

      sum = sum / N;
      print(sum);
      print(time());
   }


R-program
   ################
   rm(list=ls(all=TRUE));

   # Draw one 4-dimensional normal vector: mean + var.root %*% z, z ~ N(0, I).
   rmulti.norm <- function(mean, var.root)
   {
      ran <- rnorm(4);
      mean + as.vector( var.root %*% ran );
   }

   main <- function()
   {
      N <- 1000000;
      sum <- matrix(0, 4, 4);   # 4 x 4 accumulator for the outer products

      print(date());
      mu <- numeric(4);
      var.root <- diag(4);      # identity matrix as the covariance root

      # Accumulate the outer product of each draw.
      for(i in 1:N)
      {
         ran <- rmulti.norm(mu, var.root);
         sum <- sum + outer(ran, ran);
      }

      sum <- sum/N;
      print(sum);
      print(date());
   }

   main();

=====
Jason G. Liao
Department of Biometry and Epidemiology
Medical University of South Carolina
135 Rutledge Ave., STE 1148, Charleston, SC 29425
phone (843) 876-1114, fax (843) 876-1126

http://www.geocities.com/jg_liao/index.html

-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-
r-help mailing list -- Read http://www.ci.tuwien.ac.at/~hornik/R/R-FAQ.html
Send "info", "help", or "[un]subscribe"
(in the "body", not the subject !)  To: r-help-request at stat.math.ethz.ch
_._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._
