[Rd] speeding up [.data.frame

Warnes, Gregory R gregory_r_warnes@groton.pfizer.com
Sat, 5 Jan 2002 23:08:58 -0500


(I'm up too late so this might come through garbled...)

I've just been doing some bootstrapping on data frames and discovered that
S-PLUS 6.0r1 was a *lot* faster than R 1.3.1 at the task. S-PLUS completed
100 bootstrap iterations in about 4 seconds, while R took about 15 seconds.
However, when bootstrapping the equivalent *matrices*, R was slightly faster:
1.5 seconds versus 1.86.

Now, since I'm fitting glm()s inside the bootstrap, I really need to use data
frames...

It turns out that one of the reasons S-PLUS is faster on data frames is that
S-PLUS lets you turn off checking for (and resolution of) duplicate row
names in "[.data.frame" by setting the attribute 'dup.row.names' to any
non-NULL value. Adding an equivalent argument to R's "[.data.frame" (patch
below) and using it in my bootstrap function reduced R's elapsed time from
about 15 seconds to 8.6 seconds.
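To make the bottleneck concrete, here is a minimal sketch in R (not part of the original post) of where duplicate row names come from in a bootstrap: resampling rows with replacement repeats row indices, so "[.data.frame" has to detect and rename the duplicates on every iteration, while the equivalent matrix subset carries no row names to fix up. It assumes a current version of R, where the renaming is done with make.unique(); the R 1.4.0 code touched by the patch uses make.names(rows, unique = TRUE) instead, so the exact names may differ there. The data, `B`, and the glm() model are hypothetical, chosen only for illustration.

```r
## Hypothetical example data, for illustration only.
df <- data.frame(x = c(1.5, 2.5, 3.5), y = c(0, 1, 1))
m  <- cbind(x = df$x, y = df$y)   # equivalent matrix, no row names

## A bootstrap draw can pick the same row twice.
idx <- c(1, 1, 3)

## "[.data.frame" notices the duplicated row name "1" and renames it
## (in current R the result is "1", "1.1", "3").
boot_df <- df[idx, ]
print(rownames(boot_df))

## Matrix subsetting does no row-name bookkeeping at all.
boot_m <- m[idx, ]
print(rownames(boot_m))   # NULL

## A small bootstrap of a glm() slope (hypothetical simulated data):
## every dat[idx, ] call below repeats the duplicate-row-name check.
set.seed(1)
n <- 50
dat <- data.frame(x = rnorm(n))
dat$y <- 2 * dat$x + rnorm(n)

B <- 100
slopes <- numeric(B)
for (b in seq_len(B)) {
  idx <- sample.int(n, n, replace = TRUE)   # resample rows with replacement
  fit <- glm(y ~ x, data = dat[idx, ])       # Gaussian glm on the resample
  slopes[b] <- coef(fit)[["x"]]
}
print(summary(slopes))
```

Wrapping the loop in system.time(), and comparing it against a version that resamples a matrix, is one way to reproduce the kind of data frame versus matrix comparison described above; the timings quoted in this message come from the original setup and are not reproduced by this sketch.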

Still, I'm wondering if there are other 'reasonable' changes to
"[.data.frame" that could narrow the gap further...

-Greg

################ PATCH STARTS HERE #################
diff -c /Volumes/app/R/src/R-1.4.0/src/library/base/R/dataframe.R.orig /Volumes/app/R/src/R-1.4.0/src/library/base/R/dataframe.R
*** /Volumes/app/R/src/R-1.4.0/src/library/base/R/dataframe.R.orig	Sat Jan  5 22:58:10 2002
--- /Volumes/app/R/src/R-1.4.0/src/library/base/R/dataframe.R	Sat Jan  5 22:58:10 2002
***************
*** 323,329 ****
  ###  These are a little less general than S
  
  "[.data.frame" <-
!     function(x, i, j, drop = if(missing(i)) TRUE else length(cols) == 1)
  {
      if(nargs() < 3) {
  	if(missing(i))
--- 323,330 ----
  ###  These are a little less general than S
  
  "[.data.frame" <-
!     function(x, i, j, drop = if(missing(i)) TRUE else length(cols) == 1,
!              dup.row.names = FALSE)
  {
      if(nargs() < 3) {
  	if(missing(i))
***************
*** 390,396 ****
      }
      if(!drop) {
  	names(x) <- cols
! 	if(any(duplicated(rows)))
  	    rows <- make.names(rows, unique = TRUE)
  	attr(x, "row.names") <- rows
  	class(x) <- cl
--- 391,397 ----
      }
      if(!drop) {
  	names(x) <- cols
! 	if(any(duplicated(rows)) && !dup.row.names)
  	    rows <- make.names(rows, unique = TRUE)
  	attr(x, "row.names") <- rows
  	class(x) <- cl



