[R] efficiency and "forcing" questions

david.beede@mail.doc.gov david.beede at mail.doc.gov
Thu Mar 29 18:16:55 CEST 2001


Thank you for the clarification about wintop, Prof. Ripley.

Now that I have run my program on the full data set with 67,000
observations and 382 groups (of which 229 were skipped because they had 40
or fewer obs), I wanted to pose again my questions about the efficiency of
my program.  According to my task monitor software, it took 10 hours of CPU
time to run on my Win 98 machine with 256 MB of RAM (R v. 1.2.2; EMACS
v. 20.7, ESS v. 5.1.18).  At least for the first two hours of operation it
ran without using the swapfile, although at some point afterwards it did
start using it, according to the task monitor.

Interestingly, the 10 hours of CPU time was split as follows:  6 hours for
EMACS.EXE and 4 hours for RTERM.EXE.  Does this necessarily mean that if I
source()'d my program directly into RTERM that I could save a lot of time?
(I just want to note here that I have found the EMACS/ESS combinations
*extremely* helpful for developing my code; it would be nice if it were
indeed the case that after development one could switch over to solo R to
do the big jobs.)
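
One way to check this would be to time the same script under both setups. The following is only a minimal sketch: the script contents here are a stand-in written to a temporary file, not the real program, and the environment name is hypothetical.

```r
# Write a small stand-in script to a temporary file (placeholder for the real job).
script <- tempfile(fileext = ".R")
writeLines(c("x <- cumsum(1:1000)", "total <- sum(x)"), script)

# Evaluate the script in its own environment and record the time taken.
env <- new.env()
timing <- system.time(source(script, local = env))

# user/system/elapsed seconds for the sourced script
print(timing)
```

Running the same timing once from within ESS and once from a plain RTERM session would show whether the front end accounts for a meaningful share of the run time.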

Also -- my theory that applying gc() multiple times would free up memory
did not seem to pan out.  I apologize about my rash speculation.

Thanks in advance.

David N. Beede
Economist
Office of Policy Development
Economics and Statistics Administration
U.S. Department of Commerce
Room 4858 HCHB
14th Street and Pennsylvania Avenue, N.W.
Washington, DC  20230
Voice:  202.482.1226
Fax:    202.482.0325
e-mail:  david.beede at mail.doc.gov


The program below does the following tasks:

1.  It creates a data frame (wintemp4) that is a subset of alldata4
consisting of "winner" records;

2.  It defines a function (myppr1) that runs the ppr function in modreg
once to generate goodness of fit (sum of squared errors) measures by number
of terms included in model and then reruns ppr using the number of terms
with the lowest sum of squared errors.

3.  It grinds through a loop, subsetting wintemp4 by group and running
myppr1 for each group subset; and

4.  It puts the ppr output into a separate vector element for each group
(in an attempt to avoid "growing" the vector).
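
Steps 3 and 4 could also be sketched with split() and lapply(), which subset the data once and return a list of the right length without an explicit loop. This is an illustration on made-up data: the toy data frame, the min_obs threshold name, and the use of mean() in place of the ppr fit are all assumptions for the example.

```r
# Toy data: three groups of different sizes (hypothetical stand-in for wintemp4).
set.seed(1)
toy <- data.frame(group = rep(c("a", "b", "c"), times = c(50, 30, 60)),
                  y = rnorm(140))

min_obs <- 40                            # skip groups with this many obs or fewer

pieces <- split(toy, toy$group)          # one data frame per group
keep <- sapply(pieces, nrow) > min_obs   # TRUE for groups large enough to fit

# Apply a per-group function; mean() stands in for the model fit.
fits <- lapply(pieces[keep], function(d) mean(d$y))

print(names(fits))
```

Whether this is faster than the loop for a job of this size is not something the sketch establishes; it mainly avoids repeated subsetting of the full data frame.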

#Here is the program
for(i in 1:4) gc()
load("alldata4.Rdata")
wintemp4 <- subset(alldata4, winner == 1)
rm(alldata4)
for(i in 1:4) gc()
library(modreg)
attach(wintemp4)

myppr1 <- function(x)
{
#run ppr once to get the sum of squared errors corresponding to different numbers of terms
      pprfile.ppr <- ppr(
               award~
               ilogemp+ilogage+sdb+allsmall+
               size2+size3+size4+size5+size6+size7+size8+size9+size10+
               X.Iprimnaic.2+X.Iprimnaic.3+X.Iprimnaic.4+X.Iprimnaic.5+X.Iprimnaic.6+
               X.Iprimnaic.7+X.Iprimnaic.8+X.Iprimnaic.9+X.Iprimnaic.10+X.Iprimnaic.11+
               X.Iprimnaic.12+X.Iprimnaic.13+X.Iprimnaic.14+X.Iprimnaic.15+X.Iprimnaic.16+
               X.Iprimnaic.17+X.Iprimnaic.18+X.Iprimnaic.19+X.Iprimnaic.20+X.Iprimnaic.21+
               X.Iprimnaic.22+X.Iprimnaic.23+X.Iprimnaic.24+X.Iprimnaic.25+X.Iprimnaic.26,
               data=x, nterms=1, max.terms= min(nrow(x),40), optlevel=3
                        )
#pick number of terms giving best fit (gofn is zero for sizes that were not fitted)
         gofn <- pprfile.ppr$gofn
         numterm <- which(gofn > 0)[which.min(gofn[gofn > 0])]
         pprfile.ppr <- ppr(
               award~
               ilogemp+ilogage+sdb+allsmall+
               size2+size3+size4+size5+size6+size7+size8+size9+size10+
               X.Iprimnaic.2+X.Iprimnaic.3+X.Iprimnaic.4+X.Iprimnaic.5+X.Iprimnaic.6+
               X.Iprimnaic.7+X.Iprimnaic.8+X.Iprimnaic.9+X.Iprimnaic.10+X.Iprimnaic.11+
               X.Iprimnaic.12+X.Iprimnaic.13+X.Iprimnaic.14+X.Iprimnaic.15+X.Iprimnaic.16+
               X.Iprimnaic.17+X.Iprimnaic.18+X.Iprimnaic.19+X.Iprimnaic.20+X.Iprimnaic.21+
               X.Iprimnaic.22+X.Iprimnaic.23+X.Iprimnaic.24+X.Iprimnaic.25+X.Iprimnaic.26,
               data=x, nterms=numterm, max.terms= min(nrow(x),40), optlevel=3
                            )
      cat("group =", x$group[1],"\n")
      cat("NAIC =", x$naic4[1],"\n")
      cat("cendiv =", as.character(x$cendiv[1]),"\n")
      cat("number of obs used =", nrow(x),"\n")
      print(summary(pprfile.ppr))
}

grouparr <- levels(as.factor(wintemp4$group))
pprest <- vector(mode="list",length=length(grouparr))

for(i in seq(along=grouparr))
  {
    subi <- subset(wintemp4,wintemp4$group==grouparr[i])
    if(nrow(subi) > 40) pprest[[i]] <- myppr1(subi)
    rm(subi)
    print(gc())
  }

detach(wintemp4)
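
A note on the term-selection step: ppr documents gofn as zero for model sizes that were not fitted, so the position of the minimum among the non-zero entries is not necessarily the number of terms. The toy vector below is invented for illustration. In the program above nterms=1, so presumably no entries are zero and either form gives the same answer.

```r
# Toy goodness-of-fit vector: sizes 1 and 2 were not fitted (zeros),
# sizes 3, 4 and 5 were fitted.
gofn <- c(0, 0, 5.2, 3.1, 4.7)

# Position of the minimum within the filtered (non-zero) values only.
pos_in_subset <- which.min(gofn[gofn > 0])

# Map that position back to the actual model size.
fitted_sizes <- which(gofn > 0)
best_size <- fitted_sizes[pos_in_subset]

print(c(pos_in_subset = pos_in_subset, best_size = best_size))
```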





Prof Brian D Ripley <ripley at stats.ox.ac.uk> on 03/29/2001 12:04:13 AM

Sent by:  <ripley at auk.stats>


To:   <david.beede at mail.doc.gov>
cc:   <r-help at stat.math.ethz.ch>

Subject:  Re: [R] efficiency and "forcing" questions


On Wed, 28 Mar 2001 david.beede at mail.doc.gov wrote:

>
> Dear R listers --
> Thank you for the suggestions from Tony and Prof. Ripley about wintop.  My
> understanding is that wintop will monitor CPU and memory usage by process,
> so one can tell quickly if an R program is still running or not.  This is
> very useful!
>
> You should note however that the MS web page claims that wintop and the
> other PowerTools (or PowerKernel) applications should not be installed on
> Win 98 machines.  However, a search through Yahoo found users advising
> people to disregard the disclaimer on the MS website, although some of the
> other PowerTools did cause problems in Win98.

Well, I have used it many times on Win98 machines, both 98 (4.10) and 98SE
(4.10a)  so I would just ignore the warning.


--
Brian D. Ripley,                  ripley at stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272860 (secr)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595






