[R] Puzzled by what Rprof is telling me

Don MacQueen macq at llnl.gov
Tue Jun 11 19:00:37 CEST 2002


I am using Rprof() to help find ways to improve performance.

I found a function whose total seconds and self seconds were large. I 
replaced it with something else. The something else had a small 
number of total seconds and self seconds. But the total time did not 
decrease.

I don't understand how that could be, and would appreciate any suggestions.

Thanks
-Don


Details, unfortunately lengthy, are here:

The snippet of code I'm profiling is given below. 'inmat' is a 
numeric matrix with 128347 rows; I need to apply a function, sfun(), 
to each of 4191 subsets of rows, indexed by indat$grpi. sfun() is 
fairly complex, and requires information from all of the columns of 
inmat, so simple uses of, for example, aggregate() don't do the job.

Rprof('Rpf3n')
for (i in unique(indat$grpi)) {
  ssnew[i,] <- sfun(inmat[indat$grpi==i,,drop=FALSE])
}
Rprof(NULL)


----------------- Rprof output, v1 ----------------
Each sample represents 0.02 seconds.
Total run time: 404.999999999922 seconds.

Total seconds: time spent in function and callees.
Self seconds: time spent in function alone.

    %       total       %       self
  total    seconds     self    seconds    name
100.00    405.00      0.00      0.00     "source"
100.00    405.00      0.11      0.46     "eval.with.vis"
  99.88    404.50      4.70     19.04     "sfun"
  83.98    340.10     41.25    167.06     "nrow"
  42.68    172.84     42.68    172.84     "=="
   3.08     12.48      0.50      2.04     "lm.fit"
   1.72      6.96      0.23      0.94     "median"
(remainder omitted)

nrow() has 340 total and 41 self seconds, a large portion of the 
total run time.

There is only one call to nrow() in my code,
   ntot <- nrow(dati)
where dati is a subset of the numeric matrix, indat.

I tried replacing nrow(dati) with length(dati[,1])

The profile is then:

----------------- Rprof output, v2 ----------------
Each sample represents 0.02 seconds.
Total run time: 403.699999999923 seconds.

Total seconds: time spent in function and callees.
Self seconds: time spent in function alone.

    %       total       %       self
  total    seconds     self    seconds    name
100.00    403.70      0.11      0.46     "eval.with.vis"
100.00    403.70      0.00      0.00     "source"
  99.88    403.20     46.03    185.82     "sfun"
  42.37    171.06     42.37    171.06     "=="
   3.00     12.10      0.59      2.40     "lm.fit"
   1.70      6.88      0.22      0.90     "median"
(skip many lines until)
   0.01      0.04      0.01      0.04     "length"
(remainder omitted)

This is where I'm puzzled.
nrow() has disappeared as expected
length() has not taken its place
Yet the total run time is about the same.


As an aside, I had previously been using sapply() and split(), and 
that was far slower. Rprof() helped me find out why it was slow. 
Also, the matrix ssnew is created before the loop, with the right 
dimensions, so it's not being grown during the loop.

-- 
--------------------------------------
Don MacQueen
Environmental Protection Department
Lawrence Livermore National Laboratory
Livermore, CA, USA
--------------------------------------
-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-
r-help mailing list -- Read http://www.ci.tuwien.ac.at/~hornik/R/R-FAQ.html
Send "info", "help", or "[un]subscribe"
(in the "body", not the subject !)  To: r-help-request at stat.math.ethz.ch
_._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._



More information about the R-help mailing list