[R] standardGeneric seems slow; any way to get around it?

Martin Morgan mtmorgan at fhcrc.org
Wed Nov 11 04:47:46 CET 2009


John Tillinghast wrote:
> Hi,
> I'm running some routines with standard matrix operations like solve() and
> diag().
> When I do a profile, the lead item under total time is standardGeneric().
> Furthermore, solve() and diag() have much greater total time than self time.
> ???

Hi John -- are you interpreting the output of summaryRprof() correctly?
Here's an example

> f = function() { g(); h() }
> g = function() { t0 = Sys.time(); while (Sys.time() - t0 < 5) {} }
> h = function() { t0 = Sys.time(); while (Sys.time() - t0 < 2) {} }

We'd expect all the time to be spent in 'g' and 'h'; 'f' is just a
wrapper. So...

> Rprof(); f(); Rprof(NULL); head(summaryRprof()$by.total, 8)
            total.time total.pct self.time self.pct
"f"               6.98     100.0      0.00      0.0
"g"               4.98      71.3      0.06      0.9
"-"               4.36      62.5      0.18      2.6
"-.POSIXt"        4.24      60.7      0.06      0.9
"difftime"        4.00      57.3      0.22      3.2
"structure"       2.74      39.3      1.16     16.6
"match.arg"       2.04      29.2      0.20      2.9
"h"               2.00      28.7      0.02      0.3

This says 100% of the total time (about 7s) is spent in 'f' or functions
called by f, and 0% (well, I guess that's an estimate) is actually spent
in 'f' itself. Likewise, the total time in g or the functions it calls
is about 5s, and h or the functions it calls about 2s. In all of these
cases, little time is spent exactly _in_ the function, so self.time is
small.

Note also -, -.POSIX, difftime, structure etc; these functions are
common to both g and h, and the time spent in them reflects the
contributions from both g and h.

So the small self.time of solve and diag just say that the code is
moving quickly through these, and the large total time for
standardGeneric likely means that it's at the top of your call stack, or
that it's called from several different places in your code.

Here's another example, after library(Matrix); example(solve)

> Rprof(); for (i in 1:1000) solve(h8); Rprof(NULL);
head(summaryRprof()$by.tot)
                  total.time total.pct self.time self.pct
"solve"                 0.22     100.0      0.04     18.2
"standardGeneric"       0.22     100.0      0.04     18.2
"solve.default"         0.14      63.6      0.00      0.0
"diag"                  0.04      18.2      0.02      9.1
"*"                     0.02       9.1      0.02      9.1
".Call"                 0.02       9.1      0.02      9.1

'solve' and 'standardGeneric' are at the top of the call stack, so get
~100% of the total time. In this particular case they also account for a
substantial portion of the self.time, and it (might) be worth asking
whether there's a different algorithm that would reduce the number of
calls to solve (e.g., by not iterating over 1:1000 ;).

If you were being aggressive and really were calling solve repeatedly on
the same type of object, you might

> Rprof(); f = selectMethod(solve, class(h8)); for (i in 1:1000) f(h8);
+ Rprof(NULL); head(summaryRprof()$by.total)
                total.time total.pct self.time self.pct
"f"                   0.18     100.0      0.04     22.2
"solve.default"       0.14      77.8      0.02     11.1
".Call"               0.04      22.2      0.04     22.2
"nrow"                0.04      22.2      0.04     22.2
"diag"                0.04      22.2      0.00      0.0
"array"               0.02      11.1      0.02     11.1


Hope that sheds some light on your problem.

Martin


> I assume there is some time-consuming decision going on in the usual
> functions;
> is there any way to avoid that and go straight to the calculaions?
> 
> Thanks
> John Tillinghast
> 
> 	[[alternative HTML version deleted]]
> 
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.


-- 
Martin Morgan
Computational Biology / Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N.
PO Box 19024 Seattle, WA 98109

Location: Arnold Building M1 B861
Phone: (206) 667-2793




More information about the R-help mailing list