[Rd] profiler and loops

Romain Francois romain.francois at dbmail.com
Tue Mar 3 13:33:14 CET 2009


Hello,

(This is follow up from this thread: 
http://www.nabble.com/execution-time-of-.packages-td22304833.html but 
with a different focus)

I am often confused by the result of the profiler, when a loop is 
involved. Consider these two scripts:

script1:

Rprof( )
x <- numeric( )
   for( i in 1:10000){
     x <- c( x, rnorm(10) )
   }
Rprof( NULL )
print( summaryRprof( ) )


script2:

Rprof( )
ffor <- function(){
   x <- numeric( )
   for( i in 1:10000){
     x <- c( x, rnorm(10) )
   }
}
ffor()
Rprof( NULL )
print( summaryRprof( ) )


[]$ time Rscript --vanilla script1.R
$by.self
       self.time self.pct total.time total.pct
"rnorm"      0.22      100       0.22       100

$by.total
       total.time total.pct self.time self.pct
"rnorm"       0.22       100      0.22      100

$sampling.time
[1] 0.22

real    0m7.786s
user    0m5.192s
sys     0m0.735s

[]$$ time Rscript --vanilla script2.R
$by.self
       self.time self.pct total.time total.pct
"ffor"       4.94     92.5       5.34     100.0
"rnorm"      0.40      7.5       0.40       7.5

$by.total
       total.time total.pct self.time self.pct
"ffor"        5.34     100.0      4.94     92.5
"rnorm"       0.40       7.5      0.40      7.5

$sampling.time
[1] 5.34


real    0m7.841s
user    0m5.152s
sys     0m0.712s



In the first one, I call a for loop from the top level and in the second 
one, the loop is wrapped in a function call. This shows the inability of 
the profiler to point loops as responsible for bottlenecks. The coder of 
script1 would not know what to do to improve on the script.

I have had a quick look in the code, and here are a few thoughts:

in the function "doprof" in eval.c,  this loop write the call stack on 
the profiler file:

for (cptr = R_GlobalContext; cptr; cptr = cptr->nextcontext) {
   if ((cptr->callflag & (CTXT_FUNCTION | CTXT_BUILTIN))
       && TYPEOF(cptr->call) == LANGSXP) {
       SEXP fun = CAR(cptr->call);
       if (!newline) newline = 1;
       fprintf(R_ProfileOutfile, "\"%s\" ",
           TYPEOF(fun) == SYMSXP ? CHAR(PRINTNAME(fun)) :
           "<Anonymous>");
   }
   }
  so we can see it only cares about context CTXT_FUNCTION and 
CTXT_BUILTIN, when for loops play with CTXT_LOOP (this is again in 
eval.c within the do_for function)

begincontext(&cntxt, CTXT_LOOP, R_NilValue, rho, R_BaseEnv, R_NilValue,
        R_NilValue);
 
which as the name implies, begins the context of the for loop. The 
begincontext function looks like this :

void begincontext(RCNTXT * cptr, int flags,
         SEXP syscall, SEXP env, SEXP sysp,
         SEXP promargs, SEXP callfun)
{
   cptr->nextcontext = R_GlobalContext;
   cptr->cstacktop = R_PPStackTop;
   cptr->evaldepth = R_EvalDepth;
   cptr->callflag = flags;
   cptr->call = syscall;
   cptr->cloenv = env;
   cptr->sysparent = sysp;
   cptr->conexit = R_NilValue;
   cptr->cend = NULL;
   cptr->promargs = promargs;
   cptr->callfun = callfun;
   cptr->vmax = vmaxget();
   cptr->intsusp = R_interrupts_suspended;
   cptr->handlerstack = R_HandlerStack;
   cptr->restartstack = R_RestartStack;
   cptr->prstack = R_PendingPromises;
#ifdef BYTECODE
   cptr->nodestack = R_BCNodeStackTop;
# ifdef BC_INT_STACK
   cptr->intstack = R_BCIntStackTop;
# endif
#endif
   R_GlobalContext = cptr;
}


So it could be possible to set the last argument of the begincontext 
function to "for" and use this code in the doprof function:


for (cptr = R_GlobalContext; cptr; cptr = cptr->nextcontext) {
 if ( ( cptr->callflag & (CTXT_FUNCTION | CTXT_BUILTIN ) )
       && TYPEOF(cptr->call) == LANGSXP) {
       SEXP fun = CAR(cptr->call);
       if (!newline) newline = 1;
       fprintf(R_ProfileOutfile, "\"%s\" ",
           TYPEOF(fun) == SYMSXP ? CHAR(PRINTNAME(fun)) :
           "<Anonymous>");
   } else if( cptr->callflag & CTXT_LOOP){
     SEXP fun = CAR(cptr->syscall);
     if (!newline) newline = 1;
     fprintf(R_ProfileOutfile, "\"%s\" ", CHAR(PRINTNAME(fun)) );
   }
}

so that we see for in the list of "functions" that appear in the 
profiler file.

Obviously I am taking some shortcuts here, because of the other loops, 
but I would like to make a formal patch with this. Before I do that, I'd 
like to know :
- is this has a chance of breaking something else (does the CTXT_LOOP 
being R_NilValue is used elsewhere)
- would this feature be welcome.
- Should I differentiate real functions with loops in the output file, 
maybe I can write "[for]" instead of for to emphacize this is not a 
function.

Romain

-- 
Romain Francois
Independent R Consultant
+33(0) 6 28 91 30 30
http://romainfrancois.blog.free.fr


-- 
Romain Francois
Independent R Consultant
+33(0) 6 28 91 30 30
http://romainfrancois.blog.free.fr



More information about the R-devel mailing list