[Rd] Package compiler - efficiency problem

Tomas Kalibera tom@@@k@liber@ @ending from gm@il@com
Mon Aug 13 18:02:23 CEST 2018


Dear Karol,

thank you for the report. I can reproduce that the function from your 
example takes very long to compile and I can see where most of the time 
is spent. The compiler is itself written in R and requires a lot of 
resources for large functions (foo() has over 16,000 lines of code, 
nearly 1 million instructions/operands, and 45,000 constants). In 
particular, a lot of time is spent in garbage collection and in finding 
a unique set of constants. Some optimizations of the compiler may be 
possible, but it is unlikely that functions this large will compile fast 
any time soon. For non-generated code, we now have byte-compilation on 
installation by default, which at least removes the compile overhead 
from runtime. Even though the compiler is slow, it is important to keep 
in mind that in principle, with any compiler there will be functions 
where compilation does not improve performance (whether or not the 
compile time is included).
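
To get a feel for the compile overhead described above, one can time 
compiler::cmpfun() on a generated function. The following is only a 
sketch: make_fun() and foo_small are hypothetical names, the generated 
body merely imitates the shape of the reported code, and the size is 
kept small on purpose so that it runs quickly.

```r
library(compiler)

# Hypothetical generator: builds a function with n element-wise
# assignments, imitating the shape of the generated code.
make_fun <- function(n) {
  k <- seq_len(n)
  body_lines <- sprintf("r[%d] <- -vv[%d] + vv[%d] * (1 + ppff[%d])",
                        k, k, k, k)
  src <- c("function(vv, ppff) {",
           sprintf("r <- numeric(%d)", n),
           body_lines,
           "r",
           "}")
  eval(parse(text = paste(src, collapse = "\n")))
}

foo_small <- make_fun(200)

# Time the explicit byte-compilation of the whole function.
compile_time <- system.time(foo_cmp <- cmpfun(foo_small))[["elapsed"]]

# The compiled and uncompiled versions must give the same result.
set.seed(1)
vv   <- runif(200)
ppff <- runif(200)
same_result <- isTRUE(all.equal(foo_small(vv, ppff), foo_cmp(vv, ppff)))
```

Increasing n in make_fun() shows how the compile time grows with the 
size of the function.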

I think it is not a good idea to generate code for functions like foo() 
in R (or any interpreted language). You say that R's byte-code compiler 
produces code that runs 5-10x faster than when the function is 
interpreted by the AST interpreter (uncompiled), which sounds like a 
good result, but I believe that avoiding code generation would be much 
faster than that, apart from drastically reducing code size and 
therefore compile time. The generator of these functions has much more 
information than the compiler - it could be turned into an interpreter 
of these functions and compute their values on the fly.
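
As a rough illustration of that idea (not of how gEcon works 
internally), the equations could be kept as data and evaluated by one 
general function instead of being emitted as code. The table layout 
and the names eqs and eval_eqs below are assumptions made for this 
sketch.

```r
# Hypothetical description of the equations as data, not generated code.
# Each row stands for: r[out] <- -vv[a] + vv[b] * (1 + ppff[p])
eqs <- data.frame(out = 1:4,
                  a   = c(5, 6, 7, 8),
                  b   = c(1, 1, 2, 2),
                  p   = c(3, 4, 5, 6))

# One general evaluator takes the place of the generated lines.
eval_eqs <- function(eqs, vv, ppff, n_out = max(eqs$out)) {
  r <- numeric(n_out)
  r[eqs$out] <- -vv[eqs$a] + vv[eqs$b] * (1 + ppff[eqs$p])
  r
}

vv   <- c(1, 2, 3, 4, 5, 6, 7, 8)
ppff <- rep(0, 6)
r <- eval_eqs(eqs, vv, ppff)
```

With this layout a new model only changes the contents of the table; 
no new function has to be generated or compiled.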

A significant source of inefficiency in the generated code is 
element-wise operations, such as

r[12] <- -vv[88] + vv[16] * (1 + ppff[1307])
...

r[139] <- -vv[215] + vv[47] * (1 + ppff[1434])

(these could be vectorized, which would reduce code size, improve 
interpretation speed, and make the code somewhat more readable). Most of 
the code lines in the generated functions seem to be easily vectorizable.
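
A minimal sketch of such a vectorization follows. The data are made 
up, and the index vector i_b is an assumption: the elided lines do not 
show how the index of the second vv operand changes between the first 
and the last assignment, so an arbitrary sequence from 16 to 47 is used 
here. The generator itself would know the real indices.

```r
# Hypothetical data; sizes chosen only so that the indices are valid.
set.seed(1)
vv   <- runif(300)
ppff <- runif(1500)

# Index vectors the generator would know when emitting the code.
i_r <- 12:139
i_a <- 88:215
i_b <- round(seq(16, 47, length.out = length(i_r)))  # assumed pattern
i_p <- 1307:1434

# Element-wise form: one assignment per element, as in the generated code.
r_loop <- numeric(139)
for (k in seq_along(i_r)) {
  r_loop[i_r[k]] <- -vv[i_a[k]] + vv[i_b[k]] * (1 + ppff[i_p[k]])
}

# Vectorized form: a single assignment covers all 128 elements.
r_vec <- numeric(139)
r_vec[i_r] <- -vv[i_a] + vv[i_b] * (1 + ppff[i_p])

stopifnot(isTRUE(all.equal(r_loop, r_vec)))
```

The single vectorized assignment replaces 128 generated lines, which is 
where the reduction in code size and compile time would come from.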

Compilers and interpreters necessarily use some heuristics or optimize 
for certain code patterns. Optimizing for generated code may be tricky, 
as it could even harm the performance of usual code, and I would much 
rather optimize the compiler for the usual code.

Indeed, a pragmatic solution requiring the least amount of work would be 
to disable compilation of these generated functions. There is no 
documented way to do that. Maybe we could add one (technically it is 
trivial), but I have been reluctant so far: in some cases, compiling 
even these functions may be beneficial, for instance if the speedup is 
5-10x and the function is run very many times. But once the generated 
code includes a pragma preventing compilation, it will never be 
compiled. Also, the trade-offs may change as the compiler evolves, 
perhaps not in this case, but in others where such a pragma may be used.
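
The paragraph above concerns a per-function switch. What the compiler 
package does document is the global JIT level, set via 
compiler::enableJIT(), which is what the original question about 
turning off the JIT refers to. The sketch below shows one way to switch 
the JIT off only around a given evaluation and restore the previous 
level afterwards. with_jit_disabled() is a hypothetical helper, not 
part of any package, and it affects all code run in the meantime, not 
just the generated functions.

```r
library(compiler)

# Hypothetical helper: evaluate `expr` with the JIT switched off, then
# restore whatever level was active before.
with_jit_disabled <- function(expr) {
  old_level <- enableJIT(0)              # returns the previous JIT level
  on.exit(enableJIT(old_level), add = TRUE)
  expr                                   # promise is forced here, JIT off
}

before <- enableJIT(-1)   # a negative level only queries the current level
res    <- with_jit_disabled(sum(1:10))
after  <- enableJIT(-1)
```

Because the previous level is restored on exit, other code keeps the 
benefit of JIT compilation.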

Well, so the short answer would be that these functions should not be 
generated in the first place. If rewriting were too much work, perhaps 
the generator could just be improved to produce vectorized operations.

Best
Tomas

On 12.8.2018 21:31, Karol Podemski wrote:
>   Dear R team,
>
> I am a co-author and maintainer of one of R packages distributed by R-forge
> (gEcon). One of gEcon package users found a strange behaviour of package (R
> froze for couple of minutes) and reported it to me. I traced the strange
> behaviour to compiler package. I attach short demonstration of the problem
> to this mail (demonstration makes use of compiler and tictoc packages only).
>
> In short, the compiler package has problems in compiling large functions -
> their compilation and execution may take much longer than direct execution
> of an uncompiled function. Such functions are generated by gEcon package as
> they describe steady state for economy.
>
> I am curious if you are aware of such problems and plan to handle the
> efficiency issues. On one of the boards I saw that there were efficiency
> issues in rpart package but they have been resolved. Or would you advise to
> turn off JIT on package load (package heavily uses such long functions
> generated whenever a new model is created)?
>
> Best regards,
> Karol Podemski
>
>
> ______________________________________________
> R-devel using r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel

