[Rd] Package compiler - efficiency problem

Karol Podemski gecon@m@inten@nce @ending from gm@il@com
Sun Aug 26 21:26:10 CEST 2018

Dear Tomas, Inaki and the rest of R-devel team,

thank you for your explainations and suggestions. I talked with gEcon
development team and we decided to change our implementation along the
lines you suggested.

Best regards,
Karol Podemski

pt., 17 sie 2018 o 13:38 Tomas Kalibera <tomas.kalibera using gmail.com>

> Dear Karol,
> I don't understand the models behind these function, but I can tell that
> the code generated is very inefficient. The AST interpreter will be very
> inefficient performing each scalar computation with all boxing,
> allocations, function calls. The byte-code compiler removes some of the
> boxing and allocation. While it could certainly compile faster, it will
> always be taking long compiling such functions with so many commands: so
> many expressions to track, so many source references to map, for so little
> computation. The same code could be much more efficient if it used
> vectorized operations and loops. The compiler cannot infer the loops and
> vector operations from the code - it is not that smart and I doubt it could
> easily be for R, but such optimizations could certainly be done easily at a
> higher level, when optimizing computation within the model, not within R
> with all its complicated semantics. I think you could hope for ~100x
> speedups compared to current generated code running with R AST interpreter.
> So I think it might be worth thinking about writing an interpreter for the
> model (the generator would compute function values on the fly, without
> actually generating code). If that was too slow, it might pay off to
> generate some intermediate representation for the model that would be
> faster to interpret. If that was too hard, then perhaps generating the code
> from the model in a smarter way (use vector operations, loops). It is ok to
> do that opportunistically - only when possible. With compilers, this is
> normal, optimizations often take advantage of certain patterns in the code
> if they are present. If you had more specific questions how to optimize the
> code feel free to ask (also offline).
> Certainly I don't want the existence of the byte-code compiler to require
> you to switch from R to C/C++, that would be exactly the opposite of what
> the compiler is aiming for. If it turns out you really need a way to
> disable compilation of these generated functions (so they run much slower,
> but you don't have to wait for them to compile), we will provide it and
> using a hack/workaround it is already possible in existing versions of R,
> with all the drawbacks I mentioned previously.
> Best
> Tomas
> On 08/17/2018 12:43 AM, Karol Podemski wrote:
> Dear Thomas,
> thank you for prompt response and taking interest in this issue. I really
> appreciate your compiler project and efficiency gains in usual case. I am
> aware of limitations of interpreted languages too and because of that even
> when writing my first mail I had a hunch that it is not that easy to
> address this problem.  As you mentioned optimisation of compiler for
> handling non-standard code may be tricky and harmful for usual code. The
> question is if gEcon is the only package that may face the same issue
> because of compilation.
> The functions generated by gEcon are systems of non-linear equations
> defining the equilibrium of an economy (see
> http://gecon.r-forge.r-project.org/files/gEcon-users-guide.pdf  if you
> want to learn a bit how we obtain it). The rows, you suggested to
> vectorise, are indeed vectorisable because they define equilibrium for
> similiar markets (e.g. production and sale of beverages and food) but do
> not have to be vectorisable in general case. So that not to delve into too
> much details I will stop here in description of how the equations
> originate. However, I would like to point that similiar large systems of
> linear equations may arise in other fields (
> https://en.wikipedia.org/wiki/Steady_state ) and there may be other
> packages that generate similar large systems (e.g. network problems like
> hydraulic networks). In that case, reports such as mine may help you to
> assess the scale of the problems.
> Thank you for suggestions for improvement in our approach, i am going to
> discuss them with other package developers.
> Regards,
> Karol Podemski
> pon., 13 sie 2018 o 18:02 Tomas Kalibera <tomas.kalibera using gmail.com>
> napisał(a):
>> Dear Karol,
>> thank you for the report. I can reproduce that the function from you
>> example takes very long to compile and I can see where most time is spent.
>> The compiler is itself written in R and requires a lot of resources for
>> large functions (foo() has over 16,000 lines of code, nearly 1 million of
>> instructions/operands, 45,000 constants). In particular a lot of time is
>> spent in garbage collection and in finding a unique set of constants. Some
>> optimizations of the compiler may be possible, but it is unlikely that
>> functions this large will compile fast any soon. For non-generated code, we
>> now have the byte-compilation on installation by default which at least
>> removes the compile overhead from runtime. Even though the compiler is
>> slow, it is important to keep in mind that in principle, with any compiler
>> there will be functions where compilation would not be improve performance
>> (when the compile time is included or not).
>> I think it is not a good idea to generate code for functions like foo()
>> in R (or any interpreted language). You say that R's byte-code compiler
>> produces code that runs 5-10x faster than when the function is interpreted
>> by the AST interpreter (uncompiled), which sounds like a good result, but I
>> believe that avoiding code generation would be much faster than that, apart
>> from drastically reducing code size and therefore compile time. The
>> generator of these functions has much more information than the compiler -
>> it could be turned into an interpreter of these functions and compute their
>> values on the fly.
>> A significant source of inefficiency of the generated code are
>> element-wise operations, such as
>> r[12] <- -vv[88] + vv[16] * (1 + ppff[1307])
>> ...
>> r[139] <- -vv[215] + vv[47] * (1 + ppff[1434])
>> (these could be vectorized, which would reduce code size and improve
>> interpretation speed; and make it somewhat readable). Most of the code
>> lines in the generated functions seem to be easily vectorizable.
>> Compilers and interpreters necessarily use some heuristics or optimize at
>> some code patterns. Optimizing for generated code may be tricky as it could
>> even harm performance of usual code. And, I would much rather optimize the
>> compiler for the usual code.
>> Indeed, a pragmatic solution requiring the least amount of work would be
>> to disable compilation of these generated functions. There is not a
>> documented way to do that and maybe we could add it (and technically it is
>> trivial), but I have been reluctant so far - in some cases, compilation
>> even of these functions may be beneficial - if the speedup is 5-10x and we
>> run very many times. But once the generated code included some pragma
>> preventing compilation, it won't be ever compiled. Also, the trade-offs may
>> change as the compiler evolves, perhaps not in this case, but in other
>> where such pragma may be used.
>> Well so the short answer would be that these functions should not be
>> generated in the first place. If it were too much work rewriting, perhaps
>> the generator could just be improved to produce vectorized operations.
>> Best
>> Tomas
>> On 12.8.2018 21:31, Karol Podemski wrote:
>>  Dear R team,
>> I am a co-author and maintainer of one of R packages distributed by R-forge
>> (gEcon). One of gEcon package users found a strange behaviour of package (R
>> froze for couple of minutes) and reported it to me. I traced the strange
>> behaviour to compiler package. I attach short demonstration of the problem
>> to this mail (demonstration makes use of compiler and tictoc packages only).
>> In short, the compiler package has problems in compiling large functions -
>> their compilation and execution may take much longer than direct execution
>> of an uncompiled function. Such functions are generated by gEcon package as
>> they describe steady state for economy.
>> I am curious if you are aware of such problems and plan to handle the
>> efficiency issues. On one of the boards I saw that there were efficiency
>> issues in rpart package but they have been resolved. Or would you advise to
>> turn off JIT on package load (package heavily uses such long functions
>> generated whenever a new model is created)?
>> Best regards,
>> Karol Podemski
>> ______________________________________________R-devel using r-project.org mailing listhttps://stat.ethz.ch/mailman/listinfo/r-devel

	[[alternative HTML version deleted]]

More information about the R-devel mailing list