[Rd] Package compiler - efficiency problem

Iñaki Ucar iucar at fedoraproject.org
Fri Aug 17 02:20:50 CEST 2018


Karol,

If I understood correctly, functions like "foo" are automatically
generated by gEcon's model parser. For such a long function, and
depending on how many times you need to call it, it may make more
sense to generate C++ code instead (including the 'for' loop). Then
you can use Rcpp::sourceCpp, or Rcpp::cppFunction, to compile it and
run it from R.
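
A minimal sketch of that approach, assuming Rcpp and a working C++ toolchain are installed. The function name `foo_cpp`, the block size `n` and the index offsets are made up for illustration; they are not what gEcon's parser would emit, which would depend on the model.

```r
library(Rcpp)

# Hypothetical generated C++ source: one 'for' loop replaces a run of
# element-wise assignments of the form r[i] <- -vv[j] + vv[k] * (1 + ppff[m]).
cppFunction('
NumericVector foo_cpp(NumericVector vv, NumericVector ppff) {
  const int n = 4;  // assumed block size, illustration only
  if (vv.size() < 2 * n || ppff.size() < n) stop("input vectors too short");
  NumericVector r(n);
  for (int i = 0; i < n; ++i) {
    // C++ indices are 0-based: this reads vv[5..8] and vv[1..4] in R terms
    r[i] = -vv[i + n] + vv[i] * (1.0 + ppff[i]);
  }
  return r;
}')

vv   <- c(1, 2, 3, 4, 5, 6, 7, 8)
ppff <- c(0.1, 0.2, 0.3, 0.4)

r_cpp <- foo_cpp(vv, ppff)
# Pure-R reference for the same (made-up) block of equations
r_ref <- -vv[5:8] + vv[1:4] * (1 + ppff[1:4])
```

The C++ compilation happens once per generated model, so whether it pays off depends on how many times the function is evaluated afterwards.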

Iñaki

On Fri, 17 Aug 2018 at 0:47, Karol Podemski
(<gecon.maintenance at gmail.com>) wrote:
>
> Dear Tomas,
>
> thank you for the prompt response and for taking an interest in this issue. I
> really appreciate your compiler project and the efficiency gains in the usual
> case. I am aware of the limitations of interpreted languages too, and because
> of that, even when writing my first mail, I had a hunch that this problem is
> not easy to address. As you mentioned, optimising the compiler for
> non-standard code may be tricky and harmful for usual code. The
> question is whether gEcon is the only package that may face this issue
> because of compilation.
>
> The functions generated by gEcon are systems of non-linear equations
> defining the equilibrium of an economy (see
> http://gecon.r-forge.r-project.org/files/gEcon-users-guide.pdf if you want
> to learn a bit about how we obtain them). The rows you suggested vectorising
> are indeed vectorisable because they define equilibrium for similar markets
> (e.g. production and sale of beverages and food), but they do not have to be
> vectorisable in the general case. So as not to delve into too much detail, I
> will stop here in describing how the equations originate. However, I
> would like to point out that similar large systems of linear equations may
> arise in other fields ( https://en.wikipedia.org/wiki/Steady_state ) and
> there may be other packages that generate similar large systems (e.g.
> network problems like hydraulic networks). In that case, reports such as
> mine may help you to assess the scale of the problem.
>
> Thank you for the suggestions for improving our approach; I am going to
> discuss them with the other package developers.
>
> Regards,
> Karol Podemski
>
> On Mon, 13 Aug 2018 at 18:02, Tomas Kalibera <tomas.kalibera at gmail.com>
> wrote:
>
> > Dear Karol,
> >
> > thank you for the report. I can reproduce that the function from your
> > example takes very long to compile and I can see where most time is spent.
> > The compiler is itself written in R and requires a lot of resources for
> > large functions (foo() has over 16,000 lines of code, nearly 1 million
> > instructions/operands, 45,000 constants). In particular, a lot of time is
> > spent in garbage collection and in finding a unique set of constants. Some
> > optimizations of the compiler may be possible, but it is unlikely that
> > functions this large will compile fast any time soon. For non-generated
> > code, we now have byte-compilation on installation by default, which at
> > least removes the compile overhead from runtime. Even though the compiler is
> > slow, it is important to keep in mind that in principle, with any compiler
> > there will be functions where compilation would not improve performance
> > (whether or not the compile time is included).
> >
> > I think it is not a good idea to generate code for functions like foo() in
> > R (or any interpreted language). You say that R's byte-code compiler
> > produces code that runs 5-10x faster than when the function is interpreted
> > by the AST interpreter (uncompiled), which sounds like a good result, but I
> > believe that avoiding code generation would be much faster than that, apart
> > from drastically reducing code size and therefore compile time. The
> > generator of these functions has much more information than the compiler -
> > it could be turned into an interpreter of these functions and compute their
> > values on the fly.
> >
> > A significant source of inefficiency in the generated code is the
> > element-wise operations, such as
> >
> > r[12] <- -vv[88] + vv[16] * (1 + ppff[1307])
> > ...
> >
> > r[139] <- -vv[215] + vv[47] * (1 + ppff[1434])
> >
> > (these could be vectorized, which would reduce code size and improve
> > interpretation speed; and make it somewhat readable). Most of the code
> > lines in the generated functions seem to be easily vectorizable.
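
A minimal sketch of that vectorization on toy data. The index vectors below (`out_idx`, `lhs_idx`, `rhs_idx`, `par_idx`) are hypothetical; the actual index mapping in the elided lines of foo() is not shown in this thread, so the generator would have to supply it. Note that the indices do not need to be contiguous or to advance in step for the vectorized form to work.

```r
set.seed(1)
vv   <- runif(20)
ppff <- runif(20)

# Hypothetical index vectors, one entry per generated equation
out_idx <- 1:8                 # positions written in r
lhs_idx <- 9:16                # plays the role of vv[88], ..., vv[215]
rhs_idx <- rep(1:2, each = 4)  # plays the role of vv[16], ..., vv[47]
par_idx <- 5:12                # plays the role of ppff[1307], ..., ppff[1434]

# Element-wise form, equivalent to one generated line per equation
r_loop <- numeric(8)
for (k in seq_along(out_idx)) {
  r_loop[out_idx[k]] <- -vv[lhs_idx[k]] + vv[rhs_idx[k]] * (1 + ppff[par_idx[k]])
}

# Vectorized form: a single statement covers the whole block
r_vec <- numeric(8)
r_vec[out_idx] <- -vv[lhs_idx] + vv[rhs_idx] * (1 + ppff[par_idx])
```

With this shape, the generated function carries four integer vectors and one statement per block instead of one line of code per equation.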
> >
> > Compilers and interpreters necessarily use some heuristics or optimize for
> > some code patterns. Optimizing for generated code may be tricky as it could
> > even harm performance of usual code. And I would much rather optimize the
> > compiler for the usual code.
> >
> > Indeed, a pragmatic solution requiring the least amount of work would be
> > to disable compilation of these generated functions. There is no
> > documented way to do that and maybe we could add one (technically it is
> > trivial), but I have been reluctant so far: in some cases, compilation
> > even of these functions may be beneficial, if the speedup is 5-10x and we
> > run them very many times. But once the generated code includes some pragma
> > preventing compilation, it will never be compiled. Also, the trade-offs may
> > change as the compiler evolves, perhaps not in this case, but in others
> > where such a pragma may be used.
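
For the JIT question in the original report, a minimal sketch using the documented `compiler::enableJIT`, which changes the JIT level for the whole session rather than for one function. The wrapper name `without_jit` is hypothetical. This only affects calls made while the level is 0; it is not the per-function pragma discussed above.

```r
library(compiler)

# Hypothetical helper: evaluate an expression with the JIT switched off,
# then restore whatever level was active before.
without_jit <- function(expr) {
  old <- enableJIT(0)                 # returns the previous JIT level
  on.exit(enableJIT(old), add = TRUE) # restore even if expr fails
  force(expr)                         # the promise is evaluated here, JIT off
}

level_before <- enableJIT(-1)         # negative argument: query, no change

res <- without_jit({
  f <- function(x) x + 1              # stand-in for a large generated function
  f(41)
})

level_after <- enableJIT(-1)
```

A package that takes this route would pay the interpreter's slower execution on every call in exchange for never paying the compile time.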
> >
> > So the short answer would be that these functions should not be
> > generated in the first place. If rewriting were too much work, perhaps
> > the generator could just be improved to produce vectorized operations.
> >
> > Best
> > Tomas
> > On 12.8.2018 21:31, Karol Podemski wrote:
> >
> >  Dear R team,
> >
> > I am a co-author and maintainer of one of the R packages distributed via
> > R-Forge (gEcon). One of the gEcon users found strange behaviour in the
> > package (R froze for a couple of minutes) and reported it to me. I traced
> > the behaviour to the compiler package. I attach a short demonstration of
> > the problem to this mail (it uses the compiler and tictoc packages only).
> >
> > In short, the compiler package has problems compiling large functions:
> > their compilation and execution may take much longer than direct execution
> > of the uncompiled function. Such functions are generated by the gEcon
> > package, as they describe the steady state of an economy.
> >
> > I am curious whether you are aware of such problems and plan to handle the
> > efficiency issues. On one of the boards I saw that there were efficiency
> > issues in the rpart package but they have been resolved. Or would you
> > advise turning off JIT on package load (the package heavily uses such long
> > functions, generated whenever a new model is created)?
> >
> > Best regards,
> > Karol Podemski
> >
> >
> >
> > ______________________________________________
> > R-devel at r-project.org mailing list
> > https://stat.ethz.ch/mailman/listinfo/r-devel
> >
> >
> >
>



-- 
Iñaki Ucar