[Rd] Display lines of code from the top-level script or subscript in non-interactive R Session with Rprof

Alexander Keth @|ex@nder@keth @end|ng |rom |venturegroup@com
Wed Aug 3 10:44:55 CEST 2022


Hello there,


I am running R in a production environment. My goal is to profile all production jobs, which run in non-interactive R sessions via Rscript, and to report the results in the form: job-xyz ran for xxx amount of time and spent yyy seconds executing line # (for every line of code). In general, the R code consists of a main script that calls various subscripts. The jobs make heavy use of external packages (e.g. dplyr, DBI, data.table and so on).

I re-installed all packages with --with-keep.source. Subscripts are sourced in the main script via eval(parse("path/to/subscript.R")) to enable line profiling with Rprof. The call to Rprof is Rprof("rprof.out", line.profiling = TRUE, memory.profiling = TRUE).
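For reference, a minimal sketch of this setup that runs on its own. The temporary subscript and the names subscript, exprs and prof_file are hypothetical stand-ins, not the production code. It assumes that source references have to be requested explicitly, because Rscript sets keep.source = FALSE by default.

```r
# Hypothetical stand-in for a real subscript, written to a temp file
# so that the sketch is self-contained.
subscript <- tempfile(fileext = ".R")
writeLines(c(
  "x <- numeric(0)",
  "for (i in 1:2000) x <- c(x, sqrt(i))",
  "total <- sum(x)"
), subscript)

# Request source references explicitly; non-interactive sessions
# default to keep.source = FALSE.
exprs <- parse(file = subscript, keep.source = TRUE)

prof_file <- tempfile(fileext = ".out")
Rprof(prof_file, line.profiling = TRUE, memory.profiling = TRUE)
eval(exprs)   # evaluate the whole parsed expression vector, as in the post
Rprof(NULL)   # stop profiling and close the output file

# The first line of the output records which profiling modes were active.
header <- readLines(prof_file, n = 1)
```

A workload this short may finish before the first sample is taken, so the file may contain only the header line.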


Unfortunately, because most of the code relies heavily on packages (e.g. dplyr, data.table and so on), most of the code lines reported by Rprof refer to source code within those packages and not to the 'top-level' source code in the main script or the subscripts. So far, the only solution I have come up with is to scrape the Rprof output using the profile package (https://github.com/r-prof/profile), extract the top-level function calls from each call stack (after removing the top-level eval calls), and automatically match them against the function calls made in the main script and subscripts. However, this process is obviously not perfect and is very error-prone.
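To make the idea of attributing samples to top-level lines concrete, here is a rough sketch of a different technique from the function-call matching described above: it reads the file#line entries in the raw Rprof output directly and does not use the profile package. The function script_line_times and its arguments are hypothetical. The sketch assumes that the scripts were parsed with source references, so that their lines appear in the profile. It also assumes the output layout as I understand it: a header line with sample.interval in microseconds, "#File n: path" lines, and one line per sample with the innermost stack entry first.

```r
# Attribute each sample to the outermost stack entry located in one of
# the given script files. `prof_lines` is the profile as returned by
# readLines(); `scripts` holds the basenames of the files of interest.
script_line_times <- function(prof_lines, scripts) {
  # Header, e.g. "memory profiling: line profiling: sample.interval=20000"
  interval_us <- as.numeric(
    sub(".*sample\\.interval=([0-9]+).*", "\\1", prof_lines[1])
  )

  # "#File <n>: <path>" lines map file numbers to paths.
  is_file <- grepl("^#File [0-9]+: ", prof_lines)
  file_ids <- sub("^#File ([0-9]+): .*", "\\1", prof_lines[is_file])
  file_paths <- sub("^#File [0-9]+: ", "", prof_lines[is_file])
  names(file_paths) <- file_ids
  wanted <- file_ids[basename(file_paths) %in% scripts]

  samples <- prof_lines[-1][!is_file[-1]]
  hits <- character(0)
  for (s in samples) {
    # Simplification: any <digits>#<digits> token counts as a location.
    refs <- regmatches(s, gregexpr("[0-9]+#[0-9]+", s))[[1]]
    keep <- refs[sub("#.*", "", refs) %in% wanted]
    # Innermost entries come first, so the last match is the outermost.
    if (length(keep) > 0) hits <- c(hits, keep[length(keep)])
  }
  if (length(hits) == 0) {
    return(data.frame(file = character(0), line = integer(0),
                      seconds = numeric(0)))
  }

  counts <- table(hits)
  refs <- names(counts)
  out <- data.frame(
    file = unname(basename(file_paths[sub("#.*", "", refs)])),
    line = as.integer(sub(".*#", "", refs)),
    seconds = as.numeric(counts) * interval_us / 1e6,
    stringsAsFactors = FALSE
  )
  out <- out[order(-out$seconds), ]
  rownames(out) <- NULL
  out
}
```

It could be called as script_line_times(readLines("rprof.out"), c("main.R", "subscript.R")), where both script names are placeholders. Each sample is counted once, against the outermost matching line, so time spent inside package code is charged to the script line that triggered it.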


Is there any better way to do things?


Cheers,
Alex


