[Rd] Rprof(): Revisit 'interval' limits?
Tomas Kalibera
tom@@@k@||ber@ @end|ng |rom gm@||@com
Thu Jul 17 18:59:43 CEST 2025
I would only use longer intervals than the allowed minimum to reduce the
observer bias: the profiling itself is quite an expensive operation
which alters the execution of the program and poses a source of bias to
the measurements. Instead, I would ensure the profiled application runs
for long enough (by repeating a kernel multiple times, using larger
input data, etc.) to make sure there are enough samples. That should
provide better results and the default 20ms interval should be good for
that. It shouldn't matter how long the individual calls in the
application take - if even a very short running call is executed very
often, it should be visible in the profile.
The limit on Linux comes from the HZ value, which is the frequency at
which CPU time is updated. The default is 250 (so 4ms). When we set the
limits 2+ years ago, I ran experiments on Linux and other systems to see
what the timers support and couldn't get below these 4ms
on Linux. On macOS I could get to much smaller intervals and also on
Windows (but there it is not CPU time profiling). Anyway, the results
one would get from profiling R code anywhere close to those limits would most
likely be garbage.
The limits were introduced after a bug report from a user on macOS, who
set the intervals way too low, running into a race condition in the macOS
system (by now worked around in R) and then running into starvation -
when R couldn't make any progress running user code because it spent all
the time collecting samples.
The HZ value can be set at kernel configure time and if at some point
almost all kernels used by R users have HZ=1000, we might
reconsider the limit for Linux (and possibly make it the same as on
other platforms, also to simplify the code). There was a proposal for
that to be the new default earlier this year, so maybe it will happen at
some point.
Best
Tomas
On 7/13/25 11:26, Henrik Bengtsson wrote:
> Rprof() has an argument `interval = 0.02` that controls how frequently
> sampling takes place. On Linux the maximum sampling frequency is once
> every 10 ms and on other platforms it's once every 1 ms, per
> help("Rprof"):
>
> "What is feasible is machine-dependent. On Linux, R requires the
> interval to be at least 10ms, on all other platforms at least 1ms.
> Shorter intervals will be rounded up with a warning."
>
> implemented in <https://github.com/r-devel/r-svn/blob/eb498f735e6b592c3db53d6824be3a7c30d4c4d5/src/main/eval.c#L897-L907>:
>
> #if defined(linux) || defined(__linux__)
>     if (dinterval < 0.01) {
>         dinterval = 0.01;
>         warning(_("interval too short for this platform, using '%f'"), dinterval);
>     }
> #else
>     if (dinterval < 0.001) {
>         dinterval = 0.001;
>         warning(_("interval too short, using '%f'"), dinterval);
>     }
> #endif
>
> Q. These limits were introduced on 2022-11-18 (r83369) by Tomas K.
> How were these limits chosen? Is it that the Linux limit of 10 ms
> applies to all Linux distributions, kernels, and hardware, or was this
> limit picked to work on most systems? Do they need to be re-visited
> over time? I would imagine that the limit would depend on hardware and
> the speed of the file system that Rprof() writes to, but I find it a
> bit odd that it would be hardcoded to an absolute walltime period.
>
> FWIW, I just recompiled R-devel on my Ubuntu Linux laptop to allow for
> 1 ms, and the collected data look like what I'd expect also at this
> resolution. Without the tweak, a lot of profiled calls clock in at 10
> ms.
>
> Thanks,
>
> Henrik