[R] Preexisting Work on Data- and Control-Flow Analysis

Richard O'Keefe
Thu Dec 8 06:15:36 CET 2022


You should probably look at R's byte-code compiler (the "compiler"
package, see ?compiler::cmpfun).
One issue with data and control flow analysis in R is that
   f <- function (x, y) x + y
   f(ping, pong)
may invoke an S3 (see ?S3groupGeneric, Ops) or S4 (see ?Arith)
method, which might not have existed when f was analysed.
Indeed,
   f <- function (x, y) { foo(x); bar(y); x + y }
may change the method(s) bound to + in foo and/or bar, so the
bindings may change while f is running.
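
A minimal sketch of that second case, assuming a made-up class "money"
and made-up bodies for foo and bar (none of these names come from a real
package). Here foo installs a + method as a side effect, so the + inside
f dispatches differently from the + evaluated just before f was called:

```r
# Hypothetical example: "money", foo and bar are invented for illustration.
f <- function (x, y) { foo(x); bar(y); x + y }

foo <- function (x) {
  # Side effect: define an S3 method for + on class "money" in the
  # global environment, where S3 dispatch from f can find it.
  assign("+.money", function (e1, e2) "method installed by foo",
         envir = globalenv())
  invisible(NULL)
}
bar <- function (y) invisible(NULL)

a <- structure(1, class = "money")
b <- structure(2, class = "money")

before <- unclass(a + b)  # no method yet: ordinary arithmetic
after  <- f(a, b)         # foo runs first, then + dispatches to the new method
```

A static analysis of f alone cannot tell which of the two behaviours
x + y will have; it depends on what foo and bar do at run time.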

Then there's the whole "imperative programming with lazy
function argument evaluation" thing, which is definitely going
to make analysis a wee bit challenging.
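
To make that concrete, here is a small sketch with invented names (g,
bump, counter). The side effect in the argument expression happens only
if, and when, the callee actually uses the argument:

```r
# Hypothetical example: counter, bump and g are invented for illustration.
counter <- 0
bump <- function () { counter <<- counter + 1; counter }

g <- function (a, b) {
  # b is a promise; it is evaluated only if this branch reaches it.
  if (a > 0) "b was never needed" else b
}

g(1, bump())      # the promise for b is never forced
n1 <- counter
g(-1, bump())     # the promise for b is forced once
n2 <- counter
```

So whether the call site g(..., bump()) modifies counter is a property of
g's control flow, not of the call site.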

And then there's this little gem:
> x <- 17
> f <- function () {
+   cat(x, "\n")
+   x <- 42
+   cat(x, "\n")
+ }
> f()
17
42
The first occurrence of x in f and the last occurrence of
x in f refer to *different* variables.  The last occurrence
of x refers to a local variable, but that local variable
did not exist until it was assigned to.  (Yep, the set of
local variables of a function is *dynamic*.)
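
A variation on the same theme, as a sketch (h and flag are names I made
up here): when the assignment is conditional, which variable a later x
refers to depends on the path taken through the function.

```r
x <- 17
h <- function (flag) {
  if (flag) x <- 42
  # A local x exists at this point only if the assignment above ran.
  list(value = x, is_local = exists("x", inherits = FALSE))
}

h(TRUE)    # x is the local variable
h(FALSE)   # x is the global variable
```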

While there are oh so many ways that R can make life horrible
for analysis, even R programmers don't go out of their way to
make life difficult for themselves.  It will probably be good
enough to define a "sane" subset of R, and a tool that reports
that an R function is outside that subset will be useful in its
own right because most of the time it won't be intentional.
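
As a very rough sketch of such a tool, assuming a "sane" subset defined
simply as "does not mention any of a short list of names" (the list and
the function name outside_subset are my own choices for illustration,
not an established definition):

```r
# Names whose presence we treat, by assumption, as leaving the subset.
risky <- c("assign", "rm", "eval", "evalq", "attach", "delayedAssign",
           "<<-", "sys.function", "parent.frame")

outside_subset <- function (fn) {
  # all.names() lists every symbol in the body, including function names.
  intersect(all.names(body(fn)), risky)
}

ok  <- function (x, y) x + y
bad <- function (n) { assign(n, 1); counter <<- 0 }

outside_subset(ok)    # nothing flagged
outside_subset(bad)   # flags "assign" and "<<-"
```

This is purely syntactic: it also flags an innocent variable that happens
to be called rm, and it misses calls made through do.call or match.fun,
so it is a starting point rather than a sound check.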

I set out to write an R compiler 20+ years ago.  I filled
several exercise books with issues.  I suggest you start by
considering the question "what variables are in the environment
at this control point."  Start with ?get, ?assign, ?exists, ?rm.
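
For instance, the following sketch (probe is a made-up name) creates and
then removes a local variable whose name is known only at run time, so
the answer to that question changes from one statement to the next:

```r
probe <- function (name) {
  env <- environment()              # this call's local environment
  before <- exists(name, envir = env, inherits = FALSE)
  assign(name, 1, envir = env)      # creates a local variable by name
  during <- exists(name, envir = env, inherits = FALSE)
  value  <- get(name, envir = env)
  rm(list = name, envir = env)      # and removes it again
  after  <- exists(name, envir = env, inherits = FALSE)
  list(before = before, during = during, value = value, after = after)
}

str(probe("zz"))
```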

On Wed, 7 Dec 2022 at 05:20, Florian Sihler <florian.sihler using uni-ulm.de>
wrote:

> Hello R-Help Mailinglist,
>
> I hope I've found the correct mailing list for my question (if not,
> please point me to the correct one).
> For my master's thesis I plan on creating and implementing a
> program-slicing algorithm for R-Programs using (probably only static)
> data- and control-flow analysis.
> While researching the problem I was unable to find any preexisting work
> on the matter.
> Does anyone here know of any preexisting work on data- and control-flow
> analysis (or even program slicing) in the context of R-Programs?
> I would be really glad for any pointer in the right direction (or
> reasons for why doing that would be a stupid idea).
>
> Regarding my background: I am a computer science student and usually
> program in C++, Java, TypeScript, and Haskell.
> Although I've worked with R for roughly a year now (mostly in my spare
> time), I am still getting used to some constructs.
>
> Thank you,
> Florian