[Rd] Conventions: Use of globals and main functions

Cyclic Group Z_1 cyc||cgroup-z1 @end|ng |rom y@hoo@com
Sun Aug 25 06:08:39 CEST 2019


In R scripts (as opposed to packages), even reproducible ones, it seems fairly conventional to use the global workspace as a sort of main function. As a result, R scripts often populate the global environment with many variables, which may then be mutated. This makes sense given that R has historically been used interactively, and the practice is common in scripting languages generally. Still, it appears to conflict with the software-engineering principle of avoiding mutable global state. That principle is only a rule of thumb, but reliance on global variables seems much more pronounced in R scripts than in scripts written in other languages.

In Python, on the other hand, it is common to use a main function (through the `def main():` and `if __name__ == "__main__":` idioms). This practice is described both in the official documentation and in the writing of Python's creator, Guido van Rossum. The idiom is admittedly more useful in Python than in R: Python code is organized into modules that can serve both as scripts and as importable libraries, whereas R keeps scripts and packages conceptually separate. Even so, a similar main-function convention in R would help avoid the problems of mutable global state that are well known in other languages, and would make longer scripts easier to maintain.
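For readers less familiar with the Python idiom, here is a minimal sketch. The helper names `load_values` and `summarise` are hypothetical, chosen only for illustration. The point is that all working state lives inside `main()`, and the `if __name__ == "__main__":` guard runs it only when the file is executed directly, not when it is imported.

```python
# Sketch of the main-function idiom; load_values and summarise are
# hypothetical helpers used only for illustration.


def load_values():
    # Stand-in for reading input data; returns a fresh list on each call.
    return [3.0, 1.5, 4.5]


def summarise(values):
    # Pure function: computes a result from its argument without
    # reading or modifying any module-level (global) variables.
    return {"n": len(values), "mean": sum(values) / len(values)}


def main():
    # All working state is local to main(), so nothing is added to,
    # or mutated in, the module's global namespace.
    values = load_values()
    result = summarise(values)
    print(f"n = {result['n']}, mean = {result['mean']}")
    return result


if __name__ == "__main__":
    # Executed only when the file is run as a script, not when imported.
    main()
```

Importing such a file defines the functions without running anything, so each piece can be tested on its own. In R, a roughly analogous guard sometimes used in files run with Rscript is `if (sys.nframe() == 0L) main()`. This is a community convention, not something the R documentation prescribes.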

Many excellent R texts (e.g., Advanced R, The Art of R Programming) caution against assigning into an enclosing environment (e.g., with `<<-` or `assign()`), but I have not seen many that recommend using a main function or avoiding the mutation of global variables at the top level.

Would it be a good idea to promote the use of main functions, and to discourage global-state mutation, in longer scripts and dedicated applications (as opposed to one-off scripts)? Should these practices be mentioned in the standard documentation?

This question was motivated largely by this discussion on Reddit: https://www.reddit.com/r/rstats/comments/cp3kva/is_mutating_global_state_acceptable_in_r/ . Apologies in advance if any of these (partly subjective) assessments are mistaken.

Best,
CG


