[Rd] Best style to organize code, namespaces

Tue Feb 23 04:05:10 CET 2010

On 22/02/2010 9:49 PM, Ben wrote:
> Hi all,
> 
> I'm hoping someone could tell me what best practices are as far as
> keeping programs organized in R.  In most languages, I like to keep
> things organized by writing small functions.  So, suppose I want to
> write a function that would require helper functions or would just be
> too big to write in one piece.  Below are three ways to do this:
> 
> 
> ################### Style 1 (C-style) ###############
> Foo <- function(x) {
>   ....
> }
> Foo.subf <- function(x, blah) {
>   ....
> }
> Foo.subg <- function(x, bar) {
>   ....
> }
> 
> ################### Style 2 (Lispish?) ##############
> Foo <- function(x) {
>   Subf <- function(blah) {
>     ....
>   }
>   Subg <- function(bar) {
>     ....
>   }
>   ....
> }
> 
> ################### Style 3 (Object-Oriented) #######
> Foo <- function(x) {
>   Subf <- function(blah) {
>     ....
>   }
>   Subg <- function(bar) {
>     ....
>   }
>   Main <- function() {
>     ....
>   }
>   return(list(Subf=Subf, Subg=Subg, Main=Main))
> }
> ################### End examples ####################
> 
> Which of these ways is best?  Style 2 seems at first to be the most
> natural in R, but I found there are some major drawbacks.  First, it
> is hard to debug.  For instance, if I want to debug Subf, I need to
> first "debug(Foo)" and then while Foo is debugging, type
> "debug(Subf)".  

You can use setBreakpoint to set a breakpoint in the nested functions, 
and it will exist in all invocations of Foo (which each create new 
instances of the nested functions).  debug() is not the only debugging tool.
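As a rough sketch of that approach: setBreakpoint() works from source
references, so the code has to come from a file sourced with
keep.source = TRUE.  The file contents, the helper body and the line
number below are made up for illustration; only setBreakpoint() itself
is the tool mentioned above.

```r
# Write a small style-2 function to a temporary file (hypothetical contents).
path <- tempfile(fileext = ".R")
writeLines(c(
  "Foo <- function(x) {",          # line 1
  "  Subf <- function(blah) {",    # line 2
  "    blah + x",                  # line 3: inside the nested function
  "  }",                           # line 4
  "  Subf(1)",                     # line 5
  "}"                              # line 6
), path)

# keep.source = TRUE keeps the file/line information setBreakpoint() needs.
source(path, keep.source = TRUE)

# Only set the breakpoint in an interactive session, since hitting it
# drops into the browser.
if (interactive()) {
  setBreakpoint(path, 3)   # break at line 3, i.e. inside Subf
}

result <- Foo(2)
```

In an interactive session, calling Foo(2) after this should stop in the
browser at the marked line; calling setBreakpoint() again with
clear = TRUE removes the breakpoint.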

> Another big limitation is that I can't write
> test-cases (e.g. using RUnit) for Subf and Subg because they aren't
> visible in any way at the global level.
> 
> For these reasons, style 1 seems to be better than style 2, if less
> elegant.  However, style 1 can get awkward because any parameters
> passed to the main function are not visible to the others.  In the
> above case, the value of "x" must be passed to Foo.subf and Foo.subg
> explicitly.  Also there is no enforcement of code isolation
> (i.e. anyone can call Foo.subf).
> 
> Style 3 is more explicitly object oriented.  It has the advantage of
> style 2 in that you don't need to pass x around, and the advantage of
> style 1 in that you can still write tests and easily debug the
> subfunctions.  However to actually call the main function you have to
> type "Foo(x)$Main()" instead of "Foo(x)", or else write a wrapper
> function for this.  Either way there is more typing.
> 
> So anyway, what is the best way to handle this?  R does not seem to
> have a good way of managing namespaces or avoiding collisions, like a
> module system or explicit object-orientation. 

Packages are self-contained modules.  Names that a package does not 
export cannot collide with names in other packages, and if two packages 
export the same name, other packages can explicitly select which export 
to use.
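A small illustration of explicit selection, using the :: operator.  The
global sd() defined here is a made-up stand-in for a name clash; the
stats package and its sd() export are real.

```r
# Hypothetical user-level function that shares its name with stats::sd.
sd <- function(x) "my own sd"

mine  <- sd(c(1, 2, 3))          # finds the global definition first
stats <- stats::sd(c(1, 2, 3))   # explicitly selects the stats export

# Inside a package, the NAMESPACE file can make the same choice once:
#   importFrom(stats, sd)   # use the stats version throughout the package
#   export(Foo)             # only Foo is visible to users of the package
```

Here the pkg::name form still reaches the package's export even though
a variable called stats exists in the workspace.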

> How should we get
> around this limitation?  I've looked at sample R code in the
> distribution and elsewhere, but so far it's been pretty
> disappointing---most people seem to write very long, hard to
> understand functions.

I would normally use a mixture of styles 1 and 2.  Use style 2 for 
functions that really do need access to Foo locals, and use style 1 for 
self-contained functions.
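A minimal sketch of such a mixture.  The names Foo.rescale and Clamp and
the lo/hi arguments are invented for this example and are not from the
original question.

```r
# Style 1: a self-contained helper at top level.  It needs nothing from
# Foo, so it can be tested and debugged on its own.
Foo.rescale <- function(v, lo, hi) {
  (v - lo) / (hi - lo)
}

Foo <- function(x, lo, hi) {
  # Style 2: a nested helper that really does use Foo's locals (lo, hi),
  # so they do not have to be passed around explicitly.
  Clamp <- function(v) pmin(pmax(v, lo), hi)

  Foo.rescale(Clamp(x), lo, hi)
}

out <- Foo(c(-5, 0, 5, 10, 20), lo = 0, hi = 10)
# out is 0, 0, 0.5, 1, 1
```

Foo.rescale can be given its own test cases, while Clamp stays private
to Foo.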

Duncan Murdoch