[R] a question of substitute

Prof Brian Ripley ripley at stats.ox.ac.uk
Thu Jan 11 02:08:36 CET 2007

The 'Right Thing' is for oneway.test() to allow a variable for the first 
argument, and I have altered it in R-patched and R-devel to do so. So if 
your students can make use of R-patched that would be the best solution. 
If not, perhaps you could make a copy of oneway.test from R-patched 
available to them.  Normally I would worry about namespace issues, but it 
seems unlikely they would matter here: if they did, assignInNamespace() is 
likely to work to insert the fix.
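
With a version of oneway.test() that accepts a variable holding a formula, 
the all-in-one helper described in the quoted message might look roughly 
like this.  It is only a sketch: the name anova_all_in_one, its 'alpha' 
argument and the simulated data are made up for illustration, and it 
assumes 'data' is always supplied.

```r
# Hypothetical wrapper: check homogeneity of variances first, then let the
# result decide the 'var.equal' argument of oneway.test().
# Assumes an R version in which oneway.test() accepts a formula variable.
anova_all_in_one <- function(formula, data, alpha = 0.05) {
  fl <- fligner.test(formula, data = data)
  equal_var <- fl$p.value > alpha
  ow <- oneway.test(formula, data = data, var.equal = equal_var)
  list(fligner = fl, var.equal = equal_var, oneway = ow)
}

# Simulated data, for illustration only
set.seed(1)
d <- data.frame(value = rnorm(30), group = gl(3, 10))
res <- anova_all_in_one(value ~ group, data = d)
res$oneway
```

This sidesteps the scoping questions discussed next only because both 
'formula' and 'data' are passed explicitly.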

Grothendieck's suggestions are steps towards a morass: they may work in 
simple cases but can make more complicated ones worse (such as looking for 
'data' in the wrong place).  These model fitting functions have rather 
precise requirements for where they look for their components:

 	the environment of 'formula'
 	the environment of the caller

and that includes where they look for 'data'.  It is easy to use 
substitute() or similar to make a literal formula out of 'formula', but doing 
so changes its environment.  So one needs to either

(a) fix up an environment within which to evaluate the modified call that 
emulates the scoping rules or

(b) create a new 'data' that has references to all the variables needed, 
and just call the function with the new 'formula' and new 'data'.
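
To see the environment problem concretely, here is a small sketch.  The 
helper make_formula and the toy values are hypothetical; the point is only 
that a formula rebuilt with substitute() no longer carries the environment 
in which its variables live.

```r
# Hypothetical helper: the variables exist only in this function's frame,
# which becomes the environment of the returned formula.
make_formula <- function() {
  value <- c(5.1, 4.9, 6.2, 6.0, 7.3, 7.1)
  group <- gl(3, 2)
  value ~ group
}

fo <- make_formula()

# With no 'data', the variables are found via environment(fo).
mf <- model.frame(fo)

# Rebuild a literal formula from the pieces of 'fo'.  Its environment is
# now the one where the new formula was evaluated, not that of 'fo'.
here <- environment()
literal <- eval(substitute(lhs ~ rhs, list(lhs = fo[[2]], rhs = fo[[3]])))
env_is_here <- identical(environment(literal), here)
env_kept <- identical(environment(literal), environment(fo))

# The same lookup fails unless 'value' and 'group' happen to be visible here.
lookup_failed <- inherits(try(model.frame(literal), silent = TRUE), "try-error")

# Putting the original environment back restores the lookup.
environment(literal) <- environment(fo)
mf2 <- model.frame(literal)
```

Restoring the environment like this only handles the simplest case; it 
does nothing about where 'data' is looked for.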

At first sight model.frame() looks like the way to do (b), but it is not, since 
if there are function calls in the formula (e.g. log()) the model frame 
includes the derived variables and not the original ones.  There are 
workarounds (e.g. in glmmPQL), like using all.vars, creating a formula 
from that, setting its environment to that of the original function and 
then calling model.frame.
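
A minimal sketch of that idea follows.  It is not the code from glmmPQL, 
just an illustration of the steps with a made-up data frame 'd'; note that 
all.vars() returns every name in the formula, so anything that is not 
really a variable would need separate handling.

```r
# Toy data, for illustration only
d <- data.frame(y = c(1.2, 2.3, 3.1, 4.8), x = c(1, 2, 4, 8))
fo <- y ~ log(x)

# model.frame() on the original formula keeps the derived column,
# not the original 'x'.
mf <- model.frame(fo, data = d)
names(mf)                     # "y" "log(x)"

# Workaround sketch: build a formula from the raw variable names only,
# give it the environment of the original formula, then call model.frame().
raw_vars <- all.vars(fo)      # "y" "x"
fo_raw <- as.formula(paste("~", paste(raw_vars, collapse = " + ")))
environment(fo_raw) <- environment(fo)

new_data <- model.frame(fo_raw, data = d)
names(new_data)               # "y" "x"

# The original formula can now be evaluated against the new 'data'.
fit <- lm(fo, data = new_data)
```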

This comes up often enough that I have contemplated adding a solution to 
(b) to the stats package.

Doing either of these right is really pretty complicated, and not 
something for which to dash off code in a fairly quick reply (or even to 
check that the code in glmmPQL is general enough to be applicable).

On Tue, 9 Jan 2007, Adrian Dusa wrote:

> On Tuesday 09 January 2007 15:41, Prof Brian Ripley wrote:
>> oneway.test expects a literal formula, not a variable containing a
>> formula.  The help page says
>>   formula: a formula of the form 'lhs ~ rhs' where 'lhs' gives the
>>            sample values and 'rhs' the corresponding groups.
>> Furthermore, if you had
>> foo.2 <- function() oneway.test(value ~ group)
>> it would still not work, as
>>      data: an optional matrix or data frame (or similar: see
>>            'model.frame') containing the variables in the formula
>>            'formula'.  By default the variables are taken from
>>            'environment(formula)'.
>> I could show you several complicated workarounds, but why do you want to
>> do this?
> Thank you for your reply. The data argument was exactly the next problem I
> faced. My workaround involves checking if(missing(data)) then uses different
> calls to oneway.test(). I am certainly interested in other solutions, this
> one is indeed limited.
> I do this for the students in the anova class, checking first the homogeneity
> of variances with fligner.test(), printing the p.value and based on that
> changing the var.equal argument in the oneway.test()
> It's just for convenience, but they do like having it all-in-one.
> Best regards,
> Adrian

Brian D. Ripley,                  ripley at stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595
