[R] Playing with formulae

Ross Boylan ross at biostat.ucsf.edu
Sat Sep 13 00:47:18 CEST 2003

First, thanks to everyone for their responses to my programming style
question.  Second, I have some questions about some obscure corners of
the language.

f <- y~x+z
t <- terms(f)

I want to do some manipulations of the formula that require getting the
names of variables as character strings (e.g., for indexing into a
dataset).  However, t, or even attr(t, "variables"), does not provide
character strings.
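
A minimal sketch of the extraction in question, using only base R. The names tt, vars, var_names, resp_idx, resp_name and av are introduced here for illustration (tt rather than t, to avoid masking base::t); whether the orderings agree in general is exactly what the questions below ask, so this only shows what happens for this one formula.

```r
f <- y ~ x + z
tt <- terms(f)

# The "variables" attribute is an unevaluated call, list(y, x, z), so
# element 1 is the function name `list` and the variables start at 2.
vars <- attr(tt, "variables")
var_names <- as.character(vars)[-1]

# The response index counts variables, hence the +1 offset into the call.
resp_idx <- attr(tt, "response")
resp_name <- as.character(vars[[resp_idx + 1]])

# all.vars() returns character strings directly.
av <- all.vars(f)
```

For this formula both var_names and av come out as "y", "x", "z". They need not match element for element in general: for a term such as log(x), all.vars() returns "x" while the variables attribute holds the call log(x).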

1. Does all.vars(f) reliably produce the same ordering as t?

2. Can objects of class name (which I notice appear in places in t) be
used the same way as character strings (e.g., indexing columns in a data
set, arguments to match)?  (This would matter if I could pull t apart
reliably.  I can't.  See 3b for more on that problem.)

3. t's response attribute is said to be an index of the response
variable in variables (I presume this means the variables attribute).
  a) Will all.vars(f)[attr(t, "response")] reliably get me the character
     string for the name of the response variable?
  b) How can I get the response variable out of the "variables"
     attribute?  In my example, attr(t, "response") is 1, but
     attr(t, "variables")[1] is list().
     Possible answer: attr(t, "variables")[[attr(t, "response") + 1]]
     looks right, and is of class name.  Hence the interest in question 2.

4. Is the actual number of coefficients the model will need
length(attr(t,"term.labels"))+attr(t, "intercept"),
regardless of interactions or I() terms?
(My interest is primarily in detecting the wrong number of terms, for
example if someone specifies an interaction).

5. The documentation for terms.formula appears to imply that if there is
a simple formula without interactions I will get coefficient estimates
in the same order that the original formula specified textually.  Right?
I'm concerned about this because a vector of simulation coefficients is
passed in along with the formula, and I need to be sure its elements
line up with the model terms.
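
For questions 4 and 5, one way to check rather than assume is to build the model matrix and compare against it. This is a sketch under stated assumptions, not a definitive answer: the data frame d, the factor g, and the coefficient vector beta are hypothetical, and the default treatment contrasts are assumed.

```r
# Hypothetical data: g is a three-level factor, so it needs two columns.
d <- data.frame(
  y = c(1.2, 2.3, 3.1, 4.8, 5.0, 6.4),
  x = c(0.5, 1.5, 2.5, 3.5, 4.5, 5.5),
  g = factor(c("a", "b", "c", "a", "b", "c"))
)
f <- y ~ x + g
tt <- terms(f)

# Count suggested by the terms object alone: two terms plus the intercept.
n_from_terms <- length(attr(tt, "term.labels")) + attr(tt, "intercept")

# Count actually needed: one coefficient per model matrix column.
mm <- model.matrix(f, data = d)
n_from_matrix <- ncol(mm)

# Hypothetical coefficient vector supplied by the caller; check its
# length against the matrix, then label it by column.
beta <- c(1, 0.5, -0.2, 0.3)
stopifnot(length(beta) == n_from_matrix)
names(beta) <- colnames(mm)
```

Here the two counts differ (3 from the terms object, 4 from the model matrix) because the factor expands into more than one column, so the term-label count is only safe when every term is a single numeric column. Checking against colnames(mm) also makes the coefficient order explicit instead of inferred from the formula text.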

I know that's a lot of pretty detailed questions, but if you can offer
any help that would be great.  I've tried some simple tests that seem to
work, but of course those don't prove that the assumptions always hold. 
And the documentation does not seem to resolve the issues either.

I'm using R 1.7.1, but ideally my code will not be version-specific.
