[R] trouble automating formula edits when log or * are present; update trouble

Paul Johnson pauljohn32 at gmail.com
Tue May 29 17:43:01 CEST 2012


Greetings

I want to take a fitted regression and replace all uses of a variable
in a formula. For example, I'd like to take

m1 <- lm(y ~ x1, data=dat)

and replace x1 with something else, say x1c, so the formula would become

m1 <- lm(y ~ x1c, data=dat)

I have working code to finish that part of the problem, but it fails
when the formula is more complicated. If the formula has log(x1) or
x1:x2, the update code I'm testing doesn't get it right.

Here's the test code:

##PJ
## 2012-05-29
## Simulated data: three predictors centered near 50 and a noise outcome
dat <- data.frame(x1=rnorm(100, mean=50), x2=rnorm(100, mean=50),
                  x3=rnorm(100, mean=50), y=rnorm(100))

m1 <- lm(y ~ log(x1) + x1 + sin(x2) + x2 + exp(x3), data=dat)
m2 <- lm(y ~ log(x1) + x2*x3, data=dat)

## Replace term x with x followed by suffix s (e.g. x1 -> x1c)
## by subtracting the old term and adding the new one
suffixX <- function(fmla, x, s){
    upform <- as.formula(paste0(". ~ .", "-", x, "+", paste0(x, s)))
    update.formula(fmla, upform)
}

newFmla <- formula(m2)
newFmla
suffixX(newFmla, "x2", "c")
suffixX(newFmla, "x1", "c")

Here are the last few lines of the output. See how the update misses x1
inside log(x1), and x2 inside the interaction x2:x3?


> newFmla <- formula(m2)
> newFmla
y ~ log(x1) + x2 * x3
> suffixX(newFmla, "x2", "c")
y ~ log(x1) + x3 + x2c + x2:x3
> suffixX(newFmla, "x1", "c")
y ~ log(x1) + x2 + x3 + x1c + x2:x3

It replaces the target only when the target is a term by itself, not
when it is nested inside a function call or an interaction.

After messing with this for quite a while, I concluded that update was
the wrong way to go, because it is geared to adding and removing whole
terms, not to editing every instance of a variable.
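One way to see why update misses these cases is to look at the term
labels R builds from the formula. The sketch below uses only base R's
terms() and all.vars() on the formula from m2; it shows that log(x1)
and x2:x3 are whole terms, so subtracting the term x1 or x2 cannot
reach inside them, while all.vars() still finds every variable.

```r
f <- y ~ log(x1) + x2 * x3

# Term labels are what update() adds and removes: whole terms, not variables
labs <- attr(terms(f), "term.labels")
print(labs)   # "log(x1)" "x2" "x3" "x2:x3"

# all.vars() digs the underlying variable names out of every term
vars <- all.vars(f)
print(vars)   # "y" "x1" "x2" "x3"
```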

So I started studying the structure of formula objects, and I noticed
something really interesting: the newFmla object can be indexed
recursively until all of its individual pieces are revealed:


> newFmla
y ~ log(x1) + x2 * x3
> newFmla[[3]]
log(x1) + x2 * x3
> newFmla[[3]][[2]]
log(x1)
> newFmla[[3]][[2]][[2]]
x1
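The pieces in that output are R language objects. A formula is itself
a call to `~`; every call keeps its function in element 1 and its
arguments in the elements after that; and the bottom of the tree is a
bare symbol of class "name". A small base-R sketch that makes this
visible for the same formula:

```r
f <- y ~ log(x1) + x2 * x3

print(f[[1]])          # the function `~`
rhs <- f[[3]]          # right-hand side: log(x1) + x2 * x3
print(class(rhs))      # "call"
print(as.list(rhs))    # elements: `+`, log(x1), x2 * x3

leaf <- f[[3]][[2]][[2]]
print(class(leaf))     # "name": a bare symbol, the bottom of the tree
```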

So, if you could tell me a general way to "walk" through a formula
object, couldn't I use "gsub" or something like that to recognize each
instance of "x1" and replace it with "x1c"?
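A recursive walk along those lines can be sketched in base R, matching
symbols directly instead of running gsub on text. The helper name
replaceVar is hypothetical (not an existing R function). It assumes the
only things to rename are bare symbols, and it never touches element 1
of a call, so function names such as log are left alone.

```r
# Hypothetical helper: recursively replace every occurrence of the
# symbol `old` with the symbol `new` in a formula or other language object.
replaceVar <- function(expr, old, new) {
  if (is.name(expr)) {
    # Leaf: swap the symbol if it matches, otherwise leave it alone
    if (identical(expr, as.name(old))) return(as.name(new))
    return(expr)
  }
  if (is.call(expr) && length(expr) > 1) {
    # Element 1 is the function (`~`, `+`, `log`, `*`, ...); walk only the arguments
    for (i in 2:length(expr)) {
      expr[[i]] <- replaceVar(expr[[i]], old, new)
    }
  }
  expr  # numbers, strings and other constants come back unchanged
}

newFmla <- y ~ log(x1) + x2 * x3
f1 <- replaceVar(newFmla, "x1", "x1c")
f2 <- replaceVar(newFmla, "x2", "x2c")
print(f1)
print(f2)
```

Matching whole symbols is the reason to avoid gsub here: the text
pattern "x1" would also hit x10 or x11. The result is still a formula,
so it could be handed back to lm(), or to update(m2, formula. = f1),
once a column x1c exists in dat.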

I just can't figure out how to automate checking each possible element
in a formula, to get the right combination of [[]][[]][[]].
See what I mean? I need to avoid this:

> newFmla[[3]][[2]][[3]]
Error in newFmla[[3]][[2]][[3]] : subscript out of bounds
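The out-of-bounds error can be avoided by asking each piece how long it
is before indexing into it: length(newFmla[[3]][[2]]) is 2 (the
function log plus one argument), so there is no element 3. The sketch
below uses a hypothetical helper, leafPaths, that walks the formula
with length() and reports the [[ ]] path to every leaf.

```r
# Hypothetical helper: list every leaf of a formula together with the
# [[ ]] index path that reaches it; length() keeps us inside the bounds.
leafPaths <- function(expr, path = integer(0)) {
  if (!is.call(expr)) {
    # A leaf (symbol or constant): report its path and its text
    return(list(list(path = path, leaf = deparse(expr))))
  }
  out <- list()
  for (i in seq_len(length(expr))[-1]) {   # skip element 1, the function
    out <- c(out, leafPaths(expr[[i]], c(path, i)))
  }
  out
}

newFmla <- y ~ log(x1) + x2 * x3
paths <- leafPaths(newFmla)
for (p in paths) {
  cat(paste0("[[", p$path, "]]", collapse = ""), " -> ", p$leaf, "\n", sep = "")
}
```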

pj

-- 
Paul E. Johnson
Professor, Political Science    Assoc. Director
1541 Lilac Lane, Room 504     Center for Research Methods
University of Kansas               University of Kansas
http://pj.freefaculty.org            http://quant.ku.edu
