[R] regex challenge

Frank Harrell f.harrell at Vanderbilt.Edu
Fri Aug 16 01:44:42 CEST 2013


I really appreciate the excellent ideas from Bill Dunlap and Greg Snow. 
  Both suggestions almost work perfectly.  Greg's recognizes expressions 
such as sex=='female' but not ones such as age > 21, age < 21, a - b > 
0, and possibly other legal R expressions.  Bill's idea is similar to 
what Duncan Murdoch suggested to me.  Bill's doesn't catch the case when 
a variable appears both in an expression and as a regular variable (sex 
in the example below):

f <- function(formula) {
   trms <- terms(formula)
   variables <- as.list(attr(trms, "variables"))[-1]
   ## the 'variables' attribute is stored as a call to list(),
   ## so we changed the call to a list and removed the first element
   ## to get the variables themselves.
   if (attr(trms, "response") == 1) {
     ## terms does not pull apart right hand side of formula,
     ## so we assume each non-function is to be renamed.
     responseVars <- lapply(all.vars(variables[[1]]), as.name)
     variables <- variables[-1]
   } else {
     responseVars <- list()
   }
   ## omit non-name variables from list of ones to change.
   ## This is where you could expand calls to certain functions.
   variables <- variables[vapply(variables, is.name, TRUE)]
   variables <- c(responseVars, variables) # all are names now
   names(variables) <- vapply(variables, as.character, "")
   newVars <- lapply(variables, function(v) as.name(paste0(toupper(v), 
"z")))
   formula(do.call("substitute", list(formula, newVars)), 
env=environment(formula))
}

a <- cat + (age + Heading("Females") * (sex == "Female") * sbp) *
     Heading() * g + (age + sbp) * Heading() * trio ~ Heading() *
     country * Heading() * sex
f(a)

Output:

CATz + (AGEz + Heading("Females") * (SEXz == "Female") * SBPz) *
     Heading() * Gz + (AGEz + SBPz) * Heading() * TRIOz ~ Heading() *
     COUNTRYz * Heading() * SEXz

The method also doesn't work if I replace sex == 'Female' with x3 > 4, 
converting to X3z > 4.  I'm not clear on how to code what kind of 
expressions to ignore.

Thanks!
Frank



More information about the R-help mailing list