[R] regex challenge

Frank Harrell f.harrell at Vanderbilt.Edu
Fri Aug 16 19:06:57 CEST 2013


Thanks Bill.  The problem is that one of the results of convertName might be
'Heading("Age in Years")*age' (this is for the tables package).  as.name
converts this to `Heading("...")*age`, and the backticks leave the final
formula with a mixture of regular elements and backtick-quoted expression
elements, making the formula invalid.
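
A minimal sketch of the difference, for illustration only: as.name() turns the whole string into a single symbol, whereas parse(text = ...) turns it into a call that can be placed in a formula. The response y is a placeholder chosen for this example.

```r
txt <- 'Heading("Age in Years")*age'

# as.name() makes one symbol out of the entire string; inside a formula
# it is printed in backticks and is not the product of two terms.
bad <- as.name(txt)
is.name(bad)                      # TRUE

# parse() reads the string as R code, giving a call to `*`.
good <- parse(text = txt)[[1]]
is.call(good)                     # TRUE

# Build a formula with the parsed call on the right-hand side.
f <- eval(call("~", quote(y), good))
all.vars(f)                       # only y and age are variables
```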
Best,
Frank
-------------------------------------------------------------------

The following makes the name converter function an argument to ff (and 
restores the colon operator to the list of formula operators), but I'm 
not sure what you need the converter to do.

ff <- function(expr, convertName = function(name) paste0(toupper(name), "z")) {
     if (is.call(expr) && is.name(expr[[1]]) &&
         is.element(as.character(expr[[1]]),
                    c("~", "+", "-", "*", "/", "%in%", "(", ":"))) {
         for(i in seq_along(expr)[-1]) {
             expr[[i]] <- Recall(expr[[i]], convertName = convertName)
         }
     } else if (is.name(expr)) {
         expr <- as.name(convertName(expr))
     }
     expr
}
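
One possible way to let the converter return any legal formula fragment is to parse its result instead of calling as.name(). The sketch below is a variation on ff, not the code posted above: ff2 and labelAge are hypothetical names, and it assumes the converter always returns a string that parses to a single expression.

```r
ff2 <- function(expr,
                convertName = function(name) paste0(toupper(name), "z")) {
  ops <- c("~", "+", "-", "*", "/", "%in%", "(", ":")
  if (is.call(expr) && is.name(expr[[1]]) &&
      is.element(as.character(expr[[1]]), ops)) {
    # Descend only through formula operators.
    for (i in seq_along(expr)[-1]) {
      expr[[i]] <- ff2(expr[[i]], convertName = convertName)
    }
  } else if (is.name(expr)) {
    # Parse the converter's text, so it may be a name or a larger expression.
    expr <- parse(text = convertName(as.character(expr)))[[1]]
  }
  expr
}

# Hypothetical converter: attach a heading to age, leave other names alone.
labelAge <- function(name) {
  if (name == "age") 'Heading("Age in Years")*age' else name
}

g <- ff2(y ~ age + sex, convertName = labelAge)
print(g)
```

Because the replacement is spliced in as a call rather than as a symbol, no backticks appear in the result.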

Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com


 > -----Original Message-----
 > From: [hidden email] [mailto:[hidden email]] On Behalf
 > Of Frank Harrell
 > Sent: Thursday, August 15, 2013 7:47 PM
 > To: RHELP
 > Subject: Re: [R] regex challenge
 >
 > Bill, that is very impressive.  The only problem I'm having is that I want
 > the paste0(toupper(...)) to be a general function that returns a
 > character string that is a legal part of a formula object that can't be
 > converted to a 'name'.
 >
 > Frank
 >
 >
 > -------------------------------
 > Oops, I left "(" out of the list of operators.
 >
 >
 > ff <- function(expr) {
 >      if (is.call(expr) && is.name(expr[[1]]) &&
 >           is.element(as.character(expr[[1]]),
 >                      c("~","+","-","*","/","%in%","("))) {
 >          for(i in seq_along(expr)[-1]) {
 >              expr[[i]] <- Recall(expr[[i]])
 >          }
 >      } else if (is.name(expr)) {
 >          expr <- as.name(paste0(toupper(as.character(expr)), "z"))
 >      }
 >      expr
 > }
 >
 >  > ff(a)
 > CATz + (AGEz + Heading("Females") * (sex == "Female") * SBPz) *
 >      Heading() * Gz + (AGEz + SBPz) * Heading() * TRIOz ~ Heading() *
 >      COUNTRYz * Heading() * SEXz
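
A short sketch of why adding "(" to the operator list matters, and why comparisons are left alone. It repeats the ff definition quoted above so that it runs on its own; the variables y and x3 are placeholders for illustration.

```r
# Parentheses are stored in the parse tree as a call to the function `(`.
e <- quote((age + sbp) * g)
as.character(e[[2]][[1]])   # "("

ff <- function(expr) {
  if (is.call(expr) && is.name(expr[[1]]) &&
      is.element(as.character(expr[[1]]),
                 c("~", "+", "-", "*", "/", "%in%", "("))) {
    for (i in seq_along(expr)[-1]) {
      expr[[i]] <- Recall(expr[[i]])
    }
  } else if (is.name(expr)) {
    expr <- as.name(paste0(toupper(as.character(expr)), "z"))
  }
  expr
}

# `>` is not in the operator list, so the recursion stops at x3 > 4
# and the names inside that comparison are not converted.
out <- ff(quote(y ~ (age + sbp) * (x3 > 4)))
print(out)
```

Any call whose function is not in the list (a comparison, Heading(), log(), and so on) is returned unchanged, which is how the function decides which expressions to ignore.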
 >
 > Bill Dunlap
 > Spotfire, TIBCO Software
 > wdunlap tibco.com
 >
 >
 >  > -----Original Message-----
 >  > From: [hidden email] [mailto:[hidden email]] On Behalf
 >  > Of William Dunlap
 >  > Sent: Thursday, August 15, 2013 6:03 PM
 >  > To: Frank Harrell; RHELP
 >  > Subject: Re: [R] regex challenge
 >  >
 >  > Try this one
 >  >
 >  > ff <- function (expr)
 >  > {
 >  >     if (is.call(expr) && is.name(expr[[1]]) &&
 >  >          is.element(as.character(expr[[1]]),
 >  >                     c("~", "+", "-", "*", "/", ":", "%in%"))) {
 >  >         # the above list should cover the standard formula operators.
 >  >         for (i in seq_along(expr)[-1]) {
 >  >             expr[[i]] <- Recall(expr[[i]])
 >  >         }
 >  >     }
 >  >     else if (is.name(expr)) {
 >  >        # the conversion itself
 >  >         expr <- as.name(paste0(toupper(as.character(expr)), "z"))
 >  >     }
 >  >     expr
 >  > }
 >  >
 >  > > ff(a)
 >  > CATz + (age + Heading("Females") * (sex == "Female") * sbp) *
 >  >     Heading() * Gz + (age + sbp) * Heading() * TRIOz ~ Heading() *
 >  >     COUNTRYz * Heading() * SEXz
 >  >
 >  > Bill Dunlap
 >  > Spotfire, TIBCO Software
 >  > wdunlap tibco.com
 >  >
 >  >
 >  > > -----Original Message-----
 >  > > From: [hidden email] [mailto:[hidden email]] On Behalf
 >  > > Of Frank Harrell
 >  > > Sent: Thursday, August 15, 2013 4:45 PM
 >  > > To: RHELP
 >  > > Subject: Re: [R] regex challenge
 >  > >
 >  > > I really appreciate the excellent ideas from Bill Dunlap and Greg Snow.
 >  > > Both suggestions almost work perfectly.  Greg's recognizes expressions
 >  > > such as sex=='female' but not ones such as age > 21, age < 21,
 >  > > a - b > 0, and possibly other legal R expressions.  Bill's idea is
 >  > > similar to what Duncan Murdoch suggested to me.  Bill's doesn't catch
 >  > > the case when a variable appears both in an expression and as a
 >  > > regular variable (sex in the example below):
 >  > >
 >  > > f <- function(formula) {
 >  > >    trms <- terms(formula)
 >  > >    variables <- as.list(attr(trms, "variables"))[-1]
 >  > >    ## the 'variables' attribute is stored as a call to list(),
 >  > >    ## so we changed the call to a list and removed the first element
 >  > >    ## to get the variables themselves.
 >  > >    if (attr(trms, "response") == 1) {
 >  > >      ## terms does not pull apart right hand side of formula,
 >  > >      ## so we assume each non-function is to be renamed.
 >  > >      responseVars <- lapply(all.vars(variables[[1]]), as.name)
 >  > >      variables <- variables[-1]
 >  > >    } else {
 >  > >      responseVars <- list()
 >  > >    }
 >  > >    ## omit non-name variables from list of ones to change.
 >  > >    ## This is where you could expand calls to certain functions.
 >  > >    variables <- variables[vapply(variables, is.name, TRUE)]
 >  > >    variables <- c(responseVars, variables) # all are names now
 >  > >    names(variables) <- vapply(variables, as.character, "")
 >  > >    newVars <- lapply(variables,
 >  > >                      function(v) as.name(paste0(toupper(v), "z")))
 >  > >    formula(do.call("substitute", list(formula, newVars)),
 >  > >            env=environment(formula))
 >  > > }
 >  > >
 >  > > a <- cat + (age + Heading("Females") * (sex == "Female") * sbp) *
 >  > >      Heading() * g + (age + sbp) * Heading() * trio ~ Heading() *
 >  > >      country * Heading() * sex
 >  > > f(a)
 >  > >
 >  > > Output:
 >  > >
 >  > > CATz + (AGEz + Heading("Females") * (SEXz == "Female") * SBPz) *
 >  > >      Heading() * Gz + (AGEz + SBPz) * Heading() * TRIOz ~ Heading() *
 >  > >      COUNTRYz * Heading() * SEXz
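
A small sketch of why sex is renamed inside the comparison as well: substitute() replaces every occurrence of a symbol it finds in the expression, wherever it sits in the parse tree. The expression below is a cut-down placeholder, not the full formula from the thread.

```r
expr    <- quote((sex == "Female") * sex)
newVars <- list(sex = as.name("SEXz"))

# do.call() is used because substitute() does not evaluate its first
# argument; this passes the stored call itself rather than the name expr.
out <- do.call("substitute", list(expr, newVars))
print(out)
```

Both occurrences are replaced because substitute() has no notion of formula operators; limiting the renaming to some positions requires walking the tree explicitly.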
 >  > >
 >  > > The method also doesn't work if I replace sex == 'Female' with
 >  > > x3 > 4: it is converted to X3z > 4.  I'm not clear on how to code
 >  > > which kinds of expressions to ignore.
 >  > >
 >  > > Thanks!
 >  > > Frank
 >  > >
 >  > > ______________________________________________
 >  > > [hidden email] mailing list
 >  > > https://stat.ethz.ch/mailman/listinfo/r-help
 >  > > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 >  > > and provide commented, minimal, self-contained, reproducible code.
 >  >


