[R] regex challenge

Frank Harrell f.harrell at Vanderbilt.Edu
Fri Aug 16 04:46:59 CEST 2013


Bill that is very impresive.  The only problem I'm having is that I want 
the paste0(toupper(...)) to be a general function that returns a 
character string that is a legal part of a formula object that can't be 
converted to a 'name'.

Frank


-------------------------------
Oops, I left "(" out of the list of operators.


ff <- function(expr) {
     if (is.call(expr) && is.name(expr[[1]]) &&
          is.element(as.character(expr[[1]]), 
c("~","+","-","*","/","%in%","("))) {
         for(i in seq_along(expr)[-1]) {
             expr[[i]] <- Recall(expr[[i]])
         }
     } else if (is.name(expr)) {
         expr <- as.name(paste0(toupper(as.character(expr)), "z"))
     }
     expr
}

 > ff(a)
CATz + (AGEz + Heading("Females") * (sex == "Female") * SBPz) *
     Heading() * Gz + (AGEz + SBPz) * Heading() * TRIOz ~ Heading() *
     COUNTRYz * Heading() * SEXz

Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com


 > -----Original Message-----
 > From: [hidden email] [mailto:[hidden email]] On Behalf
 > Of William Dunlap
 > Sent: Thursday, August 15, 2013 6:03 PM
 > To: Frank Harrell; RHELP
 > Subject: Re: [R] regex challenge
 >
 > Try this one
 >
 > ff <- function (expr)
 > {
 >     if (is.call(expr) && is.name(expr[[1]]) &&
 >          is.element(as.character(expr[[1]]),  c("~", "+", "-", "*", 
"/", ":", "%in%"))) {
 >         # the above list should cover the standard formula operators.
 >         for (i in seq_along(expr)[-1]) {
 >             expr[[i]] <- Recall(expr[[i]])
 >         }
 >     }
 >     else if (is.name(expr)) {
 >        # the conversion itself
 >         expr <- as.name(paste0(toupper(as.character(expr)), "z"))
 >     }
 >     expr
 > }
 >
 > > ff(a)
 > CATz + (age + Heading("Females") * (sex == "Female") * sbp) *
 >     Heading() * Gz + (age + sbp) * Heading() * TRIOz ~ Heading() *
 >     COUNTRYz * Heading() * SEXz
 >
 > Bill Dunlap
 > Spotfire, TIBCO Software
 > wdunlap tibco.com
 >
 >
 > > -----Original Message-----
 > > From: [hidden email] [mailto:[hidden email]] On Behalf
 > > Of Frank Harrell
 > > Sent: Thursday, August 15, 2013 4:45 PM
 > > To: RHELP
 > > Subject: Re: [R] regex challenge
 > >
 > > I really appreciate the excellent ideas from Bill Dunlap and Greg 
Snow.
 > >   Both suggestions almost work perfectly.  Greg's recognizes 
expressions
 > > such as sex=='female' but not ones such as age > 21, age < 21, a - b >
 > > 0, and possibly other legal R expressions.  Bill's idea is similar to
 > > what Duncan Murdoch suggested to me.  Bill's doesn't catch the case 
when
 > > a variable appears both in an expression and as a regular variable 
(sex
 > > in the example below):
 > >
 > > f <- function(formula) {
 > >    trms <- terms(formula)
 > >    variables <- as.list(attr(trms, "variables"))[-1]
 > >    ## the 'variables' attribute is stored as a call to list(),
 > >    ## so we changed the call to a list and removed the first element
 > >    ## to get the variables themselves.
 > >    if (attr(trms, "response") == 1) {
 > >      ## terms does not pull apart right hand side of formula,
 > >      ## so we assume each non-function is to be renamed.
 > >      responseVars <- lapply(all.vars(variables[[1]]), as.name)
 > >      variables <- variables[-1]
 > >    } else {
 > >      responseVars <- list()
 > >    }
 > >    ## omit non-name variables from list of ones to change.
 > >    ## This is where you could expand calls to certain functions.
 > >    variables <- variables[vapply(variables, is.name, TRUE)]
 > >    variables <- c(responseVars, variables) # all are names now
 > >    names(variables) <- vapply(variables, as.character, "")
 > >    newVars <- lapply(variables, function(v) as.name(paste0(toupper(v),
 > > "z")))
 > >    formula(do.call("substitute", list(formula, newVars)),
 > > env=environment(formula))
 > > }
 > >
 > > a <- cat + (age + Heading("Females") * (sex == "Female") * sbp) *
 > >      Heading() * g + (age + sbp) * Heading() * trio ~ Heading() *
 > >      country * Heading() * sex
 > > f(a)
 > >
 > > Output:
 > >
 > > CATz + (AGEz + Heading("Females") * (SEXz == "Female") * SBPz) *
 > >      Heading() * Gz + (AGEz + SBPz) * Heading() * TRIOz ~ Heading() *
 > >      COUNTRYz * Heading() * SEXz
 > >
 > > The method also doesn't work if I replace sex == 'Female' with x3 > 4,
 > > converting to X3z > 4.  I'm not clear on how to code what kind of
 > > expressions to ignore.
 > >
 > > Thanks!
 > > Frank
 > >
 > > ______________________________________________
 > > [hidden email] mailing list
 > > https://stat.ethz.ch/mailman/listinfo/r-help
 > > PLEASE do read the posting guide 
http://www.R-project.org/posting-guide.html
 > > and provide commented, minimal, self-contained, reproducible code.
 >
 > ______________________________________________
 > [hidden email] mailing list
 > https://stat.ethz.ch/mailman/listinfo/r-help
 > PLEASE do read the posting guide 
http://www.R-project.org/posting-guide.html
 > and provide commented, minimal, self-contained, reproducible code.
... [show rest of quote]

-- 
Frank E Harrell Jr Professor and Chairman      School of Medicine
                    Department of Biostatistics Vanderbilt University



More information about the R-help mailing list