[R] regex challenge

William Dunlap wdunlap at tibco.com
Fri Aug 16 03:02:39 CEST 2013


Try this one

ff <- function (expr) 
{
    if (is.call(expr) && is.name(expr[[1]]) &&
         is.element(as.character(expr[[1]]),  c("~", "+", "-", "*", "/", ":", "%in%"))) {
        # the above list should cover the standard formula operators.
        for (i in seq_along(expr)[-1]) {
            expr[[i]] <- Recall(expr[[i]])
        }
    }
    else if (is.name(expr)) {
       # the conversion itself
        expr <- as.name(paste0(toupper(as.character(expr)), "z"))
    }
    expr
}

> ff(a)
CATz + (age + Heading("Females") * (sex == "Female") * sbp) * 
    Heading() * Gz + (age + sbp) * Heading() * TRIOz ~ Heading() * 
    COUNTRYz * Heading() * SEXz

Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com


> -----Original Message-----
> From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org] On Behalf
> Of Frank Harrell
> Sent: Thursday, August 15, 2013 4:45 PM
> To: RHELP
> Subject: Re: [R] regex challenge
> 
> I really appreciate the excellent ideas from Bill Dunlap and Greg Snow.
>   Both suggestions almost work perfectly.  Greg's recognizes expressions
> such as sex=='female' but not ones such as age > 21, age < 21, a - b >
> 0, and possibly other legal R expressions.  Bill's idea is similar to
> what Duncan Murdoch suggested to me.  Bill's doesn't catch the case when
> a variable appears both in an expression and as a regular variable (sex
> in the example below):
> 
> f <- function(formula) {
>    trms <- terms(formula)
>    variables <- as.list(attr(trms, "variables"))[-1]
>    ## the 'variables' attribute is stored as a call to list(),
>    ## so we changed the call to a list and removed the first element
>    ## to get the variables themselves.
>    if (attr(trms, "response") == 1) {
>      ## terms does not pull apart right hand side of formula,
>      ## so we assume each non-function is to be renamed.
>      responseVars <- lapply(all.vars(variables[[1]]), as.name)
>      variables <- variables[-1]
>    } else {
>      responseVars <- list()
>    }
>    ## omit non-name variables from list of ones to change.
>    ## This is where you could expand calls to certain functions.
>    variables <- variables[vapply(variables, is.name, TRUE)]
>    variables <- c(responseVars, variables) # all are names now
>    names(variables) <- vapply(variables, as.character, "")
>    newVars <- lapply(variables, function(v) as.name(paste0(toupper(v),
> "z")))
>    formula(do.call("substitute", list(formula, newVars)),
> env=environment(formula))
> }
> 
> a <- cat + (age + Heading("Females") * (sex == "Female") * sbp) *
>      Heading() * g + (age + sbp) * Heading() * trio ~ Heading() *
>      country * Heading() * sex
> f(a)
> 
> Output:
> 
> CATz + (AGEz + Heading("Females") * (SEXz == "Female") * SBPz) *
>      Heading() * Gz + (AGEz + SBPz) * Heading() * TRIOz ~ Heading() *
>      COUNTRYz * Heading() * SEXz
> 
> The method also doesn't work if I replace sex == 'Female' with x3 > 4,
> converting to X3z > 4.  I'm not clear on how to code what kind of
> expressions to ignore.
> 
> Thanks!
> Frank
> 
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.



More information about the R-help mailing list