[R] regex challenge

William Dunlap wdunlap at tibco.com
Thu Aug 15 16:48:13 CEST 2013


I think substitute() or bquote() will do a better job here than gsub() because
they work on the parsed formula rather than on the raw string.  The
terms() function will interpret the formula-specific operators like "+"
and ":" to come up with a list of the 'variables' (or 'terms') in the formula.
E.g., with the 'f' given below we get

> f(y1 + y2 ~ a*(b + c) + d + f * (h == 3) + (sex == 'male')*i)
Y1z + Y2z ~ Az * (Bz + Cz) + Dz + Fz * (h == 3) + (sex == "male") * Iz

Is that what you wanted?
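
As a rough illustration of the difference (a sketch added for clarity, not
part of the function below; the regular expression and the names 'naive',
'newNames' and 'renamed' are made up for this example), compare text
substitution with substitution on the parsed formula:

```r
fm <- y ~ a + (sex == "male")

# Naive text substitution: every identifier-like token is rewritten,
# including 'sex' and the string constant 'male'.
naive <- gsub("\\b([[:alpha:]][[:alnum:]]*)\\b", "\\U\\1\\Ez",
              "y ~ a + (sex == 'male')", perl = TRUE)

# Substitution on the parsed formula: only the names listed in
# 'newNames' are replaced; 'sex' and "male" are left alone.
newNames <- list(y = as.name("Yz"), a = as.name("Az"))
renamed <- do.call("substitute", list(fm, newNames))
renamed <- formula(renamed, env = environment(fm))

print(naive)
print(renamed)
```

The regex has no way to tell that 'sex' sits inside a comparison, whereas
substitute() only touches the symbols it is asked to replace.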

If you only wanted to keep intact the expressions of the form
  var==value
(calls to `==`) but transform things like log(a) to log(Az) you
could extend this code to do that as well.
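
To see what there is to work with, here is a small sketch (the formula and
the names 'vars', 'isName' and 'heads' are only illustrative) that lists
the entries of the 'variables' attribute and, for the ones that are calls,
the function being called:

```r
fm <- y ~ log(a) + b + (h == 3)
vars <- as.list(attr(terms(fm), "variables"))[-1]

# Bare names such as y and b versus calls such as log(a) and h == 3
isName <- vapply(vars, is.name, TRUE)

# The first element of a call is the function being called
heads <- vapply(vars[!isName], function(v) as.character(v[[1]])[1], "")
print(heads)
```

A call whose head is `==` could then be left alone while other calls are
searched for names to rename.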

f <- function(formula) {
   trms <- terms(formula)
   variables <- as.list(attr(trms, "variables"))[-1]
   # the 'variables' attribute is stored as a call to list(),
   # so we changed the call to a list and removed the first element
   # to get the variables themselves.
   if (attr(trms, "response") == 1) {
       # terms does not pull apart left hand side of formula,
       # so we assume each non-function is to be renamed.
       responseVars <- lapply(all.vars(variables[[1]]), as.name)
       variables <- variables[-1]
   } else {
       responseVars <- list()
   }
   # omit non-name variables from list of ones to change.
   # This is where you could expand calls to certain functions.
   variables <- variables[vapply(variables, is.name, TRUE)]
   variables <- c(responseVars, variables) # all are names now
   names(variables) <- vapply(variables, as.character, "")
   newVars <- lapply(variables, function(v) as.name(paste0(toupper(v), "z")))
   formula(do.call("substitute", list(formula, newVars)), env=environment(formula))
}
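
One way the extension mentioned above (keep calls to `==` intact, but
rename inside things like log(a)) might look.  This is an untested-in-anger
sketch, not the only way to do it; 'f2' and 'isKept' are hypothetical names
introduced here.

```r
f2 <- function(formula) {
   trms <- terms(formula)
   variables <- as.list(attr(trms, "variables"))[-1]
   # A variable is kept intact if it is a call to `==`
   isKept <- function(v) is.call(v) && identical(v[[1]], as.name("=="))
   keep <- vapply(variables, isKept, TRUE)
   # Everything else (bare names, the response, calls like log(a))
   # contributes all the names found inside it.
   toRename <- unique(unlist(lapply(variables[!keep], all.vars)))
   newVars <- lapply(setNames(nm = toRename),
                     function(v) as.name(paste0(toupper(v), "z")))
   formula(do.call("substitute", list(formula, newVars)),
           env = environment(formula))
}

result <- f2(y ~ log(a) + b * (h == 3))
print(result)
```

Because substitute() acts on the whole formula, a name that appears both
on its own and inside a var==value expression would be renamed in both
places; the same is true of 'f' above.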

Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com


> -----Original Message-----
> From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org] On Behalf
> Of Frank Harrell
> Sent: Wednesday, August 14, 2013 8:14 PM
> To: RHELP
> Subject: [R] regex challenge
> 
> I would like to be able to use gsub or gsubfn to process a formula and
> to translate the variables but to ignore expressions in the formula.
> Supposing that the R formula has already been transformed into a
> character string and that the transformation is to convert variable
> names to upper case and to append z to the names, an example would be to
> convert y1 + y2 ~ a*(b + c) + d + f * (h == 3) + (sex == 'male')*i to
> Y1z + Y2z ~ Az*(Bz + Cz) + Dz + Fz * (h == 3) + (sex == 'male')*Iz.  Any
> expression that is not just a simple variable name would be left alone.
> 
> Does anyone want to try their hand at creating a regex that would
> accomplish this?
> 
> Thanks
> Frank
> --
> Frank E Harrell Jr Professor and Chairman      School of Medicine
>                     Department of Biostatistics Vanderbilt University
> 
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.


