[R] looking for formula parser that allows coefficients

Fox, John jfox at mcmaster.ca
Wed Aug 22 01:42:04 CEST 2018


Dear Paul,

Is it possible that you're overthinking this? That is, do you really need an R model formula, or do you just want to evaluate an arithmetic expression using the columns of X?

If the latter, the following approach may work for you:

> evalFormula <- function(X, expr){
+   if (is.null(colnames(X))) colnames(X) <- paste0("x", 1:ncol(X))
+   with(as.data.frame(X), eval(parse(text=expr)))
+ }

> X <- matrix(1:20, 5, 4)
> X
     [,1] [,2] [,3] [,4]
[1,]    1    6   11   16
[2,]    2    7   12   17
[3,]    3    8   13   18
[4,]    4    9   14   19
[5,]    5   10   15   20

> evalFormula(X, '2 + 3*x1 + 4*x2 + 5*x3 + 6*x1*x2')
[1] 120 180 252 336 432
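One caveat: in ordinary R arithmetic ":" is the sequence operator, so a model-formula-style term such as 0.1 * x1:x3 in your example would not be read as an interaction by evalFormula(). A minimal sketch, assuming the user types a formula like y ~ 2 + 1.1*x1 + 0.1*x1:x3 and that ":" should always mean elementwise multiplication, is to take the right-hand side of the formula and rewrite each ":" call as "*" before evaluating it. The names colonToTimes() and evalCoefFormula() are hypothetical, chosen for illustration.

```r
## Recursively rewrite every `:` call in an expression as `*`.
## (Hypothetical helper; assumes ':' always means elementwise product.)
colonToTimes <- function(e) {
  if (is.call(e)) {
    if (identical(e[[1]], as.name(":"))) e[[1]] <- as.name("*")
    for (i in seq_along(e)[-1]) e[[i]] <- colonToTimes(e[[i]])
  }
  e
}

## Evaluate the right-hand side of a one- or two-sided formula using
## the columns of X (named x1, x2, ... if X has no column names).
evalCoefFormula <- function(X, f) {
  if (is.null(colnames(X))) colnames(X) <- paste0("x", seq_len(ncol(X)))
  rhs <- f[[length(f)]]  # f[[3]] for y ~ rhs, f[[2]] for ~ rhs
  eval(colonToTimes(rhs), envir = as.data.frame(X), enclos = environment(f))
}

X <- matrix(1:20, 5, 4)
evalCoefFormula(X, y ~ 2 + 3*x1 + 4*x2 + 5*x3 + 6*x1:x2)
# same values as evalFormula() gives: 120 180 252 336 432
```

Because ":" binds more tightly than "*" in R, 6*x1:x2 parses as 6 * (x1:x2) and becomes 6 * (x1 * x2) after the rewrite; the left-hand side (y) is simply ignored.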

I hope that this helps,
 John

-----------------------------------------------------------------
John Fox
Professor Emeritus
McMaster University
Hamilton, Ontario, Canada
Web: https://socialsciences.mcmaster.ca/jfox/



> -----Original Message-----
> From: R-help [mailto:r-help-bounces at r-project.org] On Behalf Of Paul Johnson
> Sent: Tuesday, August 21, 2018 6:46 PM
> To: R-help <r-help at r-project.org>
> Subject: [R] looking for formula parser that allows coefficients
> 
> Can you point me at any packages that allow users to write a formula with
> coefficients?
> 
> I want to write a data simulator that has a matrix X with lots of columns, and
> then users can generate predictive models by entering a formula that uses
> some of the variables, allowing interactions, like
> 
> y ~ 2 + 1.1 * x1 + 3 * x3 + 0.1 * x1:x3 + 0.2 * x2:x2
> 
> Currently, in the rockchalk package, I have a function that simulates data
> (genCorrelatedData2), but my interface for entering the beta coefficients is poor.
> I assumed users would always enter 0s as placeholders for the unused
> coefficients, and that the intercept is always first. The unnamed vector is too
> confusing. I have them specify:
> 
> c(2, 1.1, 0, 3, 0, 0, 0.2, ...)
> 
> In the documentation I say (ridiculously) that it is easy to figure out from the
> examples, but it really isn't.
> The function prints out the equation it thinks you intended; that's minimal
> protection against user error, but still not very good:
> 
> dat <- genCorrelatedData2(N = 10, rho = 0.0,
>           beta = c(1, 2, 1, 1, 0, 0.2, 0, 0, 0),
>           means = c(0,0,0), sds = c(1,1,1), stde = 0)
> [1] "The equation that was calculated was"
> y = 1 + 2*x1 + 1*x2 + 1*x3
>  + 0*x1*x1 + 0.2*x2*x1 + 0*x3*x1
>  + 0*x1*x2 + 0*x2*x2 + 0*x3*x2
>  + 0*x1*x3 + 0*x2*x3 + 0*x3*x3
>  + N(0,0) random error
> 
> But still, it is not very good.
> 
> As I look at this now, I realize it expects just the vech, not the whole vector of all
> interaction terms, so it is even more difficult than I thought to get the correct
> input. Hence, I'd like to let the user write a formula.
> 
> The alternative for the user interface is to have named coefficients.
> I can more or less easily allow a named vector for beta
> 
> beta = c("(Intercept)" = 1, "x1" = 2, "x2" = 1, "x3" = 1, "x2:x1" = 0.1)
> 
> I could build a formula from that.  That's not too bad. But I still think it would
> be cool to allow formula input.
> 
> Have you ever seen it done?
> pj
> --
> Paul E. Johnson   http://pj.freefaculty.org
> Director, Center for Research Methods and Data Analysis http://crmda.ku.edu
> 
> To write to me directly, please address me at pauljohn at ku.edu.
> 
> ______________________________________________
> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
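Paul's alternative of a named coefficient vector can also be handled directly, without building a formula. A minimal sketch, assuming names follow the "(Intercept)", "x1", "x2:x1" convention of his example and that each ":"-separated piece names a column of X; linearPredictor() is a hypothetical helper, not a function in rockchalk:

```r
## Compute the linear predictor from a named coefficient vector.
## Each name is "(Intercept)" or column names joined by ":"; the named
## columns are multiplied elementwise and scaled by the coefficient.
linearPredictor <- function(X, beta) {
  if (is.null(names(beta))) stop("beta must be a named vector")
  if (is.null(colnames(X))) colnames(X) <- paste0("x", seq_len(ncol(X)))
  eta <- numeric(nrow(X))
  for (nm in names(beta)) {
    if (nm == "(Intercept)") {
      term <- rep(1, nrow(X))
    } else {
      vars <- strsplit(nm, ":", fixed = TRUE)[[1]]
      if (!all(vars %in% colnames(X)))
        stop("unknown variable in coefficient name: ", nm)
      # row-wise product of the named columns
      term <- apply(X[, vars, drop = FALSE], 1, prod)
    }
    eta <- eta + beta[[nm]] * term
  }
  eta
}

X <- matrix(1:20, 5, 4)
linearPredictor(X, c("(Intercept)" = 2, x1 = 3, x2 = 4, x3 = 5, "x1:x2" = 6))
# 120 180 252 336 432, matching evalFormula() in the reply
```

Terms that are not named are simply treated as zero, which removes the need for placeholder zeros and for remembering the vech ordering.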



