[Rd] Bug in the "reformulate" function in stats package

peter dalgaard pdalgd using gmail.com
Fri Apr 19 02:48:56 CEST 2019

I think it is well recognized, also in R core, that it is an unfortunate design for some formula manipulation tools to go via a textual representation of the entire formula.
I'd be strongly tempted to replace the current reformulate() with something like this:

> x <- c("a variable","another variable", "anormalone")
> rhs <- Reduce(function(x,y) bquote(.(x)+.(y)), lapply(x, as.name))
> as.formula(bquote(~.(rhs)))
~`a variable` + `another variable` + anormalone
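
To make the idea concrete, here is a minimal sketch of the same construction wrapped as a function with an optional response. The name build_formula() and its arguments are hypothetical, purely for illustration; this is not existing or proposed stats API.

```r
# Hypothetical helper (not part of stats): build a formula from character
# labels without pasting and re-parsing text.
build_formula <- function(termlabels, response = NULL, env = parent.frame()) {
  stopifnot(is.character(termlabels), length(termlabels) > 0L)
  # Every label becomes a symbol, so blanks and operator characters
  # stay inside the variable name.
  rhs <- Reduce(function(a, b) call("+", a, b), lapply(termlabels, as.name))
  f <- if (is.null(response)) {
    call("~", rhs)
  } else {
    call("~", as.name(response), rhs)
  }
  as.formula(f, env = env)
}

f <- build_formula(c("a variable", "anormalone"), response = "my y")
print(f)
all.vars(f)  # "my y" "a variable" "anormalone"
```

Because the formula is assembled as a call tree, no label is ever parsed, which is exactly why the corner cases discussed next arise.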

However, there is a fair amount of conservatism because of the existing code base.
In particular, one needs to watch out for nasty corner cases: e.g., reformulate(c("x","y","x:y")) contains an interaction term, not a regression variable `x:y`. It is not too clear that this is desirable, but it is quite likely that someone's code actually uses it as a feature. Of course, auto-quoting anything that isn't a plain variable name breaks the feature. And there's no programmatic way to tell whether "P/E" is intended as a variable name (price/earnings ratio) or as shorthand for "P + P:E", so if we want both possibilities there needs to be a way to choose between them. Which puts us back at square one.
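
The two readings can be seen side by side in the sketch below. The helper make_rhs() and its literal argument are hypothetical, just one way a caller could say explicitly which reading is meant; they are not an existing interface.

```r
# Current behaviour: labels are parsed, so "x:y" is an interaction term.
f_parsed <- reformulate(c("x", "y", "x:y"))
all.vars(f_parsed)  # "x" "y"

# Symbol-based construction: "x:y" is a single variable named `x:y`.
rhs <- Reduce(function(a, b) call("+", a, b),
              lapply(c("x", "y", "x:y"), as.name))
f_quoted <- as.formula(call("~", rhs))
all.vars(f_quoted)  # "x" "y" "x:y"

# Hypothetical chooser: the caller states per label whether it is a
# literal variable name or formula syntax; nothing is guessed.
make_rhs <- function(labels, literal = rep(FALSE, length(labels))) {
  stopifnot(length(labels) == length(literal))
  terms <- Map(function(lab, lit) {
    if (lit) as.name(lab) else parse(text = lab, keep.source = FALSE)[[1L]]
  }, labels, literal)
  Reduce(function(a, b) call("+", a, b), terms)
}

f_ratio  <- as.formula(call("~", make_rhs(c("P/E", "x"), literal = c(TRUE, FALSE))))
f_nested <- as.formula(call("~", make_rhs(c("P/E", "x"))))
all.vars(f_ratio)   # "P/E" "x"
all.vars(f_nested)  # "P" "E" "x"
```

Defaulting literal to FALSE keeps the existing parsing behaviour, so the ambiguity is only resolved when the caller opts in.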


> On 18 Apr 2019, at 22:21 , Saren Tasciyan <saren.tasciyan using ist.ac.at> wrote:
> So here it is as a txt file. It is funny that an R file is restricted on the R-devel mailing list.
> Anyhow, in this case R-core have a few choices here:
> * ignore my solution
> * show that it is actually bad or worse
> * consider adding it
> Considering that it is a minor change from the previous version and doesn't disturb existing usage, I saw the necessity to submit it here. But the newer solution in 3.6.0 may solve other problems too. I can't argue against that. This solves my part of the problem, without affecting existing usage of the function.
> If R-core is hard to convince, because this is just who they are, then I should consider moving to other platforms. But so far, it seems to me that they are doing a great job. I also don't mind someone rejecting this tiny fix I have found, which works for me now. I can only thank them for the time spent considering it.
> Actually, I had in mind a more complex but cleaner solution with recursive functions to implement any kind of reformulation (not only with "+"). But I simply lack the big picture on R expressions; I need to read more. Maybe I will come back with that in the future.
> Cheers to all,
> Saren
> On 18.04.19 17:51, Ben Bolker wrote:
>>   I appreciate your enthusiasm and persistence for this issue, but I
>> suspect you may have trouble convincing R-core to adopt your changes --
>> they are "better", "easier", "more intuitive" for you ... but how sure
>> are you they are completely backward compatible, have no performance
>> issues, will not break in unusual cases ... ?
> -- 
> Saren Tasciyan
> /PhD Student / Sixt Group/
> Institute of Science and Technology Austria
> Am Campus 1
> 3400 Klosterneuburg, Austria
> <reformulate_solution.txt>

Peter Dalgaard, Professor,
Center for Statistics, Copenhagen Business School
Solbjerg Plads 3, 2000 Frederiksberg, Denmark
Phone: (+45)38153501
Office: A 4.23
Email: pd.mes using cbs.dk  Priv: PDalgd using gmail.com
