[Rd] Bug in the "reformulate" function in stats package

Martin Maechler m@ech|er @end|ng |rom @t@t@m@th@ethz@ch
Thu Apr 18 22:03:46 CEST 2019


>>>>> Ben Bolker 
>>>>>     on Thu, 18 Apr 2019 11:51:33 -0400 writes:

    > Your file didn't make it through the mailing list (which is quite
    > restrictive about which types/extensions it will take).

    > I appreciate your enthusiasm and persistence for this issue, but I
    > suspect you may have trouble convincing R-core to adopt your changes --
    > they are "better", "easier", "more intuitive" for you ... but how sure
    > are you they are completely backward compatible, have no performance
    > issues, will not break in unusual cases ... ?

    > Hopefully someone here will set up a bugzilla account so you can post
    > your patch/it can be further discussed there, if you want to pursue this ...

This case was closed quite a while ago, thank you.

The changes will be in R 3.6.0, which is due in 8 days, not
least thanks to Ben's patch (earlier in this thread).

Martin Maechler

    > cheers
    > Ben Bolker

    > On 2019-04-18 7:30 a.m., Saren Tasciyan wrote:
    >> Hi,
    >> 
    >> Sorry for writing this late, I was very busy. I started this discussion
    >> here. I wish I could write to bugs.r-project.org, but I don't have an
    >> account and I will write here instead.
    >> 
    >> Meanwhile, I solved my problem with a simpler fix (please see attached
    >> file).
    >> 
    >> This requires that term labels are not "ticked". I think this is better,
    >> since it is easier to have column names unticked.
    >> 
    >> The new development function is IMO unnecessarily complicated. It requires
    >> strings to be ticked or wrapped in as.name(). It is more intuitive to have
    >> a vector of column names.
    >> 
    >> Best,
    >> 
    >> Saren
    >> 
    >> 
    >> On 05.04.19 09:38, Martin Maechler wrote:
    >>>>>>>> Ben Bolker
    >>>>>>>>      on Thu, 4 Apr 2019 12:46:37 -0400 writes:
    >>>    > Proposed patch
    >>> 
    >>> Thank you Ben!
    >>> 
    >>> 
    >>> [the rest is technical nit-picking .. but hopefully interesting
    >>>   to the smart R-devel reader base:]
    >>> 
    >>> There was a very subtle thinko in your patch which is not easily
    >>> diagnosed from R's parse_Rd():
    >>> 
    >>> Error in parse_Rd("/u/maechler/R/D/r-devel/R/src/library/stats/man/delete.response.Rd",  :
    >>>    Unexpected end of input (in " quoted string opened at delete.response.Rd:78:63)
    >>> In addition: Warning message:
    >>> In parse_Rd("/u/maechler/R/D/r-devel/R/src/library/stats/man/delete.response.Rd",  :
    >>>    newline within quoted string at delete.response.Rd:74
    >>> 
    >>> and even I needed more than a minute to find out that the
    >>> culprit was that
    >>> 
    >>>    reformulate(sprintf("`%s`", x))
    >>> 
    >>> is not ok in *.Rd  and must be
    >>> 
    >>>    reformulate(sprintf("`\%s`", x))
    >>> 
    >>> ---------
    >>> 
    >>>    > (I think .txt files work OK as attachments to the list?)
    >>> 
    >>> yes, typically -- what really counts is if your e-mail program
    >>> marks them with MIME-type 'text/plain'
    >>> and most E-mail programs are very "silly" / "safe" nowadays and
    >>> don't expect to have smart users  and hence mark (and sometimes
    >>> encode) everything unknown as non-text.
    >>> 
    >>> Very old, flexible e-mail interfaces such as Emacs VM allow
    >>> you to specify the MIME-type in addition to the file *and* they
    >>> also propose smart defaults, I think by using something like
    >>> unix 'file' to determine that your 'foo.diff' file is plain text.
    >>> {{ .. and we all know that Windows is sillily using file extensions
    >>>     to determine file type and only knows  Windows-extensions plus
    >>>     those added explicitly by software installed; so nowadays *.rda
    >>>     is marked as an Rstudio file ... [argh].
    >>> }}
    >>> 
    >>> Martin
    >>> 
    >>>      > On 2019-04-04 2:21 a.m., Martin Maechler wrote:
    >>>      >>>>>>> Ben Bolker
    >>>      >>>>>>> on Fri, 29 Mar 2019 12:34:50 -0400 writes:
    >>>      >>
    >>>      >> > I suspect that the issue is addressed (obliquely) in the examples,
    >>>      >> > which show that variables with spaces in them (or otherwise
    >>>      >> > 'non-syntactic', i.e. not satisfying the constraints of legal R symbols)
    >>>      >> > can be handled by protecting them with backticks  (``)
    >>>      >>
    >>>      >> > ## using non-syntactic names:
    >>>      >> > reformulate(c("`P/E`", "`% Growth`"), response = as.name("+-"))
    >>>      >>
    >>>      >> > It seems to me there could be room for a *documentation* patch
    >>>      >> > (stating explicitly that if termlabels has length > 1 its elements
    >>>      >> > are concatenated with "+", and explicitly stating that non-syntactic
    >>>      >> > names must be protected with back-ticks).  (There is a little bit of
    >>>      >> > obscurity in the fact that the elements of termlabels don't have to be
    >>>      >> > syntactically valid names: many will be included in formulas if they
    >>>      >> > can be interpreted as *parseable* expressions, e.g. reformulate("x<2"))
    >>>      >>
    >>>      >> > I would be happy to give it a shot if the consensus is that it would
    >>>      >> > be worthwhile.
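
The behaviour described above can be sketched as follows. This is only an illustration, not text from the thread or from the R documentation; the names a, b, c, x and y are placeholders chosen for the example.

```r
# Several term labels are joined with "+" to form the right-hand side.
f1 <- reformulate(c("a", "b", "c"))
print(deparse(f1))   # "~a + b + c"

# A label does not have to be a plain name; it only has to parse
# as an R expression.
f2 <- reformulate("x<2")
print(deparse(f2))   # "~x < 2"

# 'response' supplies the left-hand side of the formula.
f3 <- reformulate(c("a", "b*c"), response = "y")
print(deparse(f3))   # "y ~ a + b * c"
```

Because each label is parsed as code, a label containing a space (or any other non-syntactic character) has to be backtick-quoted before it is passed in.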
    >>>      >>
    >>>      >> I think it would be worthwhile to add to the docs a bit.
    >>>      >>
    >>>      >> [With currently just your and my vote, we have a 100% consensus
    >>>      >> ;-)]
    >>>      >>
    >>>      >> Martin
    >>>      >>
    >>>      >> > One workaround to the OP's problem is below (may be worth including
    >>>      >> > as an example in docs)
    >>>      >>
    >>>      >> >> z <- c("a variable","another variable")
    >>>      >> >> reformulate(z)
    >>>      >> > Error in parse(text = termtext, keep.source = FALSE) :
    >>>      >> > <text>:1:6: unexpected symbol
    >>>      >> > 1:  ~ a variable
    >>>      >> > ^
    >>>      >> >> reformulate(sprintf("`%s`",z))
    >>>      >> > ~`a variable` + `another variable`
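
A slightly fuller sketch of this workaround, assuming a data frame whose columns keep their spaces. The data frame d and its values are made up for illustration; only z and the sprintf() call come from the message above.

```r
# Column names containing spaces are not syntactic R names.
z <- c("a variable", "another variable")

# Unquoted, the labels are pasted together and parsed, which fails
# at the first space.
res <- tryCatch(reformulate(z), error = function(e) "parse error")
print(res)

# Wrapping each label in backticks makes every label a single symbol.
f <- reformulate(sprintf("`%s`", z))
print(f)

# The variables in the formula are the original, unquoted names.
print(all.vars(f))

# Hypothetical data: check.names = FALSE keeps the spaces in the names.
d <- data.frame(`a variable` = 1:3, `another variable` = 4:6,
                check.names = FALSE)
mf <- model.frame(f, data = d)
print(names(mf))
```

Note that blindly backtick-quoting every label would also turn an expression label such as "x<2" into a single (probably non-existent) variable name, so this approach fits vectors of column names rather than arbitrary terms.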
    >>>      >>
    >>>      >>
    >>>      >>
    >>>      >>
    >>>      >> > On 2019-03-29 11:54 a.m., J C Nash wrote:
    >>>      >> >> The main thing is to post the "small reproducible example".
    >>>      >> >>
    >>>      >> >> My (rather long-term) experience can be written
    >>>      >> >>
    >>>      >> >> if (exists("reproducible example") ) {
    >>>      >> >> DeveloperFixHappens()
    >>>      >> >> } else {
    >>>      >> >> NULL
    >>>      >> >> }
    >>>      >> >>
    >>>      >> >> JN
    >>>      >> >>
    >>>      >> >> On 2019-03-29 11:38 a.m., Saren Tasciyan wrote:
    >>>      >> >>> Well, first I can't sign in bugzilla myself, that is why I wrote
    >>>      >> >>> here first. Also, I don't know if I have the time at the moment
    >>>      >> >>> to provide tests, multiple examples or more. If that is not ok or
    >>>      >> >>> welcomed, that is fine, I can come back, whenever I have more
    >>>      >> >>> time to properly report the bug.
    >>>      >> >>>
    >>>      >> >>> I didn't find the existing bug report, sorry for that.
    >>>      >> >>>
    >>>      >> >>> Yes, it is related. My problem was that I have column names with
    >>>      >> >>> spaces and current solution doesn't solve it. I have a
    >>>      >> >>> solution, which works for me and maybe also for others.
    >>>      >> >>>
    >>>      >> >>> Either, someone can register me to bugzilla or I can post it
    >>>      >> >>> here, which could give some direction to developers. I
    >>>      >> >>> don't mind whichever is preferred here.
    >>>      >> >>>
    >>>      >> >>> Best,
    >>>      >> >>>
    >>>      >> >>> Saren
    >>>      >> >>>
    >>>      >> >>>
    >>>      >> >>> On 29.03.19 09:29, Martin Maechler wrote:
    >>>      >> >>>>>>>>> Saren Tasciyan
    >>>      >> >>>>>>>>>      on Thu, 28 Mar 2019 17:02:10 +0100 writes:
    >>>      >> >>>>      > Hi,
    >>>      >> >>>>      > I have found a bug in reformulate function and have a
    >>>      >> >>>>      > solution for it. I was wondering, where I can submit it?
    >>>      >> >>>>
    >>>      >> >>>>      > Best,
    >>>      >> >>>>      > Saren
    >>>      >> >>>>
    >>>      >> >>>>
    >>>      >> >>>> Well, you could have given a small reproducible example
    >>>      >> >>>> depicting the bug, notably when posting here:
    >>>      >> >>>> Just a prose text with no R code or other technical content is
    >>>      >> >>>> almost always not really appropriate for the R-devel mailing list.
    >>>      >> >>>>
    >>>      >> >>>> Further, in such a case you should google a bit and hopefully
    >>>      >> >>>> have found
    >>>      >> >>>>         https://www.r-project.org/bugs.html
    >>>      >> >>>>
    >>>      >> >>>> which also mentions reproducibility (and many more useful things).
    >>>      >> >>>>
    >>>      >> >>>> Then it also tells you about R's bug repository, also called
    >>>      >> >>>> "R's bugzilla" at https://bugs.r-project.org/
    >>>      >> >>>>
    >>>      >> >>>> and if you are diligent (but here, I'd say bugzilla is
    >>>      >> >>>> (configured?) far from ideal), you'd also find bug PR#17359
    >>>      >> >>>>
    >>>      >> >>>>    https://bugs.r-project.org/bugzilla/show_bug.cgi?id=17359
    >>>      >> >>>>
    >>>      >> >>>> which was reported already in Nov 2017 .. and only fixed
    >>>      >> >>>> yesterday (in the "cleanup old bugs" process that happens
    >>>      >> >>>> often before the big new spring release of R).
    >>>      >> >>>>
    >>>      >> >>>> So is your bug the same as that one?
    >>>      >> >>>>
    >>>      >> >>>> Martin
    >>>      >> >>>>
    >>>      >> >>>>      > --
    >>>      >> >>>>      > Saren Tasciyan
    >>>      >> >>>>      > /PhD Student / Sixt Group/
    >>>      >> >>>>      > Institute of Science and Technology Austria
    >>>      >> >>>>      > Am Campus 1
    >>>      >> >>>>      > 3400 Klosterneuburg, Austria
    >>>      >> >>>>
    >>>      >> >>>>      > ______________________________________________
    >>>      >> >>>>      > R-devel using r-project.org mailing list
    >>>      >> >>>>      > https://stat.ethz.ch/mailman/listinfo/r-devel
    >>>      >> >>>>
    >>>      > x[DELETED ATTACHMENT external: reformulate.diff, plain text]