[Rd] Fwd: Re: [EXTERNAL] Re: backquotes and term.labels

Ben Bolker bbolker at gmail.com
Thu Mar 8 17:11:28 CET 2018



On 18-03-08 10:07 AM, Martin Maechler wrote:
>>>>>> Ben Bolker <bbolker at gmail.com>
>>>>>>     on Thu, 8 Mar 2018 09:42:40 -0500 writes:
> 
>     > Meant to respond to this but forgot.
>     > I didn't write a new terms() function -- I added an attribute to the
>     > terms object (a vector of the names of the constructed model matrix),
>     > thus preserving the information at the point when it was available.
>     > I do agree that it would be preferable to have an upstream fix ...
> 
> did anybody ever propose a small patch to the upstream sources ?
> 
> -- including a REPREX (or 2: one for lme4, one for survival) 
> 
> I'm open to look at one .. not for the next few days, though.
> 
> Martin

  Didn't get around to it ...  a bit worried about doing it 100%
back-compatibly, also a bit scared in general of messing with such deep
stuff. It could probably be done *non*-backward-compatibly simply by
changing line 459 of library/stats/R/models.R to

    varnames <- vapply(vars, function(x) deparse2(x,backtick=FALSE),
        " ")[-1L]

... ?
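
To make the trade-off concrete, here is a rough sketch in base R of what
backtick=FALSE would do to the deparsed variable names.  deparse2_sketch is
a hypothetical stand-in written for illustration, not the internal helper in
models.R, and the variable list just reuses the names from Bill's example.

```r
# Hypothetical stand-in for the internal deparse helper (an assumption,
# not the code in stats); it only forwards 'backtick' to deparse().
deparse2_sketch <- function(x, backtick = TRUE)
    paste(deparse(x, width.cutoff = 500L, backtick = backtick), collapse = " ")

# Same shape as attr(terms, "variables"): a call to list().
vars <- quote(list(y, log(`b$a$d`), `x y z`))

# [-1L] drops the leading 'list' symbol, as in the line quoted above.
with_bt    <- vapply(vars, deparse2_sketch, " ")[-1L]
without_bt <- vapply(vars, function(x) deparse2_sketch(x, backtick = FALSE),
                     " ")[-1L]

with_bt      # "y" "log(`b$a$d`)" "`x y z`"
without_bt   # "y" "log(b$a$d)"   "x y z"
```

Without backticks the plain symbol matches the data frame column name, but
the call deparses to log(b$a$d) instead of the backquoted form that
currently appears as a model frame column name, which is where the
back-compatibility worry comes from.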


> 
> 
>     > On Thu, Mar 8, 2018 at 9:39 AM, Therneau, Terry M., Ph.D. via R-devel
>     > <r-devel at r-project.org> wrote:
>     >> Ben,
>     >> 
>     >> 
>     >> Looking at your notes, it appears that your solution is to write your own
>     >> terms() function for lme.  It is easy to verify that the "varnames.fixed"
>     >> attribute is not returned by the usual terms function.
>     >> 
>     >> Then I also need to write my own terms function for the survival and coxme
>     >> packages?  Because of the need to treat strata() terms in a special way I
>     >> manipulate the formula/terms in nearly every routine.
>     >> 
>     >> Extrapolating: every R package that tries to examine formulas and partition
>     >> them into bits needs its own terms function?  This does not look like a
>     >> good solution to me.
>     >> 
>     >> On 03/07/2018 07:39 AM, Ben Bolker wrote:
>     >>> 
>     >>> I knew I had seen this before but couldn't previously remember where.
>     >>> https://github.com/lme4/lme4/issues/441 ... I initially fixed with
>     >>> gsub(), but (pushed by Martin Maechler to do better) I eventually
>     >>> fixed it by storing the original names of the model frame (without
>     >>> backticks) as an attribute for later retrieval:
>     >>> 
>     >>> https://github.com/lme4/lme4/commit/56416fc8b3b5153df7df5547082835c5d5725e89.
>     >>> 
>     >>> 
>     >>> On Wed, Mar 7, 2018 at 8:22 AM, Therneau, Terry M., Ph.D. via R-devel
>     >>> <r-devel at r-project.org> wrote:
>     >>>> 
>     >>>> Thanks to Bill Dunlap for the clarification.  On follow-up it turns out
>     >>>> that this will be an issue for many if not most of the routines in the
>     >>>> survival package: a lot of them look at the terms structure and make use
>     >>>> of the dimnames of attr(terms, 'factors'), which also keeps the unneeded
>     >>>> backquotes.  Others use the term.labels attribute.  To dodge this I will
>     >>>> need to create a fixterms() routine which I call at the top of every
>     >>>> single routine in the library.
>     >>>> 
>     >>>> Is there a chance for a fix at a higher level?
>     >>>> 
>     >>>> Terry T.
>     >>>> 
>     >>>> 
>     >>>> 
>     >>>> On 03/05/2018 03:55 PM, William Dunlap wrote:
>     >>>>> 
>     >>>>> I believe this has to do with terms() making "term.labels" (hence the
>     >>>>> dimnames of "factors") with deparse(), so that the backquotes are
>     >>>>> included for non-syntactic names.  The backquotes are not in the column
>     >>>>> names of the input data.frame (nor model frame) so you get a mismatch
>     >>>>> when subscripting the data.frame or model.frame with elements of
>     >>>>> terms()$term.labels.
>     >>>>> 
>     >>>>> I think you can avoid the problem by adding right after
>     >>>>> ll <- attr(Terms, "term.labels")
>     >>>>> the line
>     >>>>> ll <- gsub("^`|`$", "", ll)
>     >>>>> 
>     >>>>> E.g.,
>     >>>>> 
>     >>>>> > d <- data.frame(check.names=FALSE, y=1/(1:5), `b$a$d`=sin(1:5)+2,
>     >>>>> +                 `x y z`=cos(1:5)+2)
>     >>>>> > Terms <- terms( y ~ log(`b$a$d`) + `x y z` )
>     >>>>> > m <- model.frame(Terms, data=d)
>     >>>>> > colnames(m)
>     >>>>> [1] "y"            "log(`b$a$d`)" "x y z"
>     >>>>> > attr(Terms, "term.labels")
>     >>>>> [1] "log(`b$a$d`)" "`x y z`"
>     >>>>> >   ll <- attr(Terms, "term.labels")
>     >>>>> > gsub("^`|`$", "", ll)
>     >>>>> [1] "log(`b$a$d`)" "x y z"
>     >>>>> 
>     >>>>> It is a bit of a mess.
>     >>>>> 
>     >>>>> 
>     >>>>> Bill Dunlap
>     >>>>> TIBCO Software
>     >>>>> wdunlap tibco.com <http://tibco.com>
>     >>>>> 
>     >>>>> On Mon, Mar 5, 2018 at 12:55 PM, Therneau, Terry M., Ph.D. via R-devel
>     >>>>> <r-devel at r-project.org> wrote:
>     >>>>> 
>     >>>>> A user reported a problem with the survdiff function and the use of
>     >>>>> variables that contain a space.  Here is a simple example.  The same
>     >>>>> issue occurs in survfit for the same reason.
>     >>>>> 
>     >>>>> lung2 <- lung
>     >>>>> names(lung2)[1] <- "in st"   # old name is inst
>     >>>>> survdiff(Surv(time, status) ~ `in st`, data=lung2)
>     >>>>> Error in `[.data.frame`(m, ll) : undefined columns selected
>     >>>>> 
>     >>>>> In the body of the code the program wants to send all of the right-hand
>     >>>>> side variables forward to the strata() function.  The code looks more or
>     >>>>> less like this, where m is the model frame
>     >>>>> 
>     >>>>> Terms <- terms(m)
>     >>>>> index <- attr(Terms, "term.labels")
>     >>>>> if (length(index) ==0)  X <- rep(1L, n)  # no covariates
>     >>>>> else X <- strata(m[index])
>     >>>>> 
>     >>>>> For the variable with a space in the name the term.label is "`in st`",
>     >>>>> and the subscript fails.
>     >>>>> 
>     >>>>> Is this intended behaviour or a bug?  The issue is that the name of
>     >>>>> this column in the model frame does not have the backticks, while the
>     >>>>> terms structure does have them.
>     >>>>> 
>     >>>>> Terry T.
>     >>>>> 
>     >>>>> ______________________________________________
>     >>>>> R-devel at r-project.org mailing list
>     >>>>> https://stat.ethz.ch/mailman/listinfo/r-devel
>     >>>>> 
>     >>>>> 
>     >> 
>     >> 
> 
>


