[Rd] conflicted: an alternative conflict resolution strategy

Hadley Wickham
Wed Aug 29 23:41:46 CEST 2018


>> conflicted applies a few heuristics to minimise false positives (at the
>> cost of introducing a few false negatives). The overarching goal is to
>> ensure that code behaves identically regardless of the order in which
>> packages are attached.
>>
>> -   A number of packages provide a function that appears to conflict
>>     with a function in a base package, but they follow the superset
>>     principle (i.e. they only extend the API, as explained to me by
>>     Hervé Pagès).
>>
>>     conflicted assumes that packages adhere to the superset principle,
>>     which appears to be true in most of the cases that I’ve seen.
>
>
> It seems that you may be able to strengthen this heuristic from a blanket assumption to something more narrowly targeted by looking for one or more of the following to confirm likely-superset adherence:
>
> -   matching or purely extending formals (i.e. all the named arguments of base::fun match, including order, and there are new arguments in pkg::fun only if base::fun takes ...)
> -   an explicit call to base::fun in the body of pkg::fun
> -   UseMethod(funname), with at least one provided S3 method calling base::fun
> -   S4 generic creation using fun or base::fun as the seeding/default method body, or with base::fun called from at least one method

Oooh, nice idea. I'll definitely try it out.
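
The first suggested check (matching or purely extending formals) could be
sketched roughly as below. This is only an illustration, not code from
conflicted: `extends_formals()` is a hypothetical helper name, and it assumes
both functions are closures (primitives would need separate handling).

```r
# Hypothetical helper, not part of conflicted.
# TRUE if `pkg_fun` keeps every named argument of `base_fun` in the same
# order, and adds new named arguments only when `base_fun` accepts `...`.
extends_formals <- function(base_fun, pkg_fun) {
  base_args <- as.character(names(formals(base_fun)))
  pkg_args <- as.character(names(formals(pkg_fun)))

  base_named <- setdiff(base_args, "...")

  # The arguments shared with base must be all of base's named arguments,
  # appearing in the same relative order.
  shared <- pkg_args[pkg_args %in% base_named]
  if (!identical(shared, base_named)) {
    return(FALSE)
  }

  # New named arguments are only acceptable if base takes `...`.
  extra <- setdiff(pkg_args, c(base_named, "..."))
  length(extra) == 0 || "..." %in% base_args
}

# Toy functions standing in for a base function and two candidates.
base_f <- function(x, ...) NULL
superset_f <- function(x, ..., extra = TRUE) NULL
unrelated_f <- function(.data, ...) NULL

extends_formals(base_f, superset_f)   # TRUE
extends_formals(base_f, unrelated_f)  # FALSE
```

A check like this would only confirm that the signatures are compatible; it
says nothing about whether the behaviour actually is.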

>> For
>>     example, the lubridate package provides `as.difftime()` and `date()`
>>     which extend the behaviour of base functions, and provides S4
>>     generics for the set operators.
>>
>>         conflict_scout(c("lubridate", "base"))
>>         #> 5 conflicts:
>>         #> * `as.difftime`: [lubridate]
>>         #> * `date`       : [lubridate]
>>         #> * `intersect`  : [lubridate]
>>         #> * `setdiff`    : [lubridate]
>>         #> * `union`      : [lubridate]
>>
>>     There are two popular functions that don’t adhere to this principle:
>>     `dplyr::filter()` and `dplyr::lag()` :(. conflicted handles these
>>     special cases so they correctly generate conflicts. (I sure wish I’d
>>     known about the superset principle when creating dplyr!)
>>
>>         conflict_scout(c("dplyr", "stats"))
>>         #> 2 conflicts:
>>         #> * `filter`: dplyr, stats
>>         #> * `lag`   : dplyr, stats
>>
>> -   Deprecated functions should never win a conflict, so conflicted
>>     checks for use of `.Deprecated()`. This rule is very useful when
>>     moving functions from one package to another. For example, many
>>     devtools functions were moved to usethis, and conflicted ensures
>>     that you always get the non-deprecated version, regardless of package
>>     attach order:
>
>
> I would completely believe this rule is useful for refactoring as you describe, but that is the "same function" case. For an end-user in the "different function, same symbol" case it's not at all clear to me that the deprecated function should always lose.
>
> People sometimes use deprecated functions. It's not great, and eventually they'll need to fix that for any given case, but imagine if you deprecated the filter verb in dplyr (I know this will never happen, but I think it's illustrative nonetheless).
>
> Consider a piece of code someone wrote before this hypothetical deprecation of filter. The fact that it's now deprecated certainly doesn't mean that they secretly wanted stats::filter all along, right? Conflicted acting as if it does will lead to them getting the exact kind of error you're looking to protect them from, and with even less ability to understand why because they are already doing "The right thing" to protect themselves by using conflicted in the first place...

Ah yes, good point. I'll add a heuristic to check that the function
name appears in the first argument of the .Deprecated call (assuming
that the call looks something like `.Deprecated("pkg::foo")`).
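
As a very rough, text-based sketch of that check (not what conflicted
actually does): `replaced_by_same_name()` is a hypothetical helper, and it
assumes the replacement is passed as a string literal in the first,
unnamed argument of `.Deprecated()`. Anything else, such as
`.Deprecated(new = "...")`, would be missed.

```r
# Hypothetical helper, not part of conflicted.
# TRUE if the body of `fun` contains a call like .Deprecated("pkg::name")
# whose first argument refers to a function called `name`.
replaced_by_same_name <- function(fun, name) {
  src <- paste(deparse(body(fun)), collapse = "\n")

  # Capture the string literal in the first argument of .Deprecated()
  pattern <- '\\.Deprecated\\([[:space:]]*"([^"]*)"'
  m <- regmatches(src, regexec(pattern, src))[[1]]
  if (length(m) < 2) {
    return(FALSE)
  }

  # Drop an optional "pkg::" prefix and trailing "()" before comparing
  replacement <- sub("^.*::", "", m[2])
  replacement <- sub("\\(\\)$", "", replacement)
  identical(replacement, name)
}

# Toy stand-ins: a function moved to another package under the same name,
# and a function deprecated in favour of something with a different name.
moved_fun <- function(...) {
  .Deprecated("usethis::use_test()")
  invisible()
}
renamed_fun <- function(...) {
  .Deprecated("slice")
  invisible()
}

replaced_by_same_name(moved_fun, "use_test")  # TRUE
replaced_by_same_name(renamed_fun, "filter")  # FALSE
```

With something like this, only the "same function, new home" case would
cause the deprecated version to lose the conflict.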

>> Finally, as mentioned above, the user can declare preferences:
>>
>>     conflict_prefer("select", "MASS")
>>     #> [conflicted] Will prefer MASS::select over any other package
>>     conflict_scout(c("dplyr", "MASS"))
>>     #> 1 conflict:
>>     #> * `select`: [MASS]
>>
>
> I deeply worry about people putting this kind of thing, or even just library(conflicted), in their .Rprofile and thus making their scripts substantially less reproducible. Is that a consequence of this kind of functionality that you have thought about?

Yes, and I've already recommended against it in two places :)  I'm not
sure if there's any more I can do - people already put (e.g.)
`library(ggplot2)` in their .Rprofile, which is just as bad from a
reproducibility standpoint.

Thanks for the thoughtful feedback!

Hadley

-- 
http://hadley.nz


