[Rd] Suggestion: default print method for S3 generics could offer some insights on '...' among registered methods
Duncan Murdoch
murdoch@dunc@n @end|ng |rom gm@||@com
Tue Jun 10 21:10:39 CEST 2025
On 2025-06-10 1:46 p.m., Michael Chirico wrote:
> When it comes to adding more info in the help pages, we'd be remiss
> not to point out the great engineering work the tidyverse folks have
> put in to this end:
>
> https://dplyr.tidyverse.org/reference/mutate.html
>
>> Methods available in currently loaded packages: dbplyr (tbl_lazy), dplyr (data.frame) .
>
> https://github.com/tidyverse/dplyr/blob/9df9d63234f34837fa058bbd081a42dfb8a4eeb0/R/mutate.R#L78-L79
> https://github.com/tidyverse/dplyr/blob/be3e3a05fd0081cb53168d6aedb417d62139b75d/R/doc-methods.R#L57-L75
Those lines arise from this entry in the Rd file:
\Sexpr[stage=render,results=rd]{dplyr:::methods_rd("mutate")}
The dplyr:::methods_rd function basically just formats the results of a
call similar to `methods("mutate")`. All the rest of that is by R Core,
not tidyverse folks.
Duncan Murdoch
>
> On Mon, Jun 9, 2025 at 2:25 PM Lluís Revilla <lluis.revilla using gmail.com> wrote:
>>
>> Hi,
>>
>> I agree that showing that there are other methods might help.
>> The print.function method could be modified to add this in addition to print.default output.
>>
>> But I guess (new) users would check the help page with ?as.data.frame and not print the method or use args() (if they don't check with their prefered LLM/agent).
>> Would it be helpful to also report these numbers on the documentation page of the generic too?
>>
>> Best,
>>
>> Lluís
>>
>>
>> PS: Trying to see what happens with ? and looking for a specific method I found that on my computer with R-devel (2025-06-08 r88288), the expressions below the "Not run:" on ? examples raise errors.
>>
>>
>> On Mon, 9 Jun 2025 at 08:07, Michael Chirico <michaelchirico4 using gmail.com> wrote:
>>>
>>> Thanks Josh,
>>>
>>> With fresh eyes, it's definitely information overload for the
>>> suggested output to take up more space than the function body itself.
>>>
>>> I'm not sure your suggestion gets at the heart of the issue, though,
>>> which is about steering the user with regards to interpreting '...'
>>> they see in the printout.
>>>
>>> Therefore I would suggest something like this as the economized
>>> version of my original suggestion:
>>>
>>>> print(rbind)
>>> function (..., deparse.level = 1)
>>> # ...
>>> <environment: namespace:base>
>>> + 1 other method defines 3 other arguments. See methods(rbind).
>>>
>>>> print(as.data.frame)
>>> function (x, row.names = NULL, optional = FALSE, ...)
>>> # ...
>>> <environment: namespace:base>
>>> + 29 other methods define 11 other arguments. See methods(as.data.frame).
>>>
>>> Mike C
>>>
>>> On Sun, Jun 8, 2025 at 3:57 PM Joshua Ulrich <josh.m.ulrich using gmail.com> wrote:
>>>>
>>>> Hi Mike,
>>>>
>>>> On Fri, Jun 6, 2025 at 1:59 PM Michael Chirico
>>>> <michaelchirico4 using gmail.com> wrote:
>>>>>
>>>>> There is a big difference in how to think of '...' for non-generic
>>>>> functions like data.frame() vs. S3 generics.
>>>>>
>>>>> In the former, it means "any number of inputs" [e.g. columns]; in the
>>>>> latter, it means "any number of inputs [think c()], as well as any
>>>>> arguments that might be interpreted by class implementations".
>>>>>
>>>>> Understanding the difference for a given generic can require carefully
>>>>> reading lots of documentation. print(<generic>), which is useful for
>>>>> so many other contexts, can be a dead end.
>>>>>
>>>>> One idea is to extend the print() method to suggest to the reader
>>>>> which other arguments are available (among registered generics). Often
>>>>> ?<generic> will include the most common implementation, but not always
>>>>> so.
>>>>>
>>>>> For rbind (in a --vanilla session), we currently have one method,
>>>>> rbind.data.frame, that offers three arguments not present in the
>>>>> generic: make.row.names, stringsAsFactors, and factor.exclude. The
>>>>> proposal would be to mention this in the print(rbind) output somehow,
>>>>> e.g.
>>>>>
>>>>>> print(rbind)
>>>>> function (..., deparse.level = 1)
>>>>> .Internal(rbind(deparse.level, ...))
>>>>> <bytecode: 0x73d4fd824e20>
>>>>> <environment: namespace:base>
>>>>>
>>>>> +Other arguments implemented by methods
>>>>> + factor.exclude: rbind.data.frame
>>>>> + make.row.names: rbind.data.frame
>>>>> + stringsAsFactors: rbind.data.frame
>>>>>
>>>>> I suggest grouping by argument, not generic, although something like
>>>>> this could be OK too:
>>>>>
>>>>> +Signatures of other methods
>>>>> + rbind.data.frame(..., deparse.level = 1, make.row.names = TRUE,
>>>>> stringsAsFactors = FALSE,
>>>>> + factor.exclude = TRUE)
>>>>>
>>>>> Where it gets more interesting is when there are many methods, e.g.
>>>>> for as.data.frame (again, in a --vanilla session):
>>>>>
>>>>>> print(as.data.frame)
>>>>> function (x, row.names = NULL, optional = FALSE, ...)
>>>>> {
>>>>> if (is.null(x))
>>>>> return(as.data.frame(list()))
>>>>> UseMethod("as.data.frame")
>>>>> }
>>>>> <bytecode: 0x73d4fc1e70d0>
>>>>> <environment: namespace:base>
>>>>>
>>>>> +Other arguments implemented by methods
>>>>> + base: as.data.frame.table
>>>>> + check.names: as.data.frame.list
>>>>> + col.names: as.data.frame.list
>>>>> + cut.names: as.data.frame.list
>>>>> + fix.empty.names: as.data.frame.list
>>>>> + make.names: as.data.frame.matrix, as.data.frame.model.matrix
>>>>> + new.names: as.data.frame.list
>>>>> + nm: as.data.frame.bibentry, as.data.frame.complex, as.data.frame.Date,
>>>>> + as.data.frame.difftime, as.data.frame.factor, as.data.frame.integer,
>>>>> + as.data.frame.logical, as.data.frame.noquote, as.data.frame.numeric,
>>>>> + as.data.frame.numeric_version, as.data.frame.ordered,
>>>>> + as.data.frame.person, as.data.frame.POSIXct, as.data.frame.raw
>>>>> + responseName: as.data.frame.table
>>>>> + sep: as.data.frame.table
>>>>> + stringsAsFactors: as.data.frame.character, as.data.frame.list,
>>>>> + as.data.frame.matrix, as.data.frame.table
>>>>>
>>>>> Or
>>>>>
>>>>> +Signatures of other methods
>>>>> + as.data.frame.aovproj(x, ...)
>>>>> + as.data.frame.array(x, row.names = NULL, optional = FALSE, ...)
>>>>> + as.data.frame.AsIs(x, row.names = NULL, optional = FALSE, ...)
>>>>> + as.data.frame.bibentry(x, row.names = NULL, optional = FALSE, ...,
>>>>> nm = deparse1(substitute(x)))
>>>>> + as.data.frame.character(x, ..., stringsAsFactors = FALSE)
>>>>> + as.data.frame.citation(x, row.names = NULL, optional = FALSE, ...)
>>>>> + as.data.frame.complex(x, row.names = NULL, optional = FALSE, ...,
>>>>> nm = deparse1(substitute(x)))
>>>>> + as.data.frame.data.frame(x, row.names = NULL, ...)
>>>>> + as.data.frame.Date(x, row.names = NULL, optional = FALSE, ..., nm =
>>>>> deparse1(substitute(x)))
>>>>> + as.data.frame.default(x, ...)
>>>>> + as.data.frame.difftime(x, row.names = NULL, optional = FALSE, ...,
>>>>> nm = deparse1(substitute(x)))
>>>>> + as.data.frame.factor(x, row.names = NULL, optional = FALSE, ..., nm
>>>>> = deparse1(substitute(x)))
>>>>> + as.data.frame.ftable(x, row.names = NULL, optional = FALSE, ...)
>>>>> + as.data.frame.integer(x, row.names = NULL, optional = FALSE, ...,
>>>>> nm = deparse1(substitute(x)))
>>>>> + as.data.frame.list(x, row.names = NULL, optional = FALSE, ...,
>>>>> cut.names = FALSE,
>>>>> + col.names = names(x), fix.empty.names = TRUE, new.names =
>>>>> !missing(col.names),
>>>>> + check.names = !optional, stringsAsFactors = FALSE)
>>>>> + as.data.frame.logical(x, row.names = NULL, optional = FALSE, ...,
>>>>> nm = deparse1(substitute(x)))
>>>>> + as.data.frame.logLik(x, ...)
>>>>> + as.data.frame.matrix(x, row.names = NULL, optional = FALSE,
>>>>> make.names = TRUE,
>>>>> + ..., stringsAsFactors = FALSE)
>>>>> + as.data.frame.model.matrix(x, row.names = NULL, optional = FALSE,
>>>>> make.names = TRUE,
>>>>> + ...)
>>>>> + as.data.frame.noquote(x, row.names = NULL, optional = FALSE, ...,
>>>>> nm = deparse1(substitute(x)))
>>>>> + as.data.frame.numeric(x, row.names = NULL, optional = FALSE, ...,
>>>>> nm = deparse1(substitute(x)))
>>>>> + as.data.frame.numeric_version(x, row.names = NULL, optional =
>>>>> FALSE, ..., nm = deparse1(substitute(x)))
>>>>> + as.data.frame.ordered(x, row.names = NULL, optional = FALSE, ...,
>>>>> nm = deparse1(substitute(x)))
>>>>> + as.data.frame.person(x, row.names = NULL, optional = FALSE, ..., nm
>>>>> = deparse1(substitute(x)))
>>>>> + as.data.frame.POSIXct(x, row.names = NULL, optional = FALSE, ...,
>>>>> nm = deparse1(substitute(x)))
>>>>> + as.data.frame.POSIXlt(x, row.names = NULL, optional = FALSE, ...)
>>>>> + as.data.frame.raw(x, row.names = NULL, optional = FALSE, ..., nm =
>>>>> deparse1(substitute(x)))
>>>>> + as.data.frame.table(x, row.names = NULL, ..., responseName =
>>>>> "Freq", stringsAsFactors = TRUE,
>>>>> + sep = "", base = list(LETTERS))
>>>>> + as.data.frame.ts(x, ...)
>>>>>
>>>>> Obviously that's a bit more cluttered, but as.data.frame() should be a
>>>>> pretty unusual case. It also highlights better the differences in the
>>>>> two approaches: the former economizes on space and focuses on what
>>>>> sorts of arguments are available; the latter shows the defaults, does
>>>>> not hide the arguments shared with the generic, and will always
>>>>> produce as many lines as there are methods.
>>>>>
>>>>> There are other edge cases to think through (multiple registrations,
>>>>> interactions with S4, primitives, ...), but I want to first check with
>>>>> the list if this looks workable & valuable enough to pursue.
>>>>>
>>>> I like and appreciate the intent behind your suggestion, though I
>>>> don't like all the extra output from printing the generic. I want to
>>>> look at the function body when I print it. And as you show, it can
>>>> output a lot of information you're probably not interested in.
>>>>
>>>> What about adding the number of methods to printed output for
>>>> generics, and a suggestion to use `methods(some_generic)` to get a
>>>> list of them? Then you can use help(some_method) or args(some_method)
>>>> to get more information about the specific method(s) you're interested
>>>> in.
>>>>
>>>> Best,
>>>> Josh
>>>>
>>>>> Mike C
>>>>>
>>>>> ----
>>>>>
>>>>> Code that helped with the above:
>>>>>
>>>>> f = as.data.frame
>>>>> # NB: methods() and getAnywhere() require {utils}
>>>>> m = methods(f)
>>>>> generic_args = names(formals(f))
>>>>> f_methods = lapply(m, \(fn) getAnywhere(fn)$objs[[1L]])
>>>>> names(f_methods) = m
>>>>> new_args = sapply(f_methods, \(g) setdiff(names(formals(g)), generic_args))
>>>>> with( # group by argument name
>>>>> data.frame(method = rep(names(new_args), lengths(new_args)), arg =
>>>>> unlist(new_args), row.names=NULL),
>>>>> {tbl = tapply(method, arg, toString); writeLines(paste0(names(tbl),
>>>>> ": ", tbl))}
>>>>> )
>>>>> signatures=sapply(f_methods, \(g) paste(head(format(args(g)), -1),
>>>>> collapse="\n"))
>>>>> writeLines(paste0(names(signatures), gsub("^\\s*function\\s*", "", signatures)))
>>>>>
>>>>> ______________________________________________
>>>>> R-devel using r-project.org mailing list
>>>>> https://stat.ethz.ch/mailman/listinfo/r-devel
>>>>
>>>>
>>>>
>>>> --
>>>> Joshua Ulrich | about.me/joshuaulrich
>>>> FOSS Trading | www.fosstrading.com
>>>
>>> ______________________________________________
>>> R-devel using r-project.org mailing list
>>> https://stat.ethz.ch/mailman/listinfo/r-devel
>
> ______________________________________________
> R-devel using r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel
More information about the R-devel
mailing list