[Rd] [R] Function hints

Duncan Murdoch murdoch at stats.uwo.ca
Mon Jun 19 16:39:23 CEST 2006


I've moved this from R-help to R-devel, where I think it is more 
appropriate, and interspersed comments below.



On 6/19/2006 8:51 AM, hadley wickham wrote:
> One of the recurring themes in the recent UserR conference was that
> many people find it difficult to find the functions they need for a
> particular task.  Sandy Weisberg suggested a small idea he would like
> to see: a hints function that, given an object, lists likely
> operations.  I've done my best to implement this function using the
> tools currently available in R, and my code is included at the bottom
> of this email (I hope that I haven't just duplicated something already
> present in R).  I think Sandy's idea is genuinely useful, even in the
> limited form provided by my implementation, and I have already
> discovered a few useful functions that I was unaware of.
> 
> While developing and testing this function, I ran into a few problems
> which, I think, represent underlying problems with the current
> documentation system.  These are typified by the results of running
> hints on an object produced by glm (having class c("glm", "lm")).  I
> have outlined (very tersely) some possible solutions.  Please note
> that while these solutions are largely technological, the problem is
> at heart sociological: writing documentation is no easier (and perhaps
> much harder) than writing a scientific publication, but the rewards
> are fewer.
> 
> Problems:
> 
>  * Many functions share the same description (e.g. head, tail).
> Solution: each Rd file should only describe one method. Problem:
> writing Rd files is tedious, there is a lot of information
> duplicated between the code and the documentation (e.g. the usage
> statement), and some functions share a lot of similar information.
> Solution: make it easier to write documentation (e.g. documentation
> inline with code), and easier to include certain common descriptions
> in multiple methods (e.g. a new include command).

I think it's bad to document dissimilar functions in the same file, but 
similar related functions *should* be documented together.  Not doing 
this just adds to the burden of documenting them, and the risk of 
modifying only part of the documentation so that it is inconsistent. 
The user also gets the benefit of seeing a common description all at 
once, rather than having to decide whether to follow "See also" links.

Your solutions would both be interesting on their own merits regardless 
of the above.  We did decide to work on preprocessing directives for .Rd 
files at the R core meetings; some sort of include directive may be 
possible.

I don't think I would want complete documentation mixed with the 
original source, but it would certainly be interesting to have partial 
documentation there.  (Complete documentation is too long, and would 
make it harder to read the source without a dedicated editor that could 
hide it.  Though ESS users may see it as a reasonable requirement to 
have everyone use the same editor, I don't think it is.)  However, this 
is a lot of work, depending on infrastructure that is not in place.

>  * It is difficult to tell which functions are commonly
> used/important. Solution: break down by keywords. Problem: keywords
> are not useful at the moment.  Solution: make a better list of keywords
> available and encourage people to use it.  Problem: people won't
> unless there is a strong incentive, plus good keywording requires
> considerable expertise (especially in building up the list).  This is
> probably insoluble unless one person systematically keywords all of
> the base packages.

I think it is worse than that.  There are concepts in packages that just 
don't arise in base R, and hence there would be no keywords for them 
other than "misc", even if someone redesigned the current system. 
Keywording is hard, and it's not clear to me how to do much better than 
we currently do.

We do already have user-defined keywords (via \concept), but these are 
not widely used.
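As a sketch of how \concept entries can already be used: help.search() indexes them, and (assuming a reasonably recent R in which help.search() has a fields argument; the 2006 release may differ) a search can be restricted to those entries, or run against the standard keyword vocabulary via the keyword argument. The search term "regression" is only an example.

```r
# Search only the \concept{} entries of installed help pages.
by_concept <- utils::help.search("regression", fields = "concept")

# Match against \keyword{} entries from the standard keyword list instead.
by_keyword <- utils::help.search(keyword = "regression")

# Both results are "hsearch" objects; printing one lists the matching topics.
class(by_concept)
```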

> 
>  * Some functions aren't documented (e.g. simulate.lm, formula.glm):
> typically, these are methods where the documentation is in the
> generic.  Solution: these methods should all be aliased to the generic
> (by default?), and R CMD check should be amended to check for this
> situation.  You could also argue that this is a deficiency with my
> function, and easily fixed by automatically referring to the generic
> if the specific isn't documented.

I'd say it's a deficiency of your function.  You might want to look at 
the code in get("?") and .helpForCall() to see how those functions work 
out things like

?simulate(x)

where x is an lm object.  (But notice that .helpForCall is an 
undocumented internal function; don't depend on its implementation 
working forever).
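The fallback described above (use the method's own help page if there is one, otherwise the generic's) can be sketched as a plain alias lookup. This is not how get("?") or .helpForCall() work internally; it assumes a hypothetical helper doc_topic() and a character vector of help aliases, which in hints() would be the first column of the help.search database's Aliases table (db$Aliases[, 1]).

```r
# Hypothetical helper: choose the help topic for generic 'fun' applied to an
# object whose class vector is 'cls'. 'aliases' is assumed to be a character
# vector of documented help aliases.
doc_topic <- function(fun, cls, aliases) {
  # Walk the class vector in order, most specific class first.
  for (cl in cls) {
    specific <- paste(fun, cl, sep = ".")
    if (specific %in% aliases) return(specific)
  }
  # No method-specific page: fall back to the generic's own topic, if any.
  if (fun %in% aliases) fun else NA_character_
}

aliases <- c("simulate", "summary.glm", "summary.lm", "glm", "lm")
doc_topic("simulate", c("glm", "lm"), aliases)
doc_topic("summary", c("glm", "lm"), aliases)
```

Walking class(x) in order means a glm object gets summary.glm rather than summary.lm, in the same order S3 dispatch would try the classes.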

>  * It can't supply suggestions when there isn't an explicit method
> (i.e. the .default method is used), which makes it pretty useless for
> basic vectors.  This may not really be a problem, as all possible
> operations are probably too numerous to list.
> 
>  * Provides the full name for the function, when best practice is to
> use only the generic part when calling it.  However, getting precise
> documentation may require that full name.

No, not if the call syntax above is used.

> I do the best I can
> (returning the generic if the specific method is an alias in a
> documentation file with the same method name), but this reflects a
> deeper problem: the name you should use when calling a function may be
> different from the name you use to get documentation.
> 
>  * Can only display methods from currently loaded packages.  This is a
> shortcoming of the methods function, but I suspect it is difficult to
> find S3 methods without loading a package.
> 
> Relatively trivial problems:
> 
>  * Needs a wide display to be effective.  Could be dealt with by
> breaking the description in a sensible manner (there may already be R
> code to do this; please let me know if you know of any).

I think strwrap() may do what you want.
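For example, a minimal layout helper built on strwrap() might look like the following. format_hints() is hypothetical and assumes R 3.3.0 or later for strrep(); it pads function names into a fixed-width column and indents wrapped continuation lines under the description.

```r
# Hypothetical helper: lay out hints()-style output as two columns,
# wrapping each task description with strwrap() so that continuation
# lines are indented under the description column.
format_hints <- function(fun, task, width = getOption("width")) {
  pad <- max(nchar(fun)) + 2L                # width of the function-name column
  blank <- strrep(" ", pad)
  out <- character(0)
  for (i in seq_along(fun)) {
    wrapped <- strwrap(task[i], width = width - pad)
    if (length(wrapped) == 0L) wrapped <- ""  # keep one line for an empty task
    first <- paste0(fun[i], strrep(" ", pad - nchar(fun[i])))
    out <- c(out, paste0(c(first, rep(blank, length(wrapped) - 1L)), wrapped))
  }
  out
}

writeLines(format_hints(c("anova", "plot"),
                        c("Analysis of deviance for a fitted model",
                          "Plot diagnostics"),
                        width = 40))
```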
> 
>  * Doesn't currently include S4 methods.  Solution: add some more code
> to wrap showMethods().
> 
>  * Personally, I think sentence case is more aesthetically pleasing
> (and more flexible) than title case.

It's quite hard to go from existing title case to sentence case, because 
we don't have any markup to indicate proper names.  One would think it 
would be easier to go in the opposite direction, but in fact the same 
problem arises:  "van Beethoven" for example, not "Van Beethoven".
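A naive conversion shows where the proper-name problem bites. In this sketch to_sentence_case() is hypothetical, and its protect argument stands in for the proper-name markup that Rd currently lacks; without it, every word after the first is lowercased, proper names included.

```r
# Naive title-case to sentence-case conversion. 'protect' is a hypothetical
# list of words (e.g. proper names) that must keep their original case.
to_sentence_case <- function(title, protect = character(0)) {
  words <- strsplit(title, " ", fixed = TRUE)[[1]]
  # Lowercase every word except the first and any protected word.
  rest <- seq_along(words) > 1L & !(words %in% protect)
  words[rest] <- tolower(words[rest])
  paste(words, collapse = " ")
}

to_sentence_case("Sonatas by Ludwig van Beethoven")
to_sentence_case("Sonatas by Ludwig van Beethoven",
                 protect = c("Ludwig", "Beethoven"))
```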


> 
> 
> Hadley
> 
> 
> hints <- function(x) {

I don't like the name "hints".  I think we already have too many ways 
into the help system:

help
?
help.search
apropos
etc.?

I like your function, but I'd rather see it attached to one of the 
existing help functions, probably help.search().  For example,

help.search(x)

could look for functions designed to work with the class of x, if it had 
one.  (There's some ambiguity here:  perhaps x contains a string, and I 
want help on that string.)
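A rough sketch of that idea, written as a hypothetical wrapper help_for() rather than a change to help.search() itself. It resolves the ambiguity crudely: any single string is treated as a search pattern, so this wrapper cannot be used to ask about methods for class "character".

```r
# Hypothetical wrapper (not the real help.search()): a single string is
# treated as a search pattern; any other object is looked up by its classes.
help_for <- function(x) {
  if (is.character(x) && length(x) == 1L) {
    return(utils::help.search(x))
  }
  # Collect S3 methods for every class in class(x), most specific first.
  found <- lapply(class(x), function(cls) as.vector(utils::methods(class = cls)))
  unique(unlist(found))
}

fit <- lm(dist ~ speed, data = cars)
head(help_for(fit))
```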

Anyway, thanks for your efforts on this so far; I hope we end up with 
something that can make it into the next release.

Duncan Murdoch

> 	db <- eval(utils:::.hsearch_db())
> 	if (is.null(db)) {
> 		help.search("abcd!", rebuild=TRUE, agrep=FALSE)
> 		db <- eval(utils:::.hsearch_db())
> 	}
> 
> 	base <- db$Base
> 	alias <- db$Aliases
> 	key <- db$Keywords
> 
>	m <- all.methods(classes=class(x))
> 	m_id <- alias[match(m, alias[,1]), 2]
> 	keywords <- lapply(m_id, function(id) key[key[,2] %in% id, 1])
> 
> 	f.names <- cbind(m, base[match(m_id, base[,3]), 4])
>	f.names <- unlist(lapply(seq_len(nrow(f.names)), function(i) {
> 		if (is.na(f.names[i, 2])) return(f.names[i, 1])
> 		a <- methodsplit(f.names[i, 1])
> 		b <- methodsplit(f.names[i, 2])
> 		
> 		if (a[1] == b[1]) f.names[i, 2] else f.names[i, 1]		
> 	}))
> 	
> 	hints <- cbind(f.names, base[match(m_id, base[,3]), 5])
> 	hints <- hints[order(tolower(hints[,1])),]
> 	hints <- rbind(    c("--------", "---------------"), hints)
> 	rownames(hints) <- rep("", nrow(hints))
> 	colnames(hints) <- c("Function", "Task")
> 	hints[is.na(hints)] <- "(Unknown)"
> 	
> 	class(hints) <- "hints"
> 	hints
> }
> 
> print.hints <- function(x, ...) print(unclass(x), quote=FALSE)
> 
> all.methods <- function(classes) {
> 	methods <- do.call(rbind,lapply(classes, function(x) {
> 		m <- methods(class=x)
> 		t(sapply(as.vector(m), methodsplit)) #m[attr(m, "info")$visible]
> 	}))
> 	rownames(methods[!duplicated(methods[,1]),])
> }
> 
> methodsplit <- function(m) {
> 	parts <- strsplit(m, "\\.")[[1]]
> 	if (length(parts) == 1) {
> 		c(name=m, class="")
> 	} else{
> 		c(name=paste(parts[-length(parts)], collapse="."), class=parts[length(parts)])
> 	}	
> }
> 
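One caveat about the code above: methodsplit() splits on the last dot, so a method for a class whose name contains a dot is misread; methodsplit("print.data.frame") gives name "print.data" and class "frame". Because all.methods() already knows which class it asked about, it could strip that suffix instead. The following is a sketch under that assumption; method_generic() and all_methods2() are hypothetical names, and endsWith() needs R 3.3.0 or later.

```r
# Hypothetical class-aware alternative to methodsplit(): since the class is
# known, strip ".<class>" from the end instead of splitting on the last dot.
method_generic <- function(m, cls) {
  suffix <- paste0(".", cls)
  ifelse(endsWith(m, suffix),
         substr(m, 1L, nchar(m) - nchar(suffix)),
         NA_character_)
}

# Sketch of all.methods() built on method_generic(). As in the original,
# only the first (most specific) method found for each generic is kept.
all_methods2 <- function(classes) {
  full <- character(0)
  generic <- character(0)
  for (cls in classes) {
    m <- as.vector(utils::methods(class = cls))
    full <- c(full, m)
    generic <- c(generic, method_generic(m, cls))
  }
  keep <- !is.na(generic) & !duplicated(generic)
  full[keep]
}

method_generic("print.data.frame", "data.frame")
```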


