[R] Function hints

Joerg van den Hoff j.van_den_hoff at fz-rossendorf.de
Mon Jun 19 18:14:10 CEST 2006


hadley wickham wrote:
> One of the recurring themes in the recent useR! conference was that
> many people find it difficult to find the functions they need for a
> particular task.  Sandy Weisberg suggested a small idea he would like
> to see: a hints function that given an object, lists likely
> operations.  I've done my best to implement this function using the
> tools currently available in R, and my code is included at the bottom
> of this email (I hope that I haven't just duplicated something already
> present in R).  I think Sandy's idea is genuinely useful, even in the
> limited form provided by my implementation, and I have already
> discovered a few useful functions that I was unaware of.
> 
> While developing and testing this function, I ran into a few problems
> which, I think, represent underlying problems with the current
> documentation system.  These are typified by the results of running
> hints on an object produced by glm (having class c("glm", "lm")).  I
> have outlined (very tersely) some possible solutions.  Please note
> that while these solutions are largely technological, the problem is
> at heart sociological: writing documentation is no easier (and perhaps
> much harder) than writing a scientific publication, but the rewards
> are fewer.
> 
> Problems:
> 
>  * Many functions share the same description (eg. head, tail).
> Solution: each rdoc file should only describe one method. Problem:
> Writing rdoc files is tedious, there is a lot of information
> duplicated between the code and the documentation (eg. the usage
> statement) and some functions share a lot of similar information.
> Solution: make it easier to write documentation (eg. documentation
> inline with code), and easier to include certain common descriptions
> in multiple methods (eg. a new include command).
> 
>  * It is difficult to tell which functions are commonly
> used/important. Solution: break down by keywords. Problem: keywords
> are not useful at the moment.  Solution:  make better list of keywords
> available and encourage people to use it.  Problem: people won't
> unless there is a strong incentive, plus good keywording requires
> considerable expertise (especially in building up the list).  This is
> probably insoluble unless one person systematically keywords all of
> the base packages.
> 
>  * Some functions aren't documented (eg. simulate.lm, formula.glm) -
> typically, these are methods where the documentation is in the
> generic.  Solution: these methods should all be aliased to the generic
> (by default?), and R CMD check should be amended to check for this
> situation.  You could also argue that this is a deficiency with my
> function, and easily fixed by automatically referring to the generic
> if the specific isn't documented.
> 
>  * It can't supply suggestions when there isn't an explicit method
> (ie. .default is used); this makes it pretty useless for basic
> vectors.  This may not really be a problem, as all possible operations
> are probably too numerous to list.
> 
>  * It provides the full name of each method, when best practice is to
> use only the generic part when calling a function.  However, getting
> precise documentation may require that full name.  I do the best I can
> (returning the generic if specific is alias to a documentation file
> with the same method name), but this reflects a deeper problem that
> the name you should use when calling a function may be different to
> the name you use to get documentation.
> 
>  * Can only display methods from currently loaded packages.  This is a
> shortcoming of the methods function, but I suspect it is difficult to
> find S3 methods without loading a package.
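On the keyword point: help.search() already accepts a keyword argument, so a keyword breakdown can at least be prototyped with existing tools. This is only a sketch; topics_by_keyword() is a hypothetical helper, and it assumes a current R in which help.search() returns an object whose matches element has Topic, Title and Package columns (the 2006 layout differed).

```r
# Hypothetical helper: list help topics carrying a given \keyword{} entry.
# Assumes the modern layout of help.search()$matches (Topic, Title, Package).
topics_by_keyword <- function(keyword, package = NULL) {
  hs <- utils::help.search(keyword = keyword, package = package)
  m <- hs$matches
  # A topic can match more than once, so keep distinct rows only.
  unique(m[, c("Topic", "Title", "Package")])
}

# Usage: topics in 'stats' tagged with the "regression" keyword
res <- topics_by_keyword("regression", package = "stats")
head(res)
```

How useful the result is depends entirely on how carefully the Rd files were keyworded, which is the point made above.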
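The fallback suggested in the undocumented-methods point (refer to the generic when the specific method is not documented) could look roughly like this. doc_topic() is a hypothetical helper; it relies only on help() returning a zero-length result when no topic of that name is found in the attached packages.

```r
# Hypothetical helper: pick the help topic to show for a method, falling
# back to the generic when the method has no help alias of its own.
doc_topic <- function(method, generic) {
  # do.call() hands help() the string itself, avoiding any surprises
  # from help()'s non-standard evaluation of its 'topic' argument.
  found <- do.call(utils::help, list(method))
  if (length(found) > 0) method else generic
}

doc_topic("print.data.frame", "print")     # documented on its own page
doc_topic("print.no_such_class", "print")  # no alias, so falls back
```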
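For objects that only reach .default methods, one possible fallback is to list every generic that has a .default method; as the .default point above anticipates, that list is long. default_generics() is a hypothetical name, and the sketch assumes R 3.2.0 or later, where methods() attaches an "info" data frame with a generic column.

```r
# Hypothetical helper: generics that provide a '.default' S3 method, i.e.
# operations that would fall through to a default for a plain vector.
# Assumes R >= 3.2.0 (the "generic" column of attr(methods(...), "info")).
default_generics <- function() {
  info <- attr(utils::methods(class = "default"), "info")
  sort(unique(info$generic))
}

g <- default_generics()
length(g)
head(g)
```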
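On the naming point: the quoted methodsplit() splits at the last ".", so "print.data.frame" comes out as generic "print.data" and class "frame". When the class is known, as it is inside all.methods(), stripping the class suffix instead yields the generic name one should actually call. split_method() is a hypothetical replacement (it uses endsWith(), so R 3.3.0 or later).

```r
# Hypothetical helper: split a method name into generic and class when the
# class is known, so dotted class names are handled correctly.
split_method <- function(method, cls) {
  suffix <- paste0(".", cls)
  if (endsWith(method, suffix) && nchar(method) > nchar(suffix)) {
    c(generic = substr(method, 1, nchar(method) - nchar(suffix)), class = cls)
  } else {
    # Not a method for 'cls': treat the whole name as the generic.
    c(generic = method, class = "")
  }
}

split_method("print.data.frame", "data.frame")   # generic "print"
split_method("as.data.frame.matrix", "matrix")   # generic "as.data.frame"
```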
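On the loaded-packages point: S3 methods that a package registers with S3method() in its NAMESPACE file can be read without loading the package, via base's parseNamespaceFile(). This sketch assumes the package is installed and only sees registered methods; registered_generics() is a hypothetical name, and stats is used here only because it is always installed.

```r
# Hypothetical helper: generics for which 'package' registers an S3 method
# on class 'cls', read from its NAMESPACE file without loading the package.
registered_generics <- function(package, cls) {
  lib <- dirname(find.package(package))   # library directory holding it
  ns  <- parseNamespaceFile(package, lib)
  s3  <- ns$S3methods                     # column 1: generic, column 2: class
  unique(s3[s3[, 2] == cls, 1])
}

registered_generics("stats", "lm")
```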
> 
> Relatively trivial problems:
> 
>  * Needs a wide display to be effective.  Could be dealt with by
> breaking the description in a sensible manner (there may already be R
> code to do this; please let me know if you know of any).
> 
>  * Doesn't currently include S4 methods.  Solution: add some more code
> to wrap showMethods
> 
>  * Personally, I think sentence case is more aesthetically pleasing
> (and more flexible) than title case.
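On the display-width point: base R's strwrap() can already break long descriptions. A minimal sketch that lays out function/task pairs in two columns with the task text wrapped to the available width; format_hints() is a hypothetical helper, not part of the code below.

```r
# Hypothetical helper: two-column layout, task text wrapped with strwrap()
# so that every output line fits within 'width' characters.
format_hints <- function(fn, task, width = getOption("width")) {
  indent <- max(nchar(fn)) + 2              # column where the task starts
  out <- character(0)
  for (i in seq_along(fn)) {
    wrapped <- strwrap(task[i], width = max(width - indent, 10))
    if (length(wrapped) == 0) wrapped <- ""  # empty description
    out <- c(out, paste0(format(fn[i], width = indent), wrapped[1]))
    if (length(wrapped) > 1)                 # indented continuation lines
      out <- c(out, paste0(strrep(" ", indent), wrapped[-1]))
  }
  out
}

fn   <- c("head", "summary.glm")
task <- c("Return the first part of an object",
          "Summarizing generalized linear model fits")
x <- format_hints(fn, task, width = 30)
cat(x, sep = "\n")
```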
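For the S4 point, showMethods() can return its listing as text (printTo = FALSE), which is enough to extract generic names. s4_hints() and the Circle class and area generic are hypothetical, invented for the illustration, and the parsing assumes showMethods() labels each generic with a line beginning "Function: ".

```r
library(methods)

# Hypothetical helper: names of S4 generics with methods for class 'cls',
# parsed from the text that showMethods() produces.
s4_hints <- function(cls, where = topenv(parent.frame())) {
  out <- showMethods(classes = cls, where = where, printTo = FALSE)
  hdr <- grep("^Function: ", out, value = TRUE)
  sub("^Function: ([^ ]+).*$", "\\1", hdr)
}

# Toy class and generic, made up for this example.
setClass("Circle", representation(r = "numeric"))
setGeneric("area", function(shape) standardGeneric("area"))
setMethod("area", "Circle", function(shape) pi * shape@r^2)
s4_hints("Circle")
```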
> 
> 
> Hadley
> 
> 
> hints <- function(x) {
> 	db <- eval(utils:::.hsearch_db())
> 	if (is.null(db)) {
> 		help.search("abcd!", rebuild=TRUE, agrep=FALSE)
> 		db <- eval(utils:::.hsearch_db())
> 	}
> 
> 	base <- db$Base
> 	alias <- db$Aliases
> 	key <- db$Keywords
> 
> 	m <- all.methods(classes=class(x))
> 	m_id <- alias[match(m, alias[,1]), 2]
> 	keywords <- lapply(m_id, function(id) key[key[,2] %in% id, 1])
> 
> 	f.names <- cbind(m, base[match(m_id, base[,3]), 4])
> 	f.names <- unlist(lapply(1:nrow(f.names), function(i) {
> 		if (is.na(f.names[i, 2])) return(f.names[i, 1])
> 		a <- methodsplit(f.names[i, 1])
> 		b <- methodsplit(f.names[i, 2])
> 		
> 		if (a[1] == b[1]) f.names[i, 2] else f.names[i, 1]		
> 	}))
> 	
> 	hints <- cbind(f.names, base[match(m_id, base[,3]), 5])
> 	hints <- hints[order(tolower(hints[,1])), , drop=FALSE]
> 	hints <- rbind(c("--------", "---------------"), hints)
> 	rownames(hints) <- rep("", nrow(hints))
> 	colnames(hints) <- c("Function", "Task")
> 	hints[is.na(hints)] <- "(Unknown)"
> 	
> 	class(hints) <- "hints"
> 	hints
> }
> 
> print.hints <- function(x, ...) print(unclass(x), quote=FALSE)
> 
> all.methods <- function(classes) {
> 	methods <- do.call(rbind,lapply(classes, function(x) {
> 		m <- methods(class=x)
> 		t(sapply(as.vector(m), methodsplit)) #m[attr(m, "info")$visible]
> 	}))
> 	rownames(methods[!duplicated(methods[,1]), , drop=FALSE])
> }
> 
> methodsplit <- function(m) {
> 	parts <- strsplit(m, "\\.")[[1]]
> 	if (length(parts) == 1) {
> 		c(name=m, class="")
> 	} else{
> 		c(name=paste(parts[-length(parts)], collapse="."), class=parts[length(parts)])
> 	}	
> }
> 
> ______________________________________________
> R-help at stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
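For comparison, a sketch of the quoted all.methods() step that avoids splitting names on "." altogether: since R 3.2.0, methods() attaches an "info" data frame whose generic column already names the generic, so the most specific class can simply win for each generic. all_methods_info() is a hypothetical name.

```r
# Hypothetical helper: methods for each class in 'classes' (most specific
# first), keeping only the first method found for each generic.
# Assumes R >= 3.2.0 (the "generic" column of attr(methods(...), "info")).
all_methods_info <- function(classes) {
  rows <- lapply(classes, function(cls) {
    info <- attr(utils::methods(class = cls), "info")
    if (is.null(info) || nrow(info) == 0) return(NULL)
    data.frame(method = rownames(info), generic = info$generic,
               stringsAsFactors = FALSE)
  })
  all <- do.call(rbind, rows)        # rbind() skips NULL entries
  if (is.null(all)) return(character(0))
  all$method[!duplicated(all$generic)]
}

m <- all_methods_info(c("glm", "lm"))
head(m)
```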


Just some feedback: that's a useful function, thank you.

But the problem is probably more general: frequently I do not really
want to know what I can generally do with, for instance, a data frame;
rather, I would like to use `help.search' the way I would use, say,
Google (and with the same rate of success...).
However, the actual `keywords' in the manpages seem insufficient,
`help.search' does not allow full-text search of the manpages (I can
imagine why: 1000 hits..., but without such a thing Google, for
instance, would probably not be half as useful as it is, right?), and
there is no "sorting by relevance" in the `help.search' output, I think.
How this sorting could be achieved is a different question, of course.
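One simple way to get a relevance order, offered only as an illustration and not as anything help.search() does, is TF-IDF scoring: a query term that appears in few pages counts for more than one that appears everywhere. rank_by_relevance() is a hypothetical helper; the three titles below are illustrative input, and real use would feed it titles (or full text) from the help database.

```r
# Hypothetical helper: rank documents against a free-text query by a
# simple TF-IDF score (term count in document times inverse doc frequency).
rank_by_relevance <- function(query, docs) {
  tokenize <- function(s) {
    tk <- strsplit(tolower(s), "[^a-z0-9]+")[[1]]
    tk[nzchar(tk)]
  }
  terms  <- unique(tokenize(query))
  tokens <- lapply(docs, tokenize)
  n <- length(docs)
  # Inverse document frequency; 0 for terms that no document contains.
  idf <- vapply(terms, function(t) {
    df <- sum(vapply(tokens, function(tk) t %in% tk, logical(1)))
    if (df > 0) log(n / df) else 0
  }, numeric(1))
  score <- vapply(tokens, function(tk) {
    tf <- vapply(terms, function(t) as.numeric(sum(tk == t)), numeric(1))
    sum(tf * idf)
  }, numeric(1))
  names(score) <- names(docs)
  sort(score[score > 0], decreasing = TRUE)  # drop pages with no match
}

docs <- c(merge      = "Merge Two Data Frames",
          head       = "Return the First or Last Part of an Object",
          data.frame = "Data Frames")
r <- rank_by_relevance("merge data frames", docs)
names(r)  # "merge" "data.frame"
```

Here "merge" occurs in only one title, so it outweighs "data" and "frames", which occur in two.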


