[Rd] Documentation issues [Was: Function hints]

Tue Jun 20 11:18:14 CEST 2006

I would like to follow up on another one of the documentation issues raised in the discussion on function hints. Duncan mentioned that the R core were working on preprocessing directives for .Rd files, which could possibly include some sort of include directive. I was wondering if an "includeexamples" directive might also be considered.

It often makes sense to use the same example to illustrate the use of different functions, or perhaps extend an example used to illustrate one function to illustrate another. One way to do this is simply to put

example(fnA)

in the \examples for fnB, but this is not particularly helpful for people reading the help pages as they either need to look at both help pages or run the example. The alternative is to maintain multiple copies of the same code, which is not ideal.

So it would be useful to be able to put

\includeexamples(fnA)

so that the code is replicated in fnB.Rd. Perhaps an include directive could do this anyway, but it might be useful to have a special directive for examples so that R CMD check is set up to only check the original example to save time (and unnecessary effort).
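
As a rough illustration of what such a directive might expand to, here is a minimal sketch in R. The function expand_includeexamples() and the lookup list of examples are hypothetical names invented for this sketch; it is not an existing Rd tool, and a real preprocessor would need to read the examples out of fnA.Rd itself rather than from a hand-built list.

```r
# Hypothetical sketch: replace each "\includeexamples(name)" line in the
# text of an Rd file with the example code recorded for that name.
expand_includeexamples <- function(rd_lines, examples) {
  pattern <- "^[[:space:]]*\\\\includeexamples\\(([^)]+)\\)[[:space:]]*$"
  unlist(lapply(rd_lines, function(line) {
    m <- regmatches(line, regexec(pattern, line))[[1]]
    if (length(m) == 0) return(line)          # ordinary line: keep as is
    src <- examples[[m[2]]]
    if (is.null(src)) stop("no examples recorded for '", m[2], "'")
    src                                       # directive: splice in the code
  }))
}

# Made-up content standing in for the \examples sections of fnA and fnB.
fnA_examples <- c("fit <- lm(dist ~ speed, data = cars)")
fnB_rd <- c("\\examples{", "\\includeexamples(fnA)", "summary(fit)", "}")

expanded <- expand_includeexamples(fnB_rd, list(fnA = fnA_examples))
cat(expanded, sep = "\n")
```

Under this scheme the reader of fnB's help page sees the full code, while only one copy is maintained.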

On a related issue, it would be nice if source() had an option to print comments contained in the source file, so that example() and demo() could print out annotation.
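
A minimal sketch of the kind of output meant here, assuming a version of R in which source() accepts a keep.source argument (the script contents are made up for illustration). With the original source text retained, echo = TRUE should show the comment lines along with the code, rather than only the deparsed expressions.

```r
# Write a tiny script that contains an annotating comment.
script <- tempfile(fileext = ".R")
writeLines(c("# Fit a simple model", "x <- 1:10", "mean(x)"), script)

# keep.source = TRUE retains the original text, comments included,
# so that echo = TRUE can print it as written.
echoed <- capture.output(source(script, echo = TRUE, keep.source = TRUE))
unlink(script)

cat(echoed, sep = "\n")
```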

Heather

Dr H Turner
Research Assistant
Dept. of Statistics
The University of Warwick
Coventry
CV4 7AL

Tel: 024 76575870
Fax: 024 7652 4532
Url: www.warwick.ac.uk/go/heatherturner

>>> <Mark.Bravington at csiro.au> 06/20/06 01:43am >>>
[This is not about the feasibility of a "hints" function-- which would
be incredibly useful, but perhaps very very hard to do-- but about some
of the other documentation issues raised in Hadley's post and in
Duncan's reply]

WRTO documentation & code together: for several years, I've successfully
used the 'mvbutils' package to keep every function definition & its
documentation together, editing them together in the same file--
function first, then documentation in plain-text (basically the format
you see if you use "vanilla help" inside R). Storage-wise, the
documentation is just kept as an attribute of the function (with a print
method that hides it by default)-- I also keep a text backup of the
combination. Any text editor will do. When it's time to create a
package, the Rd file is generated automatically.
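
The storage idea can be sketched in a few lines of R. This is not the 'mvbutils' implementation; with_doc(), show_doc(), the "documented" class and the "doc" attribute name are hypothetical choices made only for illustration.

```r
# Hypothetical helper: attach plain-text documentation to a function.
with_doc <- function(f, doc) {
  stopifnot(is.function(f), is.character(doc))
  attr(f, "doc") <- doc
  class(f) <- c("documented", "function")
  f
}

# Print method that hides the documentation attribute by default.
print.documented <- function(x, ...) {
  bare <- unclass(x)
  attr(bare, "doc") <- NULL
  print(bare, ...)
  invisible(x)
}

# Hypothetical accessor that shows the stored text on request.
show_doc <- function(f) cat(attr(f, "doc"), sep = "\n")

sq <- with_doc(function(x) x^2,
               c("sq(x)", "", "Returns the square of x."))
sq(3)         # the function still works as usual
print(sq)     # shows only the code
show_doc(sq)  # shows the documentation
```

Because the text travels with the function object, saving or copying the function keeps code and documentation together.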

For me, it's been extremely helpful to keep function & documentation
together during editing-- it greatly increases the chance that I will
actually update the doco when I change the code, rather than putting it
off until I've forgotten what I did. Also, writing Rd format is a
nightmare (again, personal opinion)-- being able to write plain-text
makes the whole documentation thing bearable.

The above is not quite to the point of the original post, I think, which
talks about storing the documentation as commented bits *inside* the
function code. However, I'm not sure the latter is really desirable;
there is some merit in forcing authors to write an explicit "Details" or
"Description" section that is not just a paraphrase of programming
comments, and such sections are unlikely to fit easily inside code. At
any rate, I wouldn't want to have to interpret my *own* programming
comments as a usage guide!

WRTO automatic "usage" sections: it is easy to write code to do this
('prompt', and there is also some in 'mvbutils'-- not sure if it's in
the current release though) but at least as far as the "usage" section
goes, I think people should be "vigorously encouraged" to write their
own, showing as far as possible how one might actually *use* the
function. For many functions, just duplicating the argument list is not
helpful to the user-- a function can often be invoked in several
different ways, with different arguments relevant to different
invocations. I think it's good to show how this can be done in the
"usage" section, with comments, rather than deferring all practical
usage to "examples". For one thing, "usage" is near the top, and so
gives a very quick reminder without having to scroll through the entire
doco; for another, "usage" and "arguments" are visually adjacent,
whereas "examples" can be widely separated from "arguments".
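
For reference, the automatic route looks roughly like this. prompt() is the existing utils function mentioned above; scale_by() is a made-up function for illustration. With filename = NA, prompt() returns the skeleton as a list rather than writing an .Rd file, so the generated usage entry can be inspected directly. It only mirrors the argument list, which is the limitation being discussed.

```r
# A made-up function that can sensibly be called in more than one way.
scale_by <- function(x, factor = 2, centre = FALSE) {
  if (centre) x <- x - mean(x)
  x * factor
}

# filename = NA returns the Rd skeleton as a list instead of writing a file.
skeleton <- utils::prompt(scale_by, filename = NA)

# The generated usage section just repeats the formal arguments.
cat(skeleton$usage, sep = "\n")
```

A hand-written usage section could instead list the plain call and the centred call on separate lines, each with a short comment.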

My general point here is: the documenting process should be as
painless as possible, but not more so. Defaults that are likely to lead
to unhelpful documentation are perhaps best avoided.
For this general reason, I applaud R's fairly rigid documentation
standards, even though I frequently curse them. (And I would like to see
some bits more rigid, such as compulsory "how-to-use-this" documentation
for each package!)

The next version of 'mvbutils' will include various tools for easy "live
editing" and automated preparation of packages-- I've been using them
for a while, but still have to get round to finishing the documentation
;) 

Mark Bravington
CSIRO Mathematical & Information Sciences
Marine Laboratory
Castray Esplanade
Hobart 7001
TAS

ph (+61) 3 6232 5118
fax (+61) 3 6232 5012
mob (+61) 438 315 623

> -----Original Message-----
> From: r-devel-bounces at r-project.org 
> [mailto:r-devel-bounces at r-project.org] On Behalf Of Duncan Murdoch
> Sent: Tuesday, 20 June 2006 12:39 AM
> To: hadley wickham; R-devel
> Subject: Re: [Rd] [R] Function hints
> 
> I've moved this from R-help to R-devel, where I think it is 
> more appropriate, and interspersed comments below.
> 
> 
> 
> On 6/19/2006 8:51 AM, hadley wickham wrote:
> > One of the recurring themes in the recent useR! conference was that
> > many people find it difficult to find the functions they need for a
> > particular task.  Sandy Weisberg suggested a small idea he would like
> > to see: a hints function that given an object, lists likely
> > operations.  I've done my best to implement this function using the
> > tools currently available in R, and my code is included at the bottom
> > of this email (I hope that I haven't just duplicated something already
> > present in R).  I think Sandy's idea is genuinely useful, even in the
> > limited form provided by my implementation, and I have already
> > discovered a few useful functions that I was unaware of.
> > 
> > While developing and testing this function, I ran into a few problems
> > which, I think, represent underlying problems with the current
> > documentation system.  These are typified by the results of running
> > hints on an object produced by glm (having class c("glm", "lm")).  I
> > have outlined (very tersely) some possible solutions.  Please note
> > that while these solutions are largely technological, the problem is
> > at heart sociological: writing documentation is no easier (and perhaps
> > much harder) than writing a scientific publication, but the rewards
> > are fewer.
> > 
> > Problems:
> > 
> >  * Many functions share the same description (eg. head, tail).
> > Solution: each rdoc file should only describe one method. Problem:
> > Writing rdoc files is tedious, there is a lot of information
> > duplicated between the code and the documentation (eg. the usage
> > statement) and some functions share a lot of similar information.
> > Solution: make it easier to write documentation (eg. documentation
> > inline with code), and easier to include certain common descriptions
> > in multiple methods (eg. new include command)
> 
> I think it's bad to document dissimilar functions in the same file, but
> similar related functions *should* be documented together.  Not doing
> this just adds to the burden of documenting them, and the risk of
> modifying only part of the documentation so that it is inconsistent.
> The user also gets the benefit of seeing a common description all at
> once, rather than having to decide whether to follow "See also" links.
> 
> Your solutions would both be interesting on their own merits regardless
> of the above.  We did decide to work on preprocessing directives for .Rd
> files at the R core meetings; some sort of include directive may be
> possible.
> 
> I don't think I would want complete documentation mixed with the
> original source, but it would certainly be interesting to have partial
> documentation there.  (Complete documentation is too long, and would
> make it harder to read the source without a dedicated editor that could
> hide it.  Though ESS users may see it as a reasonable requirement to
> have everyone use the same editor, I don't think it is.)  However, this
> is a lot of work, depending on infrastructure that is not in place.
> 
> >  * It is difficult to tell which functions are commonly
> > used/important. Solution: break down by keywords. Problem: keywords
> > are not useful at the moment.  Solution:  make better list of keywords
> > available and encourage people to use it.  Problem: people won't
> > unless there is a strong incentive, plus good keywording requires
> > considerable expertise (especially in building up list).  This is
> > probably insoluble unless one person systematically keywords all of
> > the base packages.
> 
> I think it is worse than that.  There are concepts in packages that just
> don't arise in base R, and hence there would be no keywords for them
> other than "misc", even if someone redesigned the current system.
> Keywording is hard, and it's not clear to me how to do much better than
> we currently do.
> 
> We do already have user-defined keywords (via \concept), but these are
> not widely used.
> 
> > 
> >  * Some functions aren't documented (eg. simulate.lm, formula.glm) -
> > typically, these are methods where the documentation is in the
> > generic.  Solution: these methods should all be aliased to the generic
> > (by default?), and R CMD check should be amended to check for this
> > situation.  You could also argue that this is a deficiency with my
> > function, and easily fixed by automatically referring to the generic
> > if the specific isn't documented.
> 
> I'd say it's a deficiency of your function.  You might want to look at
> the code in get("?") and .helpForCall() to see how those functions work
> out things like
> 
> ?simulate(x)
> 
> where x is an lm object.  (But notice that .helpForCall is an 
> undocumented internal function; don't depend on its implementation 
> working forever).
> 
> >  * It can't supply suggestions when there isn't an explicit method
> > (ie. .default is used), this makes it pretty useless for basic
> > vectors.  This may not really be a problem, as all possible operations
> > are probably too numerous to list.
> > 
> >  * Provides full name for function, when best practice is to use
> > generic part only when calling function.  However, getting precise
> > documentation may require that full name.
> 
> No, not if the call syntax above is used.
> 
> > I do the best I can
> > (returning the generic if specific is alias to a documentation file
> > with the same method name), but this reflects a deeper problem that
> > the name you should use when calling a function may be different to
> > the name you use to get documentation.
> > 
> >  * Can only display methods from currently loaded packages.  This is a
> > shortcoming of the methods function, but I suspect it is difficult to
> > find S3 methods without loading a package.
> > 
> > Relatively trivial problems:
> > 
> >  * Needs wide display to be effective.  Could be dealt with by
> > breaking description in a sensible manner (there may already be R code
> > to do this.  Please let me know if you know of any)
> 
> I think strwrap() may do what you want.
> > 
> >  * Doesn't currently include S4 methods.  Solution: add some more code
> > to wrap showMethods
> > 
> >  * Personally, I think sentence case is more aesthetically pleasing
> > (and more flexible) than title case.
> 
> It's quite hard to go from existing title case to sentence case, because
> we don't have any markup to indicate proper names.  One would think it
> would be easier to go in the opposite direction, but in fact the same
> problem arises:  "van Beethoven" for example, not "Van Beethoven".
> 
> 
> > 
> > 
> > Hadley
> > 
> > 
> > hints <- function(x) {
> 
> I don't like the name "hints".  I think we already have too many ways 
> into the help system:
> 
> help
> ?
> help.search
> apropos
> etc.?
> 
> I like your function, but I'd rather see it attached to one of the 
> existing help functions, probably help.search().  For example,
> 
> help.search(x)
> 
> could look for functions designed to work with the class of x, if it had
> one.  (There's some ambiguity here:  perhaps x contains a string, and I
> want help on that string.)
> 
> Anyway, thanks for your efforts on this so far; I hope we end up with 
> something that can make it into the next release.
> 
> Duncan Murdoch
> 
> > 	db <- eval(utils:::.hsearch_db())
> > 	if (is.null(db)) {
> > 		help.search("abcd!", rebuild=TRUE, agrep=FALSE)
> > 		db <- eval(utils:::.hsearch_db())
> > 	}
> > 
> > 	base <- db$Base
> > 	alias <- db$Aliases
> > 	key <- db$Keywords
> > 
> > 	m <- all.methods(class=class(x))
> > 	m_id <- alias[match(m, alias[,1]), 2]
> > 	keywords <- lapply(m_id, function(id) key[key[,2] %in% id, 1])
> > 
> > 	f.names <- cbind(m, base[match(m_id, base[,3]), 4])
> > 	f.names <- unlist(lapply(1:nrow(f.names), function(i) {
> > 		if (is.na(f.names[i, 2])) return(f.names[i, 1])
> > 		a <- methodsplit(f.names[i, 1])
> > 		b <- methodsplit(f.names[i, 2])
> > 		
> > 		if (a[1] == b[1]) f.names[i, 2] else f.names[i, 1]
> > 	}))
> > 	
> > 	hints <- cbind(f.names, base[match(m_id, base[,3]), 5])
> > 	hints <- hints[order(tolower(hints[,1])),]
> > 	hints <- rbind(    c("--------", "---------------"), hints)
> > 	rownames(hints) <- rep("", nrow(hints))
> > 	colnames(hints) <- c("Function", "Task")
> > 	hints[is.na(hints)] <- "(Unknown)"
> > 	
> > 	class(hints) <- "hints"
> > 	hints
> > }
> > 
> > print.hints <- function(x, ...) print(unclass(x), quote=FALSE)
> > 
> > all.methods <- function(classes) {
> > 	methods <- do.call(rbind,lapply(classes, function(x) {
> > 		m <- methods(class=x)
> > 		t(sapply(as.vector(m), methodsplit)) #m[attr(m, "info")$visible]
> > 	}))
> > 	rownames(methods[!duplicated(methods[,1]),])
> > }
> > 
> > methodsplit <- function(m) {
> > 	parts <- strsplit(m, "\\.")[[1]]
> > 	if (length(parts) == 1) {
> > 		c(name=m, class="")
> > 	} else{
> > 		c(name=paste(parts[-length(parts)], collapse="."), class=parts[length(parts)])
> > 	}	
> > }
> > 
> > ______________________________________________
> > R-help at stat.math.ethz.ch mailing list
> > https://stat.ethz.ch/mailman/listinfo/r-help 
> > PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
> 
> ______________________________________________
> R-devel at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel 
> 