[Rd] Documentation issues [Was: Function hints]

Tue Jun 20 12:58:14 CEST 2006

On 6/20/2006 5:18 AM, Heather Turner wrote:
> I would like to follow up on another one of the documentation issues raised in the discussion on function hints. Duncan mentioned that the R core were working on preprocessing directives for .Rd files, which could possibly include some sort of include directive. I was wondering if an "includeexamples" directive might also be considered.
> 
> It often makes sense to use the same example to illustrate the use of different functions, or perhaps extend an example used to illustrate one function to illustrate another. One way to do this is simply to put
> 
> example(fnA)
> 
> in the \examples for fnB, but this is not particularly helpful for people reading the help pages as they either need to look at both help pages or run the example. The alternative is to maintain multiple copies of the same code, which is not ideal.
> 
> So it would be useful to be able to put
> 
> \includeexamples(fnA)
> 
> so that the code is replicated in fnB.Rd. Perhaps an include directive could do this anyway, but it might be useful to have a special directive for examples so that R CMD check is set up to only check the original example to save time (and unnecessary effort).

Thanks, that's a good suggestion.  My inclination would be towards just 
one type of \include; it could be surrounded by notation saying not to 
check it in all but one instance if the author wanted to save testing time.

> On a related issue, it would be nice if source() had an option to print comments contained in the source file, so that example() and demo() could print out annotation.

Yes, this has been a long-standing need, but it's somewhat tricky 
because of the way source currently works:  it parses the whole file, 
then executes the parsed version.  The first step loses the comments, so 
you see a deparsed version when executing.  What I think it should do is 
have pointers back from the parsed version to the original source code, 
but that needs fairly low level changes.  This is some of the missing 
"infrastructure" I mentioned below.

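To illustrate the comment loss, here is a minimal sketch. It is not 
source()'s actual code: it assumes an R version whose parse() accepts 
keep.source, and the two sample lines are made up for the illustration.

```r
# Two made-up source lines, each carrying a trailing comment.
src <- c("x <- 1 + 2  # add two numbers",
         "y <- x * 10 # scale the result")

# Step 1: parse everything up front, without source references.
# The comments are not part of the parsed expressions.
exprs <- parse(text = src, keep.source = FALSE)

# Step 2: what can be echoed while executing is only the deparsed
# form of each expression, so the comments are gone.
deparsed <- vapply(exprs,
                   function(e) paste(deparse(e), collapse = "\n"),
                   character(1))
print(deparsed)

# Executing the parsed version still works as usual.
for (e in exprs) eval(e)
```

The original text is only recoverable if something keeps a link from 
each parsed expression back to the lines it came from.
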
Duncan Murdoch

> 
> Heather
> 
> Dr H Turner
> Research Assistant
> Dept. of Statistics
> The University of Warwick
> Coventry
> CV4 7AL
> 
> Tel: 024 76575870
> Fax: 024 7652 4532
> Url: www.warwick.ac.uk/go/heatherturner
> 
>>>> <Mark.Bravington at csiro.au> 06/20/06 01:43am >>>
> [This is not about the feasibility of a "hints" function-- which would
> be incredibly useful, but perhaps very very hard to do-- but about some
> of the other documentation issues raised in Hadley's post and in
> Duncan's reply]
> 
> WRTO documentation & code together: for several years, I've successfully
> used the 'mvbutils' package to keep every function definition & its
> documentation together, editing them together in the same file--
> function first, then documentation in plain-text (basically the format
> you see if you use "vanilla help" inside R). Storage-wise, the
> documentation is just kept as an attribute of the function (with a print
> method that hides it by default)-- I also keep a text backup of the
> combination. Any text editor will do. When it's time to create a
> package, the Rd file is generated automatically.
> 
> For me, it's been extremely helpful to keep function & documentation
> together during editing-- it greatly increases the chance that I will
> actually update the doco when I change the code, rather than putting it
> off until I've forgotten what I did. Also, writing Rd format is a
> nightmare (again, personal opinion)-- being able to write plain-text
> makes the whole documentation thing bearable.
> 
> The above is not quite to the point of the original post, I think, which
> talks about storing the documentation as commented bits *inside* the
> function code. However, I'm not sure the latter is really desirable;
> there is some merit in forcing authors to write an explicit "Details" or
> "Description" section that is not just a paraphrase of programming
> comments, and such sections are unlikely to fit easily inside code. At
> any rate, I wouldn't want to have to interpret my *own* programming
> comments as a usage guide!
> 
> WRTO automatic "usage" sections: it is easy to write code to do this
> ('prompt', and there is also some in 'mvbutils'-- not sure if it's in
> the current release though) but at least as far as the "usage" section
> goes, I think people should be "vigorously encouraged" to write their
> own, showing as far as possible how one might actually *use* the
> function. For many functions, just duplicating the argument list is not
> helpful to the user-- a function can often be invoked in several
> different ways, with different arguments relevant to different
> invocations. I think it's good to show how this can be done in the
> "usage" section, with comments, rather than deferring all practical
> usage to "examples". For one thing, "usage" is near the top, and so
> gives a very quick reminder without having to scroll through the entire
> doco; for another, "usage" and "arguments" are visually adjacent,
> whereas "examples" can be widely separated from "arguments".
> 
> My general point here is: the documenting process should be as
> painless as possible, but not more so. Defaults that are likely to lead
> to unhelpful documentation are perhaps best avoided.
> For this general reason, I applaud R's fairly rigid documentation
> standards, even though I frequently curse them. (And I would like to see
> some bits more rigid, such as compulsory "how-to-use-this" documentation
> for each package!)
> 
> The next version of 'mvbutils' will include various tools for easy "live
> editing" and automated preparation of packages-- I've been using them
> for a while, but still have to get round to finishing the documentation
> ;) 
> 
> Mark Bravington
> CSIRO Mathematical & Information Sciences
> Marine Laboratory
> Castray Esplanade
> Hobart 7001
> TAS
> 
> ph (+61) 3 6232 5118
> fax (+61) 3 6232 5012
> mob (+61) 438 315 623
>  
> 
>> -----Original Message-----
>> From: r-devel-bounces at r-project.org 
>> [mailto:r-devel-bounces at r-project.org] On Behalf Of Duncan Murdoch
>> Sent: Tuesday, 20 June 2006 12:39 AM
>> To: hadley wickham; R-devel
>> Subject: Re: [Rd] [R] Function hints
>>
>> I've moved this from R-help to R-devel, where I think it is 
>> more appropriate, and interspersed comments below.
>>
>>
>>
>> On 6/19/2006 8:51 AM, hadley wickham wrote:
>>> One of the recurring themes in the recent useR! conference was that
>>> many people find it difficult to find the functions they need for a
>>> particular task.  Sandy Weisberg suggested a small idea he would like
>>> to see: a hints function that, given an object, lists likely
>>> operations.  I've done my best to implement this function using the
>>> tools currently available in R, and my code is included at the bottom
>>> of this email (I hope that I haven't just duplicated something already
>>> present in R).  I think Sandy's idea is genuinely useful, even in the
>>> limited form provided by my implementation, and I have already
>>> discovered a few useful functions that I was unaware of.
>>>
>>> While developing and testing this function, I ran into a few problems
>>> which, I think, represent underlying problems with the current
>>> documentation system.  These are typified by the results of running
>>> hints on an object produced by glm (having class c("glm", "lm")).  I
>>> have outlined (very tersely) some possible solutions.  Please note
>>> that while these solutions are largely technological, the problem is
>>> at heart sociological: writing documentation is no easier (and perhaps
>>> much harder) than writing a scientific publication, but the rewards
>>> are fewer.
>>>
>>> Problems:
>>>
>>>  * Many functions share the same description (eg. head, tail).
>>> Solution: each rdoc file should only describe one method. Problem:
>>> Writing rdoc files is tedious, there is a lot of information
>>> duplicated between the code and the documentation (eg. the usage
>>> statement) and some functions share a lot of similar information.
>>> Solution: make it easier to write documentation (eg. documentation
>>> inline with code), and easier to include certain common descriptions
>>> in multiple methods (eg. new include command)
>> I think it's bad to document dissimilar functions in the same file, but
>> similar related functions *should* be documented together.  Not doing
>> this just adds to the burden of documenting them, and the risk of
>> modifying only part of the documentation so that it is inconsistent.
>> The user also gets the benefit of seeing a common description all at
>> once, rather than having to decide whether to follow "See also" links.
>>
>> Your solutions would both be interesting on their own merits regardless
>> of the above.  We did decide to work on preprocessing directives for .Rd
>> files at the R core meetings; some sort of include directive may be
>> possible.
>>
>> I don't think I would want complete documentation mixed with the
>> original source, but it would certainly be interesting to have partial
>> documentation there.  (Complete documentation is too long, and would
>> make it harder to read the source without a dedicated editor that could
>> hide it.  Though ESS users may see it as a reasonable requirement to
>> have everyone use the same editor, I don't think it is.)  However, this
>> is a lot of work, depending on infrastructure that is not in place.
>>
>>>  * It is difficult to tell which functions are commonly
>>> used/important. Solution: break down by keywords. Problem: keywords
>>> are not useful at the moment.  Solution:  make a better list of keywords
>>> available and encourage people to use it.  Problem: people won't
>>> unless there is a strong incentive, plus good keywording requires
>>> considerable expertise (especially in building up the list).  This is
>>> probably insoluble unless one person systematically keywords all of
>>> the base packages.
>> I think it is worse than that.  There are concepts in packages that just
>> don't arise in base R, and hence there would be no keywords for them
>> other than "misc", even if someone redesigned the current system.
>> Keywording is hard, and it's not clear to me how to do much better than
>> we currently do.
>>
>> We do already have user-defined keywords (via \concept), but these are
>> not widely used.
>>
>>>  * Some functions aren't documented (eg. simulate.lm, formula.glm) -
>>> typically, these are methods where the documentation is in the
>>> generic.  Solution: these methods should all be aliased to the generic
>>> (by default?), and R CMD check should be amended to check for this
>>> situation.  You could also argue that this is a deficiency with my
>>> function, and easily fixed by automatically referring to the generic
>>> if the specific isn't documented.
>> I'd say it's a deficiency of your function.  You might want to look at
>> the code in get("?") and .helpForCall() to see how those functions work
>> out things like
>>
>> ?simulate(x)
>>
>> where x is an lm object.  (But notice that .helpForCall is an 
>> undocumented internal function; don't depend on its implementation 
>> working forever).
>>
>>>  * It can't supply suggestions when there isn't an explicit method
>>> (ie. .default is used), this makes it pretty useless for basic
>>> vectors.  This may not really be a problem, as all possible operations
>>> are probably too numerous to list.
>>>
>>>  * Provides full name for function, when best practice is to use
>>> generic part only when calling function.  However, getting precise
>>> documentation may require that full name. 
>> No, not if the call syntax above is used.
>>
>>>   I do the best I can
>>> (returning the generic if specific is alias to a documentation file
>>> with the same method name), but this reflects a deeper problem that
>>> the name you should use when calling a function may be different to
>>> the name you use to get documentation.
>>>
>>>  * Can only display methods from currently loaded packages.  This is a
>>> shortcoming of the methods function, but I suspect it is difficult to
>>> find S3 methods without loading a package.
>>>
>>> Relatively trivial problems:
>>>
>>>  * Needs wide display to be effective.  Could be dealt with by
>>> breaking description in a sensible manner (there may already be R code
>>> to do this.  Please let me know if you know of any)
>> I think strwrap() may do what you want.
>>>  * Doesn't currently include S4 methods.  Solution: add some more code
>>> to wrap showMethods
>>>
>>>  * Personally, I think sentence case is more aesthetically pleasing
>>> (and more flexible) than title case.
>> It's quite hard to go from existing title case to sentence case, because
>> we don't have any markup to indicate proper names.  One would think it
>> would be easier to go in the opposite direction, but in fact the same
>> problem arises:  "van Beethoven" for example, not "Van Beethoven".
>>
>>
>>>
>>> Hadley
>>>
>>>
>>> hints <- function(x) {
>> I don't like the name "hints".  I think we already have too many ways 
>> into the help system:
>>
>> help
>> ?
>> help.search
>> apropos
>> etc.?
>>
>> I like your function, but I'd rather see it attached to one of the 
>> existing help functions, probably help.search().  For example,
>>
>> help.search(x)
>>
>> could look for functions designed to work with the class of x, if it had
>> one.  (There's some ambiguity here:  perhaps x contains a string, and I
>> want help on that string.)
>>
>> Anyway, thanks for your efforts on this so far; I hope we end up with 
>> something that can make it into the next release.
>>
>> Duncan Murdoch
>>
>>> 	db <- eval(utils:::.hsearch_db())
>>> 	if (is.null(db)) {
>>> 		help.search("abcd!", rebuild=TRUE, agrep=FALSE)
>>> 		db <- eval(utils:::.hsearch_db())
>>> 	}
>>>
>>> 	base <- db$Base
>>> 	alias <- db$Aliases
>>> 	key <- db$Keywords
>>>
>>> 	m <- all.methods(class=class(x))
>>> 	m_id <- alias[match(m, alias[,1]), 2]
>>> 	keywords <- lapply(m_id, function(id) key[key[,2] %in% id, 1])
>>>
>>> 	f.names <- cbind(m, base[match(m_id, base[,3]), 4])
>>> 	f.names <- unlist(lapply(1:nrow(f.names), function(i) {
>>> 		if (is.na(f.names[i, 2])) return(f.names[i, 1])
>>> 		a <- methodsplit(f.names[i, 1])
>>> 		b <- methodsplit(f.names[i, 2])
>>> 		
>>> 		if (a[1] == b[1]) f.names[i, 2] else f.names[i, 1]
>>> 	}))
>>> 	
>>> 	hints <- cbind(f.names, base[match(m_id, base[,3]), 5])
>>> 	hints <- hints[order(tolower(hints[,1])),]
>>> 	hints <- rbind(    c("--------", "---------------"), hints)
>>> 	rownames(hints) <- rep("", nrow(hints))
>>> 	colnames(hints) <- c("Function", "Task")
>>> 	hints[is.na(hints)] <- "(Unknown)"
>>> 	
>>> 	class(hints) <- "hints"
>>> 	hints
>>> }
>>>
>>> print.hints <- function(x, ...) print(unclass(x), quote=FALSE)
>>>
>>> all.methods <- function(classes) {
>>> 	methods <- do.call(rbind,lapply(classes, function(x) {
>>> 		m <- methods(class=x)
>>> 		t(sapply(as.vector(m), methodsplit)) #m[attr(m, "info")$visible]
>>> 	}))
>>> 	rownames(methods[!duplicated(methods[,1]),])
>>> }
>>>
>>> methodsplit <- function(m) {
>>> 	parts <- strsplit(m, "\\.")[[1]]
>>> 	if (length(parts) == 1) {
>>> 		c(name=m, class="")
>>> 	} else{
>>> 		c(name=paste(parts[-length(parts)], collapse="."), class=parts[length(parts)])
>>> 	}	
>>> }
>>>
>>> ______________________________________________
>>> R-help at stat.math.ethz.ch mailing list
>>> https://stat.ethz.ch/mailman/listinfo/r-help 
>>> PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html 
>>
>> ______________________________________________
>> R-devel at r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-devel 
>>
>>