[R] Function hints

Joerg van den Hoff j.van_den_hoff at fz-rossendorf.de
Tue Jun 20 16:32:18 CEST 2006


Hadley Wickham wrote:
>> What I really would love to see is an improved help.search(). On
>> r-devel I found a reference to the \concept tag in .Rd files and to the
>> fact that it is rarely used (again: I was not aware of this :-( ...).
>> It might serve as a keyword container suitable for improving
>> help.search() results. What about changing the syntax here to
>> something like
>> \concept {
>>     keyword = score,
>>     keyword = score
>>     ...
>> }
>> where score would be restricted to a small range of values (say, 1-3 or
>> 1-5)? If package maintainers then chose a handful of sensible keywords
>> (and scores) for a package and its functions, one could expect improved
>> search results. This might be a naive idea, but could a sort-by-relevance
>> in the help.search() output profit from this?
> 
> This is not something I think you can solve automatically.  Good
> keywording requires a lot of effort, and needs to be consistent to be
> useful.  The only way to achieve consistency is to have only one person
> do the keywording (difficult/expensive), or to exhaustively document the
> process of keywording and then require all package authors to read and
> follow it (impossible).
> 
> Hadley

I was thinking of manual keywording (by the package authors, nobody
else!) as a means of giving the search engine (help.search()) reasonable
information, including a (subjective) relevance score for each keyword.
Of course, the problem is the same as with every (especially permuted)
index: finding the best compromise between indexing next to nothing and
indexing everything (the best probably meaning to index the documents at
hand comprehensively but not excessively, with reasonable index terms).
Sure, consistency could not be enforced, but it is not consistent right
now either, simply because the real \keyword tag is far too restrictive
for indexing purposes (only a handful of predefined keywords are
allowed). Otherwise, only the name/alias and title fields of the Rd files
seem to be searched. The author must therefore be well aware that, at
the moment, these are the fields that have to contain the relevant
'keywords' if the function is to be found by help.search(). This
sometimes imposes rather artificial constraints on the wording,
especially if you try to include some general keyword in the title of a
very specialized function.

Looking at the example I gave (help.search("fitting") etc.), it's quite
clear that `nls' simply is not found because "fitting" does not occur in
its title. But I trust that, if asked to provide, say, three keywords, the
author would include "fit" or "fitting" among them. After all, every
scientific journal asks you to do just this: provide some free-text
keywords that you think are relevant to the paper. Usually there are no
restrictions or directives, but the purpose (to categorize the paper a
bit) is served quite well.

And maybe the \concept tag really is meant for something different; I'm
not sure. What I have in mind is really similar to providing index terms
(plus scores to guide `help.search' in sorting). To stay with the `nls'
example:
\concept {
    non-linear fitting = 4,
    non-linear least-squares = 5,
    non-linear models = 3,
    parameter estimation = 2,
    gauss-newton = 1
}
would probably make sure that `nls' is found correctly in most cases (if
this syntax were allowed). Apart from the scores (which would be nice, I
think), my main point is that extensive use of \concept (or of a new tag,
`\index' for instance, if \concept's purpose is actually different) should
be encouraged to get better hits from help.search().

I have personally decided to start using the \concept tag in its present
form extensively in our local .Rd files, to "inject" a sufficient number
of relevant free-text keywords into help.search().


joerg
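The following is a minimal sketch in Python, used only for illustration, of how the proposed keyword scores could drive a sort-by-relevance. It assumes that the proposed \concept{keyword = score, ...} syntax has already been parsed into a dictionary. That syntax does not exist in Rd, and R's help.search() does not rank results this way. The names CONCEPT_INDEX and search() are hypothetical. The nls scores come from the example in the message, while the lm entry and its scores are invented purely for illustration.

```python
# Hypothetical sketch: rank help topics by the keyword scores proposed
# for a \concept{keyword = score, ...} tag. This is NOT how R's
# help.search() works; it only illustrates the sort-by-relevance idea.

# Scores for nls are taken from the \concept example in the message;
# the lm entry and its scores are invented purely for illustration.
CONCEPT_INDEX = {
    "nls": {
        "non-linear fitting": 4,
        "non-linear least-squares": 5,
        "non-linear models": 3,
        "parameter estimation": 2,
        "gauss-newton": 1,
    },
    "lm": {  # hypothetical entry
        "linear models": 5,
        "least-squares fitting": 3,
    },
}


def search(pattern, index=CONCEPT_INDEX):
    """Return (topic, score) pairs whose keywords contain `pattern`
    (case-insensitive substring match), best match first. A topic's
    relevance is its highest score among the matching keywords."""
    pattern = pattern.lower()
    hits = {}
    for topic, concepts in index.items():
        matching = [score for keyword, score in concepts.items()
                    if pattern in keyword.lower()]
        if matching:
            hits[topic] = max(matching)
    # Sort by descending score, then alphabetically to break ties.
    return sorted(hits.items(), key=lambda item: (-item[1], item[0]))


if __name__ == "__main__":
    for query in ("fitting", "least-squares", "gauss"):
        print(query, search(query))
```

Taking the maximum matching score is only one possible choice. Summing the scores of all matching keywords would instead favour topics that carry many related terms.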
