[Rd] RFC: What should ?foo do?

Prof Brian Ripley ripley at stats.ox.ac.uk
Wed Apr 30 13:23:02 CEST 2008


On Wed, 30 Apr 2008, Duncan Murdoch wrote:

> On 30/04/2008 2:44 AM, Martin Maechler wrote:
>>>>>>> "DM" == Duncan Murdoch <murdoch at stats.uwo.ca>
>>>>>>>     on Sat, 26 Apr 2008 17:21:06 -0400 writes:
>>
>>     DM> On 25/04/2008 2:47 PM, Prof Brian Ripley wrote:
>>     >> On Fri, 25 Apr 2008, Deepayan Sarkar wrote:
>>     >>
>>     >>> For what it's worth, I use ?foo mostly to look up usage of functions
>>     >>> that I know I want to use, and find it perfect for that (one benefit
>>     >>> over help() is that completion works for ?). The only thing I miss is
>>     >>> the ability to do the equivalent of help("foo", package = "bar");
>>     >>> ?bar::foo gives the help page for "::". Perhaps that would be
>>     >>> something to consider for addition.
>>     >>
>>     >> That fits most naturally with the (somewhat technical) idea that
>>     >> bar::foo becomes a symbol and not a function call.  I believe that
>>     >> several of us think that is in principle a better idea, but no one
>>     >> has as yet (AFAIK) explored the ramifications.
>>     >>
>>     >> However, 5 mins looking at the sources suggests that it is easy to do.
>> 
>>
>>     DM> And you already did.  Thanks!
>> indeed.
>>
>>     DM> I'm going to make the following change soon (in R-devel).
>>
>>     DM> ??foo
>>
>>     DM> will now be like help.search("foo").  This will work with your change,
>>     DM> so ??utils::foo will limit the search to the utils package.  This is
>>     DM> also quite easy.  A more difficult thing I'd like to do is to broaden
>>     DM> the search to look outside the man pages, but that's a lot harder,
>>     DM> and I haven't started on it.
>>
>>     DM> I will also follow Hadley's suggestion and change the format of the
>>     DM> help.search results, so you can just cut and paste after a question
>>     DM> mark to look up the particular topic, e.g.  ??foo gives
>>
>>     DM> utils::citEntry         Writing Package CITATION Files
>>
>>     DM> Type '?PKG::FOO' to inspect entry 'PKG::FOO TITLE'.
>>
>>     DM> I haven't touched the case of ?foo failing; I'll want to try it for
>>     DM> a while to decide whether I like it best as is:
>>
>>     >> ?foo
>>     DM> No documentation for 'foo' in specified packages and libraries:
>>     DM> you could try '??foo'
>>
>>     DM> or whether it should just automatically call help.search, or
>>     DM> something in between.
>> 
>> Please the former, at least by default!
>> [The case of 1500 installed packages was mentioned before...]
>> 
>> Note one thing that hasn't been mentioned before:
>> 
>> help() has had the optional argument
>>        ' try.all.packages = getOption("help.try.all.packages") '
>> for many years now, and I have been involved in its history as
>> well but don't recall all details. IIRC,
>> help() {and hence "?"} used to *default* to  'try.all.packages = TRUE' for 
>> a while and later it was the
>> default for me (and our whole statistics departmental unit).
>> But we found that it *was* inconvenient that a big search was
>> started, often just because of a typo.
>> So I think   ?<non-existing>  should ``answer quickly'' by
>> default.
>
> Have you tried help.search() lately?  It is now very fast.  I haven't checked 
> if help() makes use of the same search mechanism, but presumably it could do 
> so, if speed is an issue.
>
> So I would say the speed is a solvable or solved problem.

There are still some possible improvements.  Hadley mentioned keeping
binary indices -- we do so per package and could do so per library.  Just
opening 1700 files can be quite slow on some systems -- this is one of the
areas where you see the benefits of Unix-alike file systems.
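
To make the per-package versus per-library point concrete, here is a
minimal sketch using a throwaway directory.  The layout, the file names
(hsearch.rds, hsearch-all.rds) and both helper functions are hypothetical
and are not how R actually stores its help indices; the sketch only shows
that a combined index needs one file open where the per-package layout
needs one per package.

```r
# Hypothetical layout: a temporary "library" with 50 fake packages,
# each holding its own small binary index file.
lib <- tempfile("lib")
dir.create(lib)
pkgs <- sprintf("pkg%03d", 1:50)

for (p in pkgs) {
  dir.create(file.path(lib, p))
  idx <- data.frame(package = p,
                    topic = paste0(p, "_topic", 1:3),
                    title = paste("Title", 1:3, "of", p),
                    stringsAsFactors = FALSE)
  saveRDS(idx, file.path(lib, p, "hsearch.rds"))
}

# Per-package layout: one file open per package directory.
read_per_package <- function(lib) {
  dirs <- list.dirs(lib, recursive = FALSE)
  do.call(rbind, lapply(file.path(dirs, "hsearch.rds"), readRDS))
}

# Per-library layout: build the combined index once, then a single open.
combined <- read_per_package(lib)
saveRDS(combined, file.path(lib, "hsearch-all.rds"))

read_per_library <- function(lib) {
  readRDS(file.path(lib, "hsearch-all.rds"))
}

# Compare the two access paths on this machine.
system.time(read_per_package(lib))
system.time(read_per_library(lib))
```

With only 50 tiny files the difference may be negligible; it is the
file-open count that grows with the number of installed packages.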

A lot of the speed-ups are generic, e.g. an internal file.path().  I get

> system.time(help("linear", try.all.packages = TRUE))
    user  system elapsed
  10.948   2.620  37.808
> system.time(help.search("linear"))
    user  system elapsed
   8.219   0.432  28.358

so there is room for improvement in help().  However, the re-run

> system.time(help.search("linear"))
    user  system elapsed
   1.951   0.003   1.960

shows the benefits of caching.
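
To repeat the cold/warm comparison on another installation, something like
the following should do.  It is only a sketch: the timings depend entirely
on how many packages are installed and on the state of the disc cache, so
no particular numbers should be expected.

```r
# First call in a session builds the search database; the repeat can
# reuse what was built.
first  <- system.time(help.search("linear"))
second <- system.time(help.search("linear"))

# Show user, system and elapsed times side by side.
timings <- rbind(first  = unclass(first)[1:3],
                 second = unclass(second)[1:3])
print(timings)
```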

(This is on a not particularly fast machine with all of CRAN and BioC
installed, in UTF-8; I know of some ways to improve performance in
UTF-8.)

It's all a question of resources and who is prepared to contribute.
I sped help.search() up ca 3x because 100s was too slow for me -- 30s the 
first time in a session is OK.  (And incidentally disc caching means that 
the next session got

> system.time(help.search("linear"))
    user  system elapsed
   7.180   0.246   7.627

, so the main issue is disc access.)
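
For reference, the lookup forms discussed earlier in the thread can be
tried as below.  This is a sketch using only documented arguments of
help(); the topic "no_such_topic_xyz" is a made-up name chosen so that the
lookup fails, and the interactive ? and ?? forms are left as comments
because they assume a version of R that includes the changes described
above.

```r
# Equivalent of help("foo", package = "bar"), here with a real topic.
h <- help("head", package = "utils")

# A failing lookup; with try.all.packages = FALSE help() answers without
# going on to scan installed-but-unattached packages.
miss <- help("no_such_topic_xyz", try.all.packages = FALSE)

length(h)     # number of help files found for "head" in utils
length(miss)  # number of help files found for the made-up topic

# Interactive forms (not run here):
# ?utils::head
# ??linear
```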

-- 
Brian D. Ripley,                  ripley at stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595


