[Rd] RFC: getifexists() {was [Bug 16065] "exists" ...}

Martin Maechler maechler at stat.math.ethz.ch
Thu Jan 8 10:16:06 CET 2015


In November, Peter Hagerty and I had a "bug repository conversation":

  https://bugs.r-project.org/bugzilla/show_bug.cgi?id=16065

where the bug report title started with

 --->>  "exists" is a bottleneck for dispatch and package loading, ...

Peter proposed an extra, simplified and hence faster version of exists(),
and I commented

    > --- Comment #2 from Martin Maechler <maechler at stat.math.ethz.ch> ---
    > I'm very grateful that you've started exploring the bottlenecks of loading
    > packages with many S4 classes (and methods)...
    > and I hope we can make real progress there sooner rather than later.

    > OTOH, your `summaryRprof()` in your vignette indicates that exists() may use
    > up to 10% of the time spent in library(reportingTools),  and your speedup
    > proposals of exists()  may go up to ca 30%  which is good and well worth
    > considering,  but still we can only expect 2-3% speedup for package loading
    > which unfortunately is not much.

    > Still I agree it is worth looking at exists() as you did  ... and 
    > consider providing a fast simplified version of it in addition to current
    > exists() [I think].

    > BTW, as we talk about enhancements here, maybe consider a further possibility:
    > My subjective guess is that probably more than half of exists() uses are of the
    > form

    > if(exists(name, where, .......)) {
    >    get(name, where, ....)
    >    ..
    > } else { 
    >     NULL / error() / .. or similar
    > }

    > i.e. many exists() calls when returning TRUE are immediately followed by the
    > corresponding get() call which repeats quite a bit of the lookup that exists()
    > has done.

    > Instead, I'd imagine a function, say  getifexists(name, ...) that does both at
    > once in the "exists is TRUE" case but in a way we can easily keep the if(.) ..
    > else clause above.  One already existing approach would use

    > if(!inherits(tryCatch(xx <- get(name, where, ...), error=function(e)e), "error")) {

    >   ... (( work with xx )) ...

    > } else  { 
    >    NULL / error() / .. or similar
    > }

    > but of course our C implementation would be more efficient and use more concise
    > syntax {which should not look like error handling}.   Follow ups to this idea
    > should really go to R-devel (the mailing list).

and now I do follow up here myself :

I found that 'getifexists()' is actually very simple to implement.
I have already tested it a bit, but have not yet committed it to R-devel
(the "R trunk" aka "master branch") because I'd like to get
public comments first {RFC := Request For Comments}.
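
To make the intended semantics concrete, here is a minimal pure-R sketch.
It is not the C implementation mentioned above: the name
getifexists_sketch() is hypothetical, and the function still performs two
lookups, so it illustrates the behaviour only, not the speedup.

```r
## Hypothetical pure-R stand-in for the proposed getifexists();
## the name 'getifexists_sketch' is invented for this illustration.
getifexists_sketch <- function(x, envir = parent.frame(), mode = "any",
                               inherits = TRUE, value.if.not = NULL) {
  ## Still two lookups here; a C version would need only one.
  if (exists(x, envir = envir, mode = mode, inherits = inherits))
    get(x, envir = envir, mode = mode, inherits = inherits)
  else
    value.if.not
}

e <- new.env()
assign("a", 42, envir = e)

## Found: returns the object itself
r1 <- getifexists_sketch("a", envir = e)
## Not found: returns value.if.not, which defaults to NULL
r2 <- getifexists_sketch("no.such.name", envir = e, inherits = FALSE)
## Not found, with an explicit fallback value
r3 <- getifexists_sketch("no.such.name", envir = e, inherits = FALSE,
                         value.if.not = NA)
```

The if(.) .. else shape from the bug report comment is kept; only the
separate exists() test at the call site disappears.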

My version of the help file {for both exists() and getifexists()}
rendered in text is

---------------------- help(getifexists) -------------------------------
Is an Object Defined?

Description:

     Look for an R object of the given name and possibly return it.

Usage:

     exists(x, where = -1, envir = , frame, mode = "any",
            inherits = TRUE)
     
     getifexists(x, where = -1, envir = as.environment(where),
                 mode = "any", inherits = TRUE, value.if.not = NULL)
     
Arguments:

       x: a variable name (given as a character string).

   where: where to look for the object (see the details section); if
          omitted, the function will search as if the name of the
          object appeared unquoted in an expression.

   envir: an alternative way to specify an environment to look in, but
          it is usually simpler to just use the ‘where’ argument.

   frame: a frame in the calling list.  Equivalent to giving ‘where’ as
          ‘sys.frame(frame)’.

    mode: the mode or type of object sought: see the ‘Details’ section.

inherits: should the enclosing frames of the environment be searched?

value.if.not: the return value of ‘getifexists(x, *)’ when ‘x’ does not
          exist.

Details:

     The ‘where’ argument can specify the environment in which to look
     for the object in any of several ways: as an integer (the position
     in the ‘search’ list); as the character string name of an element
     in the search list; or as an ‘environment’ (including using
     ‘sys.frame’ to access the currently active function calls).  The
     ‘envir’ argument is an alternative way to specify an environment,
     but is primarily there for back compatibility.

     This function looks to see if the name ‘x’ has a value bound to it
     in the specified environment.  If ‘inherits’ is ‘TRUE’ and a value
     is not found for ‘x’ in the specified environment, the enclosing
     frames of the environment are searched until the name ‘x’ is
     encountered.  See ‘environment’ and the ‘R Language Definition’
     manual for details about the structure of environments and their
     enclosures.

     *Warning:* ‘inherits = TRUE’ is the default behaviour for R but
     not for S.

     If ‘mode’ is specified then only objects of that type are sought.
     The ‘mode’ may specify one of the collections ‘"numeric"’ and
     ‘"function"’ (see ‘mode’): any member of the collection will
     suffice.  (This is true even if a member of a collection is
     specified, so for example ‘mode = "special"’ will seek any type of
     function.)

Value:

     ‘exists():’ Logical, true if and only if an object of the correct
     name and mode is found.

     ‘getifexists():’ The object (as from ‘get(x, *)’) if ‘exists(x, *)’
     is true, otherwise ‘value.if.not’.

Note:

   With ‘getifexists()’, instead of the easy to read but somewhat
   inefficient
     
       if (exists(myVarName, envir = myEnvir)) {
         r <- get(myVarName, envir = myEnvir)
         ## ... deal with r ...
       }

   you can now use the more efficient (and slightly harder to read)
     
       if (!is.null(r <- getifexists(myVarName, envir = myEnvir))) {
         ## ... deal with r ...
       }

References:

     Becker, R. A., Chambers, J. M. and Wilks, A. R. (1988) _The New S
     Language_.  Wadsworth & Brooks/Cole.

See Also:

     ‘get’.  For quite a different kind of “existence” checking, namely
     if function arguments were specified, ‘missing’; and for yet a
     different kind, namely if a file exists, ‘file.exists’.

Examples:

     ##  Define a substitute function if necessary:
     if(!exists("some.fun", mode = "function"))
       some.fun <- function(x) { cat("some.fun(x)\n"); x }
     search()
     exists("ls", 2) # true even though ls is in pos = 3
     exists("ls", 2, inherits = FALSE) # false
     
     ## These are true (in most circumstances):
     identical(ls,   getifexists("ls"))
     identical(NULL, getifexists(".foo.bar.")) # default value.if.not = NULL(!)

----------------- end[ help(getifexists) ] -----------------------------
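
One consequence of the default value.if.not = NULL in the help text above:
with the is.null() idiom from the Note, a variable that exists but is bound
to NULL cannot be told apart from a missing one. Here is a minimal sketch of
that caveat, and of passing a unique sentinel instead. It assumes a
hypothetical pure-R helper lookup_or() that mimics the documented semantics;
it is not the proposed implementation.

```r
## Hypothetical helper with the semantics described in the help text
lookup_or <- function(x, envir, value.if.not = NULL) {
  if (exists(x, envir = envir, inherits = FALSE))
    get(x, envir = envir, inherits = FALSE)
  else
    value.if.not
}

e <- new.env()
assign("empty", NULL, envir = e)   # a binding that exists but holds NULL

## With the default, an existing NULL looks the same as "not found":
ambiguous <- is.null(lookup_or("empty", e)) &&
             is.null(lookup_or("absent", e))

## A unique sentinel as value.if.not tells the two cases apart
## (environments are compared by identity):
notFound <- new.env()
found_empty  <- !identical(lookup_or("empty",  e, notFound), notFound)
found_absent <- !identical(lookup_or("absent", e, notFound), notFound)
```

Here 'ambiguous' and 'found_empty' are TRUE and 'found_absent' is FALSE.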


