[Rd] RFC: getifexists() {was [Bug 16065] "exists" ...}

Michael Lawrence lawrence.michael at gene.com
Thu Jan 8 15:58:00 CET 2015


If we do add an argument to get(), then it should be named consistently
with the ifnotfound argument of mget(). As mentioned, the possibility of a
NULL value is problematic. One solution is a sentinel value that indicates
an unbound value (like R_UnboundValue).
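
A minimal sketch of the sentinel idea at the R level, using the existing
ifnotfound argument of mget(). The names `unbound`, `lookup` and `env` are
made up for illustration and are not part of any proposal in this thread.
The sentinel is a fresh environment, which identical() matches only against
itself, so it cannot collide with a stored NULL.

```r
# Hypothetical sentinel: a fresh environment is identical() only to itself,
# so it cannot be confused with any value a user might have stored.
unbound <- new.env()

# Hypothetical helper: a single mget() call that returns the sentinel
# when 'name' has no binding in 'envir'.
lookup <- function(name, envir) {
  mget(name, envir = envir, ifnotfound = list(unbound), inherits = FALSE)[[1]]
}

env <- new.env()
assign("x", NULL, envir = env)   # 'x' exists and its value is NULL

x_val <- lookup("x", env)
y_val <- lookup("y", env)

is.null(x_val)               # 'x' was found and holds NULL
identical(y_val, unbound)    # 'y' was not found
```

The caller still has to compare against the sentinel by hand, which is the
verbosity the binding design below tries to hide.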

But another idea (and one pretty similar to John's) is to follow the SYMSXP
design at the C level, where there is a structure that points to the name
and a value. We already have SYMSXPs at the R level of course (name
objects) but they do not provide access to the value, which is typically
R_UnboundValue. But this does not even need to be implemented with SYMSXP.
The design would allow something like:

binding <- getBinding("x", env)
if (hasValue(binding)) {
  x <- value(binding) # throws an error if none
  message(name(binding), " has value ", x)
}

That, I think, is a bit verbose but readable, and could be made fast. And I
think binding objects would be useful in other ways, as they are
essentially a "named object". For example, when iterating over an
environment.
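
To make the interface concrete, here is a pure-R mock-up of such binding
objects. It is only a sketch: getBinding(), hasValue(), value() and name()
do not exist in R, the list-based representation is an assumption, and this
version looks the name up twice, so it shows the semantics rather than the
speed a C implementation could have.

```r
# Hypothetical constructor: records the name, whether it is bound, and the
# value. It calls exists() and then get(), i.e. two lookups; a C-level
# version would need only one.
getBinding <- function(name, envir, inherits = FALSE) {
  found <- exists(name, envir = envir, inherits = inherits)
  val <- if (found) get(name, envir = envir, inherits = inherits) else NULL
  structure(list(name = name, found = found, value = val), class = "binding")
}

# Hypothetical accessors.
hasValue <- function(binding) binding$found
name <- function(binding) binding$name
value <- function(binding) {
  if (!binding$found) stop("binding '", binding$name, "' has no value")
  binding$value
}

env <- new.env()
assign("x", NULL, envir = env)   # bound, with value NULL

bx <- getBinding("x", env)
by <- getBinding("y", env)

hasValue(bx)   # bound, even though the value is NULL
hasValue(by)   # not bound
```

Because 'found' is stored separately from 'value', a bound NULL and an
unbound name stay distinguishable without any sentinel.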

Michael




On Thu, Jan 8, 2015 at 6:03 AM, John Nolan <jpnolan at american.edu> wrote:

> Adding an optional argument to get (and mget) like
>
> val <- get(name, where, ..., value.if.not.found=NULL )   (*)
>
> would be useful for many.  HOWEVER, it is possible that there could be
> some confusion here: (*) can give a NULL because either x exists and
> has value NULL, or because x doesn't exist.   If that matters, the user
> would need to be careful about specifying a value.if.not.found that cannot
> be confused with a valid value of x.
>
> To avoid this difficulty, perhaps we want both: have Martin's getifexists()
> return a list with two values:
>   - a boolean variable 'found'  # = value returned by exists( )
>   - a variable 'value'
>
> Then implement get( ) as:
>
> get <- function(x,...,value.if.not.found ) {
>
>   if( missing(value.if.not.found) ) {
>     a <- getifexists(x,... )
>     if (!a$found) stop("object '", x, "' not found")
>   } else {
>     a <- getifexists(x,...,value.if.not.found )
>   }
>   return(a$value)
> }
>
> Note that value.if.not.found has no default value in above.
> It behaves exactly like current get does if value.if.not.found
> is not specified, and if it is specified, it would be faster
> in the common situation mentioned below:
>      if(exists(x,...)) { get(x,...) }
>
> John
>
> P.S. if you like dromedaries call it valueIfNotFound ...
>
>  ..............................................................
>  John P. Nolan
>  Math/Stat Department
>  227 Gray Hall,   American University
>  4400 Massachusetts Avenue, NW
>  Washington, DC 20016-8050
>
>  jpnolan at american.edu       voice: 202.885.3140
>  web: academic2.american.edu/~jpnolan
>  ..............................................................
>
>
> -----"R-devel" <r-devel-bounces at r-project.org> wrote: -----
> To: Martin Maechler <maechler at stat.math.ethz.ch>, R-devel at r-project.org
> From: Duncan Murdoch
> Sent by: "R-devel"
> Date: 01/08/2015 06:39AM
> Subject: Re: [Rd] RFC: getifexists() {was [Bug 16065] "exists" ...}
>
> On 08/01/2015 4:16 AM, Martin Maechler wrote:
> > In November, we had a "bug repository conversation"
> > with Peter Hagerty and myself:
> >
> >   https://bugs.r-project.org/bugzilla/show_bug.cgi?id=16065
> >
> > where the bug report title started with
> >
> >  --->>  "exists" is a bottleneck for dispatch and package loading, ...
> >
> > Peter proposed an extra simplified and hence faster version of exists(),
> > and I commented
> >
> >     > --- Comment #2 from Martin Maechler <maechler at stat.math.ethz.ch> ---
> >     > I'm very grateful that you've started exploring the bottlenecks of loading
> >     > packages with many S4 classes (and methods)...
> >     > and I hope we can make real progress there rather sooner than later.
> >
> >     > OTOH, your `summaryRprof()` in your vignette indicates that exists() may use
> >     > up to 10% of the time spent in library(reportingTools),  and your speedup
> >     > proposals of exists()  may go up to ca 30%  which is good and well worth
> >     > considering,  but still we can only expect 2-3% speedup for package loading
> >     > which unfortunately is not much.
> >
> >     > Still I agree it is worth looking at exists() as you did  ... and
> >     > consider providing a fast simplified version of it in addition to current
> >     > exists() [I think].
> >
> >     > BTW, as we talk about enhancements here, maybe consider a further possibility:
> >     > My subjective guess is that probably more than half of exists() uses are of the
> >     > form
> >
> >     > if(exists(name, where, .......)) {
> >     >    get(name, where, ....)
> >     >    ..
> >     > } else {
> >     >     NULL / error() / .. or similar
> >     > }
> >
> >     > i.e. many exists() calls when returning TRUE are immediately followed by the
> >     > corresponding get() call which repeats quite a bit of the lookup that exists()
> >     > has done.
> >
> >     > Instead, I'd imagine a function, say  getifexists(name, ...) that does both at
> >     > once in the "exists is TRUE" case but in a way we can easily keep the if(.) ..
> >     > else clause above.  One already existing approach would use
> >
> >     > if(!inherits(tryCatch(xx <- get(name, where, ...), error=function(e)e), "error")) {
> >
> >     >   ... (( work with xx )) ...
> >
> >     > } else  {
> >     >    NULL / error() / .. or similar
> >     > }
> >
> >     > but of course our C implementation would be more efficient and use more concise
> >     > syntax {which should not look like error handling}.   Follow ups to this idea
> >     > should really go to R-devel (the mailing list).
> >
> > and now I do follow up here myself :
> >
> > I found that  'getifexists()' is actually very simple to implement,
> > I have already tested it a bit, but not yet committed to R-devel
> > (the "R trunk" aka "master branch") because I'd like to get
> > public comments {RFC := Request For Comments}.
> >
>
> I don't like the name -- I'd prefer getIfExists.  As Baath (2012, R
> Journal) pointed out, R names are very inconsistent in naming
> conventions, but lowerCamelCase is the most common choice.  Second most
> common is period.separated, so an argument could be made for
> get.if.exists, but there's still the possibility of confusion with S3
> methods, and users of other languages where "." is an operator find it a
> little strange.
>
> If you don't like lowerCamelCase (and a lot of people don't), then I
> think underscore_separated is the next best choice, so would use
> get_if_exists.
>
> Another possibility is to make no new name at all, and just add an
> optional parameter to get() (which if present acts as your value.if.not
> parameter, if not present keeps the current "object not found" error).
>
> Duncan Murdoch
>
>
> > My version of the help file {for both exists() and getifexists()}
> > rendered in text is
> >
> > ---------------------- help(getifexists) -------------------------------
> > Is an Object Defined?
> >
> > Description:
> >
> >      Look for an R object of the given name and possibly return it
> >
> > Usage:
> >
> >      exists(x, where = -1, envir = , frame, mode = "any",
> >             inherits = TRUE)
> >
> >      getifexists(x, where = -1, envir = as.environment(where),
> >                  mode = "any", inherits = TRUE, value.if.not = NULL)
> >
> > Arguments:
> >
> >        x: a variable name (given as a character string).
> >
> >    where: where to look for the object (see the details section); if
> >           omitted, the function will search as if the name of the
> >           object appeared unquoted in an expression.
> >
> >    envir: an alternative way to specify an environment to look in, but
> >           it is usually simpler to just use the ‘where’ argument.
> >
> >    frame: a frame in the calling list.  Equivalent to giving ‘where’ as
> >           ‘sys.frame(frame)’.
> >
> >     mode: the mode or type of object sought: see the ‘Details’ section.
> >
> > inherits: should the enclosing frames of the environment be searched?
> >
> > value.if.not: the return value of ‘getifexists(x, *)’ when ‘x’ does not
> >           exist.
> >
> > Details:
> >
> >      The ‘where’ argument can specify the environment in which to look
> >      for the object in any of several ways: as an integer (the position
> >      in the ‘search’ list); as the character string name of an element
> >      in the search list; or as an ‘environment’ (including using
> >      ‘sys.frame’ to access the currently active function calls).  The
> >      ‘envir’ argument is an alternative way to specify an environment,
> >      but is primarily there for back compatibility.
> >
> >      This function looks to see if the name ‘x’ has a value bound to it
> >      in the specified environment.  If ‘inherits’ is ‘TRUE’ and a value
> >      is not found for ‘x’ in the specified environment, the enclosing
> >      frames of the environment are searched until the name ‘x’ is
> >      encountered.  See ‘environment’ and the ‘R Language Definition’
> >      manual for details about the structure of environments and their
> >      enclosures.
> >
> >      *Warning:* ‘inherits = TRUE’ is the default behaviour for R but
> >      not for S.
> >
> >      If ‘mode’ is specified then only objects of that type are sought.
> >      The ‘mode’ may specify one of the collections ‘"numeric"’ and
> >      ‘"function"’ (see ‘mode’): any member of the collection will
> >      suffice.  (This is true even if a member of a collection is
> >      specified, so for example ‘mode = "special"’ will seek any type of
> >      function.)
> >
> > Value:
> >
> >      ‘exists():’ Logical, true if and only if an object of the correct
> >      name and mode is found.
> >
> >      ‘getifexists():’ The object (as from ‘get(x, *)’) if ‘exists(x, *)’
> >      is true, otherwise ‘value.if.not’.
> >
> > Note:
> >
> >    With ‘getifexists()’, instead of the easy to read but somewhat
> >    inefficient
> >
> >        if (exists(myVarName, envir = myEnvir)) {
> >          r <- get(myVarName, envir = myEnvir)
> >          ## ... deal with r ...
> >        }
> >
> >    you now can use the more efficient (and slightly harder to read)
> >
> >        if (!is.null(r <- getifexists(myVarName, envir = myEnvir))) {
> >          ## ... deal with r ...
> >        }
> >
> > References:
> >
> >      Becker, R. A., Chambers, J. M. and Wilks, A. R. (1988) _The New S
> >      Language_.  Wadsworth & Brooks/Cole.
> >
> > See Also:
> >
> >      ‘get’.  For quite a different kind of “existence” checking, namely
> >      if function arguments were specified, ‘missing’; and for yet a
> >      different kind, namely if a file exists, ‘file.exists’.
> >
> > Examples:
> >
> >      ##  Define a substitute function if necessary:
> >      if(!exists("some.fun", mode = "function"))
> >        some.fun <- function(x) { cat("some.fun(x)\n"); x }
> >      search()
> >      exists("ls", 2) # true even though ls is in pos = 3
> >      exists("ls", 2, inherits = FALSE) # false
> >
> >      ## These are true (in most circumstances):
> >      identical(ls,   getifexists("ls"))
> >      identical(NULL, getifexists(".foo.bar.")) # default value.if.not = NULL(!)
> >
> > ----------------- end[ help(getifexists) ] -----------------------------
> >
> > ______________________________________________
> > R-devel at r-project.org mailing list
> > https://stat.ethz.ch/mailman/listinfo/r-devel
> >
>
