[Rd] RFC: getifexists() {was [Bug 16065] "exists" ...}

Peter Haverty haverty.peter at gene.com
Thu Jan 8 22:44:56 CET 2015


Michael's idea has an interesting bonus that he and I discussed earlier.
It would be very convenient to have a container of key/value pairs.  I
imagine many people often write this:

x <- mapply(names(x), x, FUN = function(k, v) {
    # work with key and value
})

especially ex-Perl people accustomed to

while ( ($key, $value) = each(%some_hash) ) { }

Perhaps there is room for additional discussion of using lists of SYMSXPs
in this manner. (If SYMSXPs are not that safe, perhaps a looping construct
for named vectors that gives the illusion of iterating over a list of
two-tuples.)
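
As a rough sketch of what this looks like with today's tools (the list `x`,
the environment `env`, and the per-pair work are all made up for
illustration), Map() over names and values gives the key/value pairing, and
as.list() allows the same thing for an environment:

```r
# Hypothetical named list standing in for a container of key/value pairs.
x <- list(a = 1, b = 2, c = 3)

# Map() hands each name and its value to the function in lock step,
# which is the closest base-R analogue of iterating over two-tuples.
res <- Map(function(k, v) paste0(k, "=", v * 10), names(x), x)

# The same idea for an environment: materialize it as a named list first.
env <- new.env()
assign("p", 10, envir = env)
assign("q", 20, envir = env)
vals <- as.list(env)        # element order is not guaranteed
keys <- sort(names(vals))   # so impose an order explicitly
tot <- vapply(keys, function(k) vals[[k]] + 1, numeric(1))
```

This still copies the bindings into a list, so it only gives the illusion of
a key/value iterator rather than the binding objects Michael describes.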



Pete

____________________
Peter M. Haverty, Ph.D.
Genentech, Inc.
phaverty at gene.com

On Thu, Jan 8, 2015 at 11:57 AM, <luke-tierney at uiowa.edu> wrote:

> On Thu, 8 Jan 2015, Michael Lawrence wrote:
>
>  If we do add an argument to get(), then it should be named consistently
>> with the ifnotfound argument of mget(). As mentioned, the possibility of a
>> NULL value is problematic. One solution is a sentinel value that indicates
>> an unbound value (like R_UnboundValue).
>>
>
> A null default is fine -- it's a default; if it isn't right for a
> particular case you can provide something else.
>
>
>> But another idea (and one pretty similar to John's) is to follow the SYMSXP
>> design at the C level, where there is a structure that points to the name
>> and a value. We already have SYMSXPs at the R level of course (name
>> objects) but they do not provide access to the value, which is typically
>> R_UnboundValue. But this does not even need to be implemented with SYMSXP.
>> The design would allow something like:
>>
>> binding <- getBinding("x", env)
>> if (hasValue(binding)) {
>>  x <- value(binding) # throws an error if none
>>  message(name(binding), " has value ", x)
>> }
>>
>> That, I think, is a bit verbose but readable and could be made fast. And I
>> think binding objects would be useful in other ways, as they are
>> essentially a "named object". For example, when iterating over an
>> environment.
>>
>
> This would need a lot more thought. Directly exposing the internals is
> definitely not something we want to do as we may well want to change
> that design. But there are lots of other corner issues that would have
> to be thought through before going forward, such as what happens if an
> rm occurs between obtaining a binding object and doing something with
> it. Serialization would also need thinking through. This doesn't seem
> like a worthwhile place to spend our efforts to me.
>
> Adding getIfExists, or .get, or get0, or whatever seems fine. Adding
> an argument to get() with missing giving current behavior may be OK
> too. Rewriting exists and get as .Primitives may be sufficient though.
>
> Best,
>
> luke
>
>
>  Michael
>>
>>
>>
>>
>> On Thu, Jan 8, 2015 at 6:03 AM, John Nolan <jpnolan at american.edu> wrote:
>>
>>  Adding an optional argument to get (and mget) like
>>>
>>> val <- get(name, where, ..., value.if.not.found=NULL )   (*)
>>>
>>> would be useful for many.  HOWEVER, it is possible that there could be
>>> some confusion here: (*) can give a NULL because either x exists and
>>> has value NULL, or because x doesn't exist.   If that matters, the user
>>> would need to be careful about specifying a value.if.not.found that
>>> cannot be confused with a valid value of x.
>>>
>>> To avoid this difficulty, perhaps we want both: have Martin's
>>> getifexists( ) return a list with two values:
>>>   - a boolean variable 'found'  # = value returned by exists( )
>>>   - a variable 'value'
>>>
>>> Then implement get( ) as:
>>>
>>> get <- function(x,...,value.if.not.found ) {
>>>
>>>   if( missing(value.if.not.found) ) {
>>>     a <- getifexists(x,... )
>>>     if (!a$found) stop("x not found")
>>>   } else {
>>>     a <- getifexists(x,...,value.if.not.found )
>>>   }
>>>   return(a$value)
>>> }
>>>
>>> Note that value.if.not.found has no default value in above.
>>> It behaves exactly like current get does if value.if.not.found
>>> is not specified, and if it is specified, it would be faster
>>> in the common situation mentioned below:
>>>      if(exists(x,...)) { get(x,...) }
>>>
>>> John
>>>
>>> P.S. if you like dromedaries call it valueIfNotFound ...
>>>
>>>  ..............................................................
>>>  John P. Nolan
>>>  Math/Stat Department
>>>  227 Gray Hall,   American University
>>>  4400 Massachusetts Avenue, NW
>>>  Washington, DC 20016-8050
>>>
>>>  jpnolan at american.edu       voice: 202.885.3140
>>>  web: academic2.american.edu/~jpnolan
>>>  ..............................................................
>>>
>>>
>>> -----"R-devel" <r-devel-bounces at r-project.org> wrote: -----
>>> To: Martin Maechler <maechler at stat.math.ethz.ch>, R-devel at r-project.org
>>> From: Duncan Murdoch
>>> Sent by: "R-devel"
>>> Date: 01/08/2015 06:39AM
>>> Subject: Re: [Rd] RFC: getifexists() {was [Bug 16065] "exists" ...}
>>>
>>> On 08/01/2015 4:16 AM, Martin Maechler wrote:
>>> > In November, we had a "bug repository conversation"
>>> > with Peter Haverty and myself:
>>> >
>>> >   https://bugs.r-project.org/bugzilla/show_bug.cgi?id=16065
>>> >
>>> > where the bug report title started with
>>> >
>>> >  --->>  "exists" is a bottleneck for dispatch and package loading, ...
>>> >
>>> > Peter proposed an extra simplified and hence faster version of exists(),
>>> > and I commented
>>> >
>>> >     > --- Comment #2 from Martin Maechler <maechler at stat.math.ethz.ch> ---
>>> >     > I'm very grateful that you've started exploring the bottlenecks of
>>> >     > loading packages with many S4 classes (and methods)...
>>> >     > and I hope we can make real progress there rather sooner than later.
>>> >
>>> >     > OTOH, your `summaryRprof()` in your vignette indicates that exists()
>>> >     > may use up to 10% of the time spent in library(reportingTools), and
>>> >     > your speedup proposals of exists() may go up to ca 30%, which is good
>>> >     > and well worth considering, but still we can only expect 2-3% speedup
>>> >     > for package loading, which unfortunately is not much.
>>> >
>>> >     > Still I agree it is worth looking at exists() as you did ... and
>>> >     > consider providing a fast simplified version of it in addition to
>>> >     > current exists() [I think].
>>> >
>>> >     > BTW, as we talk about enhancements here, maybe consider a further
>>> >     > possibility: My subjective guess is that probably more than half of
>>> >     > exists() uses are of the form
>>> >
>>> >     > if(exists(name, where, .......)) {
>>> >     >    get(name, where, ....)
>>> >     >    ..
>>> >     > } else {
>>> >     >     NULL / error() / .. or similar
>>> >     > }
>>> >
>>> >     > i.e. many exists() calls when returning TRUE are immediately followed
>>> >     > by the corresponding get() call which repeats quite a bit of the
>>> >     > lookup that exists() has done.
>>> >
>>> >     > Instead, I'd imagine a function, say  getifexists(name, ...) that does
>>> >     > both at once in the "exists is TRUE" case but in a way we can easily
>>> >     > keep the if(.) .. else clause above.  One already existing approach
>>> >     > would use
>>> >
>>> >     > if(!inherits(tryCatch(xx <- get(name, where, ...),
>>> >     >                       error=function(e)e), "error")) {
>>> >
>>> >     >   ... (( work with xx )) ...
>>> >
>>> >     > } else  {
>>> >     >    NULL / error() / .. or similar
>>> >     > }
>>> >
>>> >     > but of course our C implementation would be more efficient and use
>>> >     > more concise syntax {which should not look like error handling}.
>>> >     > Follow ups to this idea should really go to R-devel (the mailing
>>> >     > list).
>>> >
>>> > and now I do follow up here myself :
>>> >
>>> > I found that  'getifexists()' is actually very simple to implement,
>>> > I have already tested it a bit, but not yet committed to R-devel
>>> > (the "R trunk" aka "master branch") because I'd like to get
>>> > public comments {RFC := Request For Comments}.
>>> >
>>>
>>> I don't like the name -- I'd prefer getIfExists.  As Baath (2012, R
>>> Journal) pointed out, R names are very inconsistent in naming
>>> conventions, but lowerCamelCase is the most common choice.  Second most
>>> common is period.separated, so an argument could be made for
>>> get.if.exists, but there's still the possibility of confusion with S3
>>> methods, and users of other languages where "." is an operator find it a
>>> little strange.
>>>
>>> If you don't like lowerCamelCase (and a lot of people don't), then I
>>> think underscore_separated is the next best choice, so would use
>>> get_if_exists.
>>>
>>> Another possibility is to make no new name at all, and just add an
>>> optional parameter to get() (which if present acts as your value.if.not
>>> parameter, if not present keeps the current "object not found" error).
>>>
>>> Duncan Murdoch
>>>
>>>
>>> > My version of the help file {for both exists() and getifexists()}
>>> > rendered in text is
>>> >
>>> > ---------------------- help(getifexists) -------------------------------
>>> > Is an Object Defined?
>>> >
>>> > Description:
>>> >
>>> >      Look for an R object of the given name and possibly return it
>>> >
>>> > Usage:
>>> >
>>> >      exists(x, where = -1, envir = , frame, mode = "any",
>>> >             inherits = TRUE)
>>> >
>>> >      getifexists(x, where = -1, envir = as.environment(where),
>>> >                  mode = "any", inherits = TRUE, value.if.not = NULL)
>>> >
>>> > Arguments:
>>> >
>>> >        x: a variable name (given as a character string).
>>> >
>>> >    where: where to look for the object (see the details section); if
>>> >           omitted, the function will search as if the name of the
>>> >           object appeared unquoted in an expression.
>>> >
>>> >    envir: an alternative way to specify an environment to look in, but
>>> >           it is usually simpler to just use the 'where' argument.
>>> >
>>> >    frame: a frame in the calling list.  Equivalent to giving 'where' as
>>> >           'sys.frame(frame)'.
>>> >
>>> >     mode: the mode or type of object sought: see the 'Details' section.
>>> >
>>> > inherits: should the enclosing frames of the environment be searched?
>>> >
>>> > value.if.not: the return value of 'getifexists(x, *)' when 'x' does not
>>> >           exist.
>>> >
>>> > Details:
>>> >
>>> >      The 'where' argument can specify the environment in which to look
>>> >      for the object in any of several ways: as an integer (the position
>>> >      in the 'search' list); as the character string name of an element
>>> >      in the search list; or as an 'environment' (including using
>>> >      'sys.frame' to access the currently active function calls).  The
>>> >      'envir' argument is an alternative way to specify an environment,
>>> >      but is primarily there for back compatibility.
>>> >
>>> >      This function looks to see if the name 'x' has a value bound to it
>>> >      in the specified environment.  If 'inherits' is 'TRUE' and a value
>>> >      is not found for 'x' in the specified environment, the enclosing
>>> >      frames of the environment are searched until the name 'x' is
>>> >      encountered.  See 'environment' and the 'R Language Definition'
>>> >      manual for details about the structure of environments and their
>>> >      enclosures.
>>> >
>>> >      *Warning:* 'inherits = TRUE' is the default behaviour for R but
>>> >      not for S.
>>> >
>>> >      If 'mode' is specified then only objects of that type are sought.
>>> >      The 'mode' may specify one of the collections '"numeric"' and
>>> >      '"function"' (see 'mode'): any member of the collection will
>>> >      suffice.  (This is true even if a member of a collection is
>>> >      specified, so for example 'mode = "special"' will seek any type of
>>> >      function.)
>>> >
>>> > Value:
>>> >
>>> >      'exists():' Logical, true if and only if an object of the correct
>>> >      name and mode is found.
>>> >
>>> >      'getifexists():' The object, as from 'get(x, *)', if 'exists(x, *)'
>>> >      is true, otherwise 'value.if.not'.
>>> >
>>> > Note:
>>> >
>>> >    With 'getifexists()', instead of the easy to read but somewhat
>>> >    inefficient
>>> >
>>> >        if (exists(myVarName, envir = myEnvir)) {
>>> >          r <- get(myVarName, envir = myEnvir)
>>> >          ## ... deal with r ...
>>> >        }
>>> >
>>> >    you now can use the more efficient (and slightly harder to read)
>>> >
>>> >        if (!is.null(r <- getifexists(myVarName, envir = myEnvir))) {
>>> >          ## ... deal with r ...
>>> >        }
>>> >
>>> > References:
>>> >
>>> >      Becker, R. A., Chambers, J. M. and Wilks, A. R. (1988) _The New S
>>> >      Language_.  Wadsworth & Brooks/Cole.
>>> >
>>> > See Also:
>>> >
>>> >      'get'.  For quite a different kind of "existence" checking, namely
>>> >      if function arguments were specified, 'missing'; and for yet a
>>> >      different kind, namely if a file exists, 'file.exists'.
>>> >
>>> > Examples:
>>> >
>>> >      ##  Define a substitute function if necessary:
>>> >      if(!exists("some.fun", mode = "function"))
>>> >        some.fun <- function(x) { cat("some.fun(x)\n"); x }
>>> >      search()
>>> >      exists("ls", 2) # true even though ls is in pos = 3
>>> >      exists("ls", 2, inherits = FALSE) # false
>>> >
>>> >      ## These are true (in most circumstances):
>>> >      identical(ls,   getifexists("ls"))
>>> >      identical(NULL, getifexists(".foo.bar.")) # default value.if.not = NULL(!)
>>> >
>>> > ----------------- end[ help(getifexists) ] -----------------------------
>>> >
>>> > ______________________________________________
>>> > R-devel at r-project.org mailing list
>>> > https://stat.ethz.ch/mailman/listinfo/r-devel
>>> >
>>>
>>
>
> --
> Luke Tierney
> Ralph E. Wareham Professor of Mathematical Sciences
> University of Iowa                  Phone:             319-335-3386
> Department of Statistics and        Fax:               319-335-3017
>    Actuarial Science
> 241 Schaeffer Hall                  email:   luke-tierney at uiowa.edu
> Iowa City, IA 52242                 WWW:  http://www.stat.uiowa.edu
>
