[Rd] [External] Re: hashtab address arg

luke-tierney using uiowa.edu
Wed Dec 22 16:11:49 CET 2021


On Wed, 22 Dec 2021, Ivan Krylov wrote:

> On Sat, 18 Dec 2021 11:50:54 +0100
> Arnaud FELD <arnaud.feldmann using gmail.com> wrote:
>
>> However, I'm a bit troubled about the "address" argument. What is it
>> intended for, since (as far as I know) "address equality" has until
>> now not really been left for the user to decide within R?
>
> Using the words from "Extending R" by John M. Chambers, the concept of
> address identity could be related to the question:
>
>>> If some of the data in the object has changed, is this still the
>>> same object?
>
> Most objects in R are defined by their content. If you had a 100x100
> matrix and changed an element at [50,50], it's now a different matrix,
> even if it's stored in the same variable. If you create another 100x100
> matrix in a different variable but fill it with the same numbers, it
> should still compare equal to your original matrix.
>
> Not all types of R objects are like that. Environments are good
> candidates for pointer equality comparison. For example, the contents
> of the global environment change every time you assign some variable in
> the R command line, but it remains the same global environment. Indeed,
> identical() for environments just compares their pointers: even if two
> different environments only contain objects that compare equal, they
> cannot be considered the same environment, because different closures
> might be referring to them. data.tables behave similarly: if you had a giant
> dataset and, as part of cleaning it up, removed some outliers, perhaps
> it should be considered the same dataset, even if the contents aren't
> strictly the same any more. Same goes for reference class and R6
> objects: unlike the pass-by-value semantics associated with most
> objects in R, these are assumed to carry global state within them, and
> modifications to them are reflected everywhere they are referenced, not
> limited to the current function call.
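
A minimal sketch of the distinction drawn in the quoted text, using only
base R. The variable names are illustrative and do not come from the
thread:

```r
# Value semantics: two separately created matrices with the same contents
# compare as identical.
m1 <- matrix(0, nrow = 100, ncol = 100)
m2 <- matrix(0, nrow = 100, ncol = 100)
same_matrices <- identical(m1, m2)                      # TRUE

# Reference semantics: two environments holding equal objects are still
# different environments, because identical() compares their pointers.
e1 <- new.env()
e2 <- new.env()
assign("x", 1, envir = e1)
assign("x", 1, envir = e2)
same_contents <- identical(as.list(e1), as.list(e2))    # TRUE
same_env <- identical(e1, e2)                           # FALSE

# Changing the contents of an environment does not change its identity:
# e3 is another reference to the environment that e1 refers to.
e3 <- e1
assign("y", 2, envir = e1)
still_same <- identical(e1, e3)                         # TRUE
```

The last comparison is the case Chambers' question is about: the data
changed, but it is still the same object.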

This is still experimental and the 'address' option may not survive at
the R level. There are some C level applications where it can be
useful; maybe it will only be retained there.
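
For illustration, a small sketch of how the two key types differ at the
R level. It assumes R >= 4.2.0, where utils::hashtab() is available, and
since the interface is experimental it may change. The variable names are
made up for the example:

```r
# Two tables that differ only in how keys are compared.
h_id   <- utils::hashtab(type = "identical")
h_addr <- utils::hashtab(type = "address")

x <- c(1, 2, 3)
y <- c(1, 2, 3)   # equal contents, but a separately allocated vector

utils::sethash(h_id,   x, "stored under x")
utils::sethash(h_addr, x, "stored under x")

# With type "identical", any key that is identical() to x finds the entry.
found_id_y <- utils::gethash(h_id, y)

# With type "address", only the very same object finds the entry;
# y is a different object, so the nomatch value is returned.
found_addr_x <- utils::gethash(h_addr, x)
found_addr_y <- utils::gethash(h_addr, y, nomatch = NA)
```

Here found_id_y and found_addr_x should both be "stored under x", while
found_addr_y should be NA.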

> I *think* that most (if not all) objects with reference semantics
> already use pointer comparison when being compared by identical(), so
> the default of "identical" is, as the help page says, almost always the
> right choice, but if it matters to your code whether the objects are
> actually stored in the same area in the memory, use hashes of type
> "address".

Unfortunately not all: External pointer objects are reference objects
but by default are not compared based on object address. Fixing the
default is not an option in the short term as it breaks too much code
(mostly through dependencies on a few packages).

> (Perhaps this topic could be a better fit for R-help.)

R-devel is the right place for this.

Best,

luke

-- 
Luke Tierney
Ralph E. Wareham Professor of Mathematical Sciences
University of Iowa                  Phone:             319-335-3386
Department of Statistics and        Fax:               319-335-3017
    Actuarial Science
241 Schaeffer Hall                  email:   luke-tierney using uiowa.edu
Iowa City, IA 52242                 WWW:  http://www.stat.uiowa.edu


