[Rd] hashtab address arg

Ivan Krylov kry|ov@r00t @end|ng |rom gm@||@com
Wed Dec 22 14:55:09 CET 2021


On Sat, 18 Dec 2021 11:50:54 +0100
Arnaud FELD <arnaud.feldmann using gmail.com> wrote:

> However, I'm a bit troubled about the "address" argument. What is it
> intended for since (as far as I know) "address equality" is until now
> something that isn't really let for the user to decide within R.

To borrow the words of "Extending R" by John M. Chambers, the concept
of address identity can be related to the question:

>> If some of the data in the object has changed, is this still the
>> same object?

Most objects in R are defined by their content. If you have a 100x100
matrix and change the element at [50,50], it is now a different matrix,
even if it is stored in the same variable. If you create another
100x100 matrix in a different variable but fill it with the same
numbers, it should still compare equal to your original matrix.
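
A minimal sketch of that content-based behaviour, using only base R
(the variable names are made up for illustration):

```r
# Two matrices built separately but holding the same numbers.
a <- matrix(0, nrow = 100, ncol = 100)
b <- matrix(0, nrow = 100, ncol = 100)
equal_at_first <- identical(a, b)       # same content, so TRUE

# Change one element: as far as R is concerned, b is now a different matrix.
b[50, 50] <- 1
equal_after_change <- identical(a, b)   # FALSE

# Put the old value back and the two compare equal again.
b[50, 50] <- 0
equal_after_restore <- identical(a, b)  # TRUE
```

Where the numbers happen to live in memory plays no role in any of
these comparisons.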

Not all types of R objects are like that. Environments are good
candidates for pointer equality comparison. For example, the contents
of the global environment change every time you assign some variable in
the R command line, but it remains the same global environment. Indeed,
identical() for environments just compares their pointers: even if two
different environments only contain objects that compare equal, they
cannot be considered the same environment, because different closures
might be referring to them. data.tables are similar: if you have a
giant dataset and, as part of cleaning it up, remove some outliers,
perhaps it should still be considered the same dataset, even though its
contents are no longer strictly the same. The same goes for reference
class and R6
objects: unlike the pass-by-value semantics associated with most
objects in R, these are assumed to carry global state within them, and
modifications to them are reflected everywhere they are referenced, not
limited to the current function call.
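
As a rough illustration of the difference, here is a sketch using plain
environments (bump() and bump_vec() are hypothetical helpers, not part
of any package):

```r
# Two distinct environments with equal contents are not identical.
e1 <- new.env()
e2 <- new.env()
assign("x", 1, envir = e1)
assign("x", 1, envir = e2)
same_contents <- identical(as.list(e1), as.list(e2))  # TRUE
same_env <- identical(e1, e2)                         # FALSE

# Modifying an environment inside a function is visible to the caller,
# and the environment is still identical to itself afterwards.
bump <- function(env) {
  env$x <- env$x + 1
  invisible(env)
}
e1_alias <- e1
bump(e1)
still_same <- identical(e1, e1_alias)  # TRUE
alias_x <- e1_alias$x                  # 2, the change shows through the alias

# A plain vector, by contrast, is only modified inside the function.
bump_vec <- function(v) {
  v[1] <- v[1] + 1
  v
}
v <- c(1, 2, 3)
bumped <- bump_vec(v)
v_first <- v[1]                        # still 1 in the caller
```

Reference class and R6 objects are built on environments, so I would
expect them to behave like e1 here rather than like v.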

I *think* that most (if not all) objects with reference semantics are
already compared by pointer in identical(), so the default of
"identical" is, as the help page says, almost always the right choice.
But if it matters to your code whether the objects are actually stored
at the same location in memory, use a hash table of type "address".
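
A small sketch of the difference between the two types, assuming an R
version that ships utils::hashtab() (R 4.2.0 or later); the key and
value names are made up:

```r
# Requires utils::hashtab(), sethash() and gethash().
by_content <- utils::hashtab(type = "identical")
by_address <- utils::hashtab(type = "address")

k1 <- c(1, 2, 3)
k2 <- k1 + 0   # equal content, but a separately allocated vector
stopifnot(identical(k1, k2))

utils::sethash(by_content, k1, "stored under k1")
utils::sethash(by_address, k1, "stored under k1")

# Lookup by content finds the entry through the equal-but-distinct key.
found_by_content <- utils::gethash(by_content, k2)

# Lookup by address does not: k2 is a different object in memory,
# so gethash() returns its default nomatch value, NULL.
found_by_address <- utils::gethash(by_address, k2)

# The very same object is still found in the address table.
found_same_object <- utils::gethash(by_address, k1)
```

For keys that already have reference semantics, such as environments,
I would expect both table types to give the same answers.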

(Perhaps this topic could be a better fit for R-help.)

-- 
Best regards,
Ivan


