[Rd] True length - length(unclass(x)) - without having to call unclass()?

Tomas Kalibera tomas.kalibera at gmail.com
Mon Sep 10 14:18:07 CEST 2018


On 09/05/2018 11:18 AM, Iñaki Ucar wrote:
> The bottom line here is that one can always call a base method,
> inexpensively and without modifying the object, in, let's say,
> *formal* OOP languages. In R, this is not possible in general. It
> would be possible if there was always a foo.default, but primitives
> use internal dispatch.
>
> I was wondering whether it would be possible to provide a super(x, n)
> function which simply causes the dispatching system to avoid "n"
> classes in the hierarchy, so that:
>
>> x <- structure(list(), class=c("foo", "bar"))
>> length(super(x, 0)) # looks for a length.foo
>> length(super(x, 1)) # looks for a length.bar
>> length(super(x, 2)) # calls the default
>> length(super(x, Inf)) # calls the default
I think that a cast should always be to a specific class, identified by 
the name of the class. Identifying classes by their inheritance index 
would be unnecessarily brittle - it would break if someone introduced a 
new ancestor class. Syntax aside, supporting fast casts for S3 dispatch 
in the current implementation would be quite a bit of work, probably not 
worth it, and it would probably also slow down the internal dispatch in 
primitives. A partial solution could, however, be implemented at some 
point with ALTREP wrappers, where one could create, without copying, a 
wrapper object with a modified class attribute.
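
To make the brittleness concrete, here is a small hypothetical sketch 
(super() does not exist in R; it is just the interface proposed above):

x <- structure(list(), class = c("foo", "bar"))
## length(super(x, 1))   # would dispatch to length.bar
## Now suppose a package later inserts an intermediate ancestor class:
class(x) <- c("foo", "baz", "bar")
## length(super(x, 1))   # now dispatches to length.baz, silently changing behaviour
## A cast named by class, e.g. something like super(x, "bar"), would not
## be affected by such a change.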

Tomas
> Iñaki
>
> On Wed, Sep 5, 2018 at 10:09, Tomas Kalibera
> (<tomas.kalibera using gmail.com>) wrote:
>> On 08/24/2018 07:55 PM, Henrik Bengtsson wrote:
>>> Is there a low-level function that returns the length of an object 'x'
>>> - the length that for instance .subset(x) and .subset2(x) see? An
>>> obvious candidate would be to use:
>>>
>>> .length <- function(x) length(unclass(x))
>>>
>>> However, I'm concerned that calling unclass(x) may trigger an
>>> expensive copy internally in some cases.  Is that concern unfounded?
>> Unclass() will always copy when "x" is really a variable, because the
>> value in "x" will be referenced; whether that is prohibitively expensive
>> depends only on the workload - if "x" is a very long list and this
>> function is called often, then it could be, but at least to me this
>> sounds unlikely. Unless you have a strong reason to believe that is the
>> case, I would just use length(unclass(x)).
>>
>> If the copying really is a problem, I would think about why the
>> underlying vector length is needed at the R level - whether you truly
>> need the length without also needing the unclassed vector for something
>> else, in which case you are paying for the copy anyway. Or, from the
>> other end, if you need to do more without copying, and that is possible
>> without breaking the value semantics, then you might need to switch to
>> C anyway, and for a bigger piece of code.
>>
>> If it were still just .length() you needed and it were performance
>> critical, you could switch to C and call Rf_length. That does not
>> violate the semantics; it is just not elegant, because you are
>> switching to C.
>>
>> If you stick to R and can live with the overhead of length(unclass(x)),
>> then there is a chance the overhead will decrease as R is optimized
>> internally. This is possible in principle when the runtime knows that
>> the unclassed vector is only needed to compute something that does not
>> modify the vector. The current R cannot optimize this out, but it should
>> be possible with ALTREP at some point (and, as Radford mentioned, pqR
>> does it differently). Even with such internal optimizations, it is often
>> necessary to make guesses about realistic workloads, so if you have a
>> realistic workload where, say, length(unclass(x)) is critical, you are
>> more than welcome to donate it as a benchmark.
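>>
>> A crude sketch of such a measurement (purely illustrative; the numbers
>> depend entirely on the size of the object and on the machine):
>>
>> x <- structure(as.list(seq_len(1e6)), class = "foo")
>> system.time(for (i in 1:100) length(unclass(x)))  # pays for the copy on every call
>> system.time(for (i in 1:100) length(x))           # internal dispatch, no copy of the data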
>>
>> Obviously, if you use a C version calling Rf_length, then after such an
>> R optimization your code would be unnecessarily inelegant, but it would
>> still work, and probably without overhead, because R can't do much less
>> than Rf_length. In more complicated cases, though, hand-optimized C code
>> implementing, say, two operations in sequence could be slower than what
>> a better optimizing runtime could do by joining the effect of possibly
>> more operations, which is in principle another danger of switching from
>> R to C. But as long as the semantics is followed, there is no other
>> danger.
>>
>> The temptation should be small anyway in this case, where Rf_length()
>> would be the simplest option, but as I made more than clear in the
>> previous email, one should never violate the value semantics by
>> temporarily modifying the object (temporarily removing the class
>> attribute or temporarily removing the object bit). Violating the
>> semantics causes bugs, if not with the present version of R then with
>> future ones (where a version may be an svn revision). A concrete recent
>> example: modifying objects in place, in violation of the semantics,
>> caused a lot of bugs with the introduction of the unification of
>> constants in the byte-code compiler.
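>>
>> For concreteness, this is a sketch of the kind of pattern warned
>> against (written in R only for illustration; at the R level the
>> replacement calls below typically force a copy anyway, so nothing is
>> gained, and doing the same thing in place - from C, or with tools that
>> modify attributes by reference - is exactly the semantics violation):
>>
>> bad_length <- function(x) {
>>   cls <- oldClass(x)
>>   oldClass(x) <- NULL   # "temporarily" strip the class attribute
>>   n <- length(x)
>>   oldClass(x) <- cls    # restore it afterwards
>>   n
>> }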
>>
>> Best
>> Tomas
>>
>>> Thxs,
>>>
>>> Henrik
>>>