[Rd] True length - length(unclass(x)) - without having to call unclass()?

Iñaki Ucar iuc@r @ending from fedor@project@org
Wed Sep 5 11:18:42 CEST 2018


The bottomline here is that one can always call a base method,
inexpensively and without modifying the object, in, let's say,
*formal* OOP languages. In R, this is not possible in general. It
would be possible if there was always a foo.default, but primitives
use internal dispatch.

I was wondering whether it would be possible to provide a super(x, n)
function which simply causes the dispatching system to avoid "n"
classes in the hierarchy, so that:

> x <- structure(list(), class=c("foo", "bar"))
> length(super(x, 0)) # looks for a length.foo
> length(super(x, 1)) # looks for a length.bar
> length(super(x, 2)) # calls the default
> length(super(x, Inf)) # calls the default

Iñaki

El mié., 5 sept. 2018 a las 10:09, Tomas Kalibera
(<tomas.kalibera using gmail.com>) escribió:
>
> On 08/24/2018 07:55 PM, Henrik Bengtsson wrote:
> > Is there a low-level function that returns the length of an object 'x'
> > - the length that for instance .subset(x) and .subset2(x) see? An
> > obvious candidate would be to use:
> >
> > .length <- function(x) length(unclass(x))
> >
> > However, I'm concerned that calling unclass(x) may trigger an
> > expensive copy internally in some cases.  Is that concern unfounded?
> Unclass() will always copy when "x" is really a variable, because the
> value in "x" will be referenced; whether it is prohibitively expensive
> or not depends only on the workload - if "x" is a very long list and
> this functions is called often then it could, but at least to me this
> sounds unlikely. Unless you have a strong reason to believe it is the
> case I would just use length(unclass(x)).
>
> If the copying is really a problem, I would think about why the
> underlying vector length is needed at R level - whether you really need
> to know the length without actually having the unclassed vector anyway
> for something else, so whether you are not paying for the copy anyway.
> Or, from the other end, if you need to do more without copying, and it
> is possible without breaking the value semantics, then you might need to
> switch to C anyway and for a bigger piece of code.
>
> If it were still just .length() you needed and it were performance
> critical, you could just switch to C and call Rf_length. That does not
> violate the semantics, just indeed it is not elegant as you are
> switching to C.
>
> If you stick to R and can live with the overhead of length(unclass(x))
> then there is a chance the overhead will decrease as R is optimized
> internally. This is possible in principle when the runtime knows that
> the unclassed vector is only needed to compute something that does not
> modify the vector. The current R cannot optimize this out, but it should
> be possible with ALTREP at some point (and as Radford mentioned pqR does
> it differently). Even with such internal optimizations indeed it is
> often necessary to make guesses about realistic workloads, so if you
> have a realistic workload where say length(unclass(x)) is critical, you
> are more than welcome to donate it as benchmark.
>
> Obviously, if you use a C version calling Rf_length, after such R
> optimization your code would be unnecessarily non-elegant, but would
> still work and probably without overhead, because R can't do much less
> than Rf_length. In more complicated cases though hand-optimized C code
> to implement say 2 operations in sequence could be slower than what
> better optimizing runtime could do by joining the effect of possibly
> more operations, which is in principle another danger of switching from
> R to C. But as far as the semantics is followed, there is no other danger.
>
> The temptation should be small anyway in this case when Rf_length()
> would be the simplest, but as I made it more than clear in the previous
> email, one should never violate the value semantics by temporarily
> modifying the object (temporarily removing the class attribute or
> temporarily remove the object bit). Violating semantics causes bugs, if
> not with the present then with future versions of R (where version may
> be an svn revision). A concrete recent example: modifying objects in
> place in violation of the semantics caused a lot of bugs with
> introduction of unification of constants in the byte-code compiler.
>
> Best
> Tomas
>
> >
> > Thxs,
> >
> > Henrik
> >
> > ______________________________________________
> > R-devel using r-project.org mailing list
> > https://stat.ethz.ch/mailman/listinfo/r-devel
>
> ______________________________________________
> R-devel using r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel



-- 
Iñaki Ucar



More information about the R-devel mailing list