[Rd] R (development) changes in arith, logic, relop with (0-extent) arrays

Fri Sep 9 10:35:04 CEST 2016

>>>>> Radford Neal <radford at cs.toronto.edu>
>>>>>     on Thu, 8 Sep 2016 17:11:18 -0400 writes:

    > Regarding Martin Maechler's proposal:
    > Arithmetic between length-1 arrays and longer non-arrays had
    > silently dropped the array attributes and recycled.  This now gives
    > a warning and will signal an error in the future, as it has always
    > for logic and comparison operations.

    > For example, matrix(1,1,1) + (1:2) would give a warning/error.

    > I think this might be a mistake.

    > The potential benefits of this would be detection of some programming
    > errors, and increased consistency.  The downsides are breaking
    > existing working programs, and decreased consistency.

    > Regarding consistency, the overall R philosophy is that attaching an
    > attribute to a vector doesn't change what you can do with it, or what
    > the result is, except that the result (often) gets the attributes
    > carried forward.  By this logic, adding a 'dim' attribute shouldn't
    > stop you from doing arithmetic (or comparisons) that you otherwise
    > could.

Thank you, Radford, for joining in.
The above is a good line of reasoning.
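
A minimal sketch of that philosophy, assuming a made-up attribute
name "units" (it is not something R itself interprets): arithmetic
goes through exactly as for a plain vector, and the attribute is
carried forward to the result.

```r
# "units" is a hypothetical attribute, used only for illustration
x <- 1:3
attr(x, "units") <- "cm"

y <- x + 1        # arithmetic is not blocked by the attribute
attributes(y)     # the "units" attribute is carried forward
```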

    > But maybe 'dim' attributes are special?  Well, they are in some
    > circumstances, and have to be when they are intended to change the
    > behaviour, such as when a matrix is used as an index with [.

indeed.

    > But in many cases at present, 'dim' attributes DON'T stop you from
    > treating the object as a plain vector - for example, one is allowed 
    > to do matrix(1:4,2,2)[3], and a<-numeric(10); a[2:5]<-matrix(1,2,2).

agreed.
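
For reference, the two cases Radford mentions, written out as a
small sketch (the variable names are mine):

```r
m <- matrix(1:4, 2, 2)
third <- m[3]                # plain vector indexing ignores 'dim'

a <- numeric(10)
a[2:5] <- matrix(1, 2, 2)    # the matrix on the right is used as a plain vector
a                            # still a plain numeric vector, no 'dim'
```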

    > So it may make more sense to move towards consistency in the
    > permissive direction, rather than the restrictive direction.

    > That would mean allowing matrix(1,1,1) < (1:2), and maybe also things
    > like matrix(1,2,2)+(1:8).

That is an interesting idea.  Yes, in my view that would
definitely also have to allow the latter, by the above argument
of not treating the dim/dimnames attributes specially.  For
non-arrays, length-1 is not treated very specially either, apart
from the fact that length-1 can always be recycled (without warning).
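
A short sketch of the recycling rules for plain (non-array) vectors
that I am referring to; the handler below only prints the warning
text so that the example runs quietly.

```r
r1 <- 1:6 + 1      # length-1 operand: always recycled silently
r2 <- 1:6 + 1:2    # 6 is a multiple of 2: recycled silently

# 5 is not a multiple of 2: still recycled, but with a warning
r3 <- withCallingHandlers(
  1:5 + 1:2,
  warning = function(w) {
    message("warning: ", conditionMessage(w))
    invokeRestart("muffleWarning")
  }
)
```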

    > Obviously, a change that removes error conditions is much less likely
    > to produce backwards-compatibility problems than a change that gives
    > errors for previously-allowed operations.

Of course that is true... and that has also been the reason for
my amendment.

    > And I think there would be some significant problems. In addition to
    > the 10-20+ packages that Martin expects to break, there could be quite
    > a bit of user code that would no longer work - scripts for analysing
    > data sets that used to work, but now don't with the latest version.

That's not true (at least for the cases above): they would give
a strong warning, "strong" because it announces that it will
become an error:

   > matrix(1,1) + 1:2
   [1] 2 3
   Warning message:
   In matrix(1, 1) + 1:2 :
     dropping dim() of array of length one.  Will become ERROR
   > 

*and* the  logic and relop versions of this, e.g.,
   matrix(TRUE,1) | c(TRUE,FALSE) ;  matrix(1,1) > 1:2,
have always been an  error; so nothing would break there.
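
To check this on a given R version, here is a small sketch.  The
helper outcome() is hypothetical (not part of R); it only reports
whether an expression ended in a value, a warning, or an error.
What the arithmetic case reports depends on the R version, so no
result is claimed for it here.

```r
# outcome() is a hypothetical helper: it reports how an expression ends
outcome <- function(expr) {
  tryCatch({ expr; "value" },
           warning = function(w) "warning",
           error   = function(e) "error")
}

arith <- outcome(matrix(1, 1) + 1:2)                 # version dependent
logic <- outcome(matrix(TRUE, 1) | c(TRUE, FALSE))   # logic version
relop <- outcome(matrix(1, 1) > 1:2)                 # relop version
c(arith = arith, logic = logic, relop = relop)
```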

    > There are reasons to expect such problems.  Some operations such as
    > vector dot products using %*% produce results that are always scalar,
    > but are formed as 1x1 matrices.

Of course; that *was* the reason the very special treatment for arithmetic
with length-1 arrays had been introduced.  It is convenient.
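
A sketch of the dot product case, and of getting rid of the 'dim'
explicitly so that the code does not rely on the special treatment:

```r
x <- c(1, 2, 3)
d <- x %*% x     # dot product, returned as a 1 x 1 matrix
dim(d)

s <- drop(d)     # plain length-1 vector; as.vector(d) or c(d) also work
s * 1:2          # ordinary recycling, no array involved
```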

However, *some* of the conveniences in S (and hence R) functions
have been dangerous {and much more used, hence close to
impossible to abolish, e.g., sample(x) when x  is numeric of length 1,
and several others, you'll find in the "R Inferno"}, or at least
quirky for *programming* with R (as opposed to pure interactive use).
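
The sample() case as a sketch; safe_sample() is a hypothetical
helper name, shown only as one possible way of avoiding the quirk
by permuting indices instead of values.

```r
x <- 5
n_surprise <- length(sample(x))   # sample(5) permutes 1:5, not the single value 5

# hypothetical helper: permute by index, so length-1 input is not special
safe_sample <- function(x) x[sample.int(length(x))]
safe_sample(x)
```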

    > One can anticipate that many people
    > have not been getting rid of the 'dim' attribute in such cases, when
    > doing so hasn't been necessary in the past.

If it remains at 10-20 CRAN packages (out of 9000), each with
just very few instances, that would indicate, I think, that such
use is not very widespread.
Note that they only did not have to get rid of the dim() in the
length-1 case (and only for arithmetic): as soon as they had
another dimension, they would have got an error.
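
A sketch of that non-length-1 case (variable names are mine): a
1 x 2 array combined with a longer plain vector fails, whereas
dropping the 'dim' explicitly gives ordinary recycling.

```r
m2 <- matrix(1:2, 1, 2)     # 1 x 2 array, length 2

# array of length 2 with a longer non-array: an error
res <- tryCatch(m2 + 1:4, error = function(e) "error")

# after dropping 'dim' explicitly, ordinary recycling applies
ok <- as.vector(m2) + 1:4
```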

Still, I agree about the validity of your line of thought, and
that in order to get consistency we also could go into the
direction of being more permissive rather than restrictive.

I'm interested to hear other opinions, notably as in recent years
some famous R teachers have typically criticized R as being
not strict enough ...

    > Regarding the 0-length vector issue, I agree with other posters that
    > after a<-numeric(0), it has to be allowable to write a<1.  To not
    > allow this would be highly destructive of code reliability.  And for
    > similar reason, after a<-c(), a<1 needs to be allowed, which means
    > NULL<1 should be allowed (giving logical(0)), since c() is
    > NULL.

Yes, indeed, treating NULL the same as a length-0 atomic
vector here is also correct in my view, and maybe the fact you
mention that c() is NULL  does help to convince others.
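
The zero-length cases as a sketch (variable names are mine):

```r
a <- numeric(0)
r_num <- a < 1      # logical(0)

b <- c()            # c() with no arguments is NULL
r_null <- b < 1     # NULL < 1 also gives logical(0)
```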

Martin