[Rd] arithmetic with zero-column data.frames

Martin Maechler maechler at stat.math.ethz.ch
Wed Aug 9 12:39:26 CEST 2017


So, as so often, there is more to it than one first thinks.
Let's consider this an RFC (for experienced long-time R users):

>>>>> Martin Maechler <maechler at stat.math.ethz.ch>
>>>>>     on Wed, 9 Aug 2017 10:45:56 +0200 writes:

>>>>> William Dunlap via R-devel <r-devel at r-project.org>
>>>>>     on Tue, 8 Aug 2017 11:59:45 -0700 writes:

    >> Should arithmetic operations work on zero-column data.frames (returning a
    >> zero-column data.frame with the same number of rows as the data.frame
    >> argument(s))?   Currently we get:

    >>> 1 + data.frame(row.names=c("A","B"))
    >> Error in data.frame(value, row.names = rn, check.names = FALSE, check.rows
    >> = FALSE) :
    >> row names supplied are of the wrong length
    >>> data.frame(row.names=c("A","B")) * 2
    >> Error in data.frame(value, row.names = rn, check.names = FALSE, check.rows
    >> = FALSE) :
    >> row names supplied are of the wrong length
    >>> data.frame(row.names=c("A","B")) / data.frame(row.names=c("A","B"))
    >> Error in data.frame(value, row.names = rn, check.names = FALSE, check.rows
    >> = FALSE) :
    >> row names supplied are of the wrong length

    >> Bill Dunlap
    >> TIBCO Software
    >> wdunlap tibco.com

    > Thank you, Bill.

    > Yes, indeed, as we have the   Ops.data.frame  and
    > Math.data.frame group methods  (about which I have not always
    > been so happy,  but they are an inheritance from S),
    > and as the Math methods work too,  we should get this boundary
    > case working as well for the Ops.
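
A minimal sketch of the boundary case Bill describes, under stated
assumptions: the wrapper name arith_df is hypothetical and not part of
base R, and it only illustrates the suggested result (a zero-column data
frame with the rows of the data frame argument), not how Ops.data.frame
would actually be changed.

```r
# Hypothetical wrapper, not base R: short-circuit zero-column data frames
# so that arithmetic returns a zero-column data frame with the same rows.
arith_df <- function(op, e1, e2) {
  is_zero_col <- function(x) is.data.frame(x) && ncol(x) == 0L
  if (is_zero_col(e1)) return(e1)
  if (is_zero_col(e2)) return(e2)
  op(e1, e2)  # everything else goes through the usual dispatch
}

zc <- data.frame(row.names = c("A", "B"))
r1 <- arith_df(`+`, 1, zc)
r2 <- arith_df(`*`, zc, 2)
dim(r1)        # 2 rows, 0 columns
rownames(r2)   # "A" "B"
```

If both arguments are zero-column data frames with different rows, this
sketch simply returns the first one; a real fix would have to decide
whether that should be an error.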

Hmm..  This time, I'd be glad for comments, notably from you, Bill:

In looking at this, I notice that "^" is treated exceptionally,
possibly by accident rather than on purpose. E.g.,
USArrests ^ 2    returns a matrix, whereas all other arithmetic
Ops give a data frame.

All non-arithmetic Ops do give a matrix [also undocumented, AFAICS],
and currently "^"  is treated like them.

Note that Math.data.frame always returns a data frame (when it
does return), so we currently have this ugly inconsistency:

> str(USArrests ^ 0.5)
 num [1:50, 1:4] 3.63 3.16 2.85 2.97 3 ...
 - attr(*, "dimnames")=List of 2
  ..$ : chr [1:50] "Alabama" "Alaska" "Arizona" "Arkansas" ...
  ..$ : chr [1:4] "Murder" "Assault" "UrbanPop" "Rape"
> str(sqrt(USArrests))
'data.frame':	50 obs. of  4 variables:
 $ Murder  : num  3.63 3.16 2.85 2.97 3 ...
 $ Assault : num  15.4 16.2 17.1 13.8 16.6 ...
 $ UrbanPop: num  7.62 6.93 8.94 7.07 9.54 ...
 $ Rape    : num  4.6 6.67 5.57 4.42 6.37 ...
> 

I propose to add "^" to the other arithmetic ops which return a
data frame.  So in the above,  '^ 0.5' would give the same [up to
lowest-bit rounding error] as sqrt().
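
To make the proposed result concrete, here is a small sketch. The
helper pow_df is hypothetical (not part of base R); it applies "^"
column by column and keeps the data frame, which is roughly what the
proposal would make USArrests ^ 0.5 do by itself.

```r
# Hypothetical helper, not base R: raise every column to a power and
# keep the data frame structure, as sqrt() already does.
pow_df <- function(df, p) {
  stopifnot(is.data.frame(df))
  df[] <- lapply(df, function(col) col ^ p)
  df
}

res <- pow_df(USArrests, 0.5)
is.data.frame(res)              # TRUE
all.equal(res, sqrt(USArrests)) # equal up to numerical tolerance
```

all.equal() is used rather than identical() because '^ 0.5' and sqrt()
may differ in the lowest bit.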

- -- - -- - --

A further inconsistency is that the Math methods directly refuse
to work on a data frame with non-numeric variables, whereas the
Ops methods just go along and give warnings and NA's:

> sqrt(CO2)
Error in Math.data.frame(CO2) : 
  non-numeric variable in data frame: PlantTypeTreatment

> str( CO2 ^ 0.5 )
 num [1:84, 1:5] NA NA NA NA NA NA NA NA NA NA ...
 - attr(*, "dimnames")=List of 2
  ..$ : chr [1:84] "1" "2" "3" "4" ...
  ..$ : chr [1:5] "Plant" "Type" "Treatment" "conc" ...
Warning messages:
1: In Ops.ordered(left, right) : '^' is not meaningful for ordered factors
2: In Ops.factor(left, right) : ‘^’ not meaningful for factors
3: In Ops.factor(left, right) : ‘^’ not meaningful for factors
> 
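
The difference can also be checked programmatically. The following is
only a sketch using standard condition handling; the variable names are
made up for illustration. It does not depend on whether the Ops result
is a matrix or a data frame.

```r
# Math group: stops with an error on non-numeric variables.
math_result <- tryCatch(sqrt(CO2), error = function(e) e)

# Ops group: goes ahead, signalling warnings for the factor columns.
ops_warnings <- character()
ops_result <- withCallingHandlers(
  CO2 ^ 0.5,
  warning = function(w) {
    # record the message, then continue the computation
    ops_warnings <<- c(ops_warnings, conditionMessage(w))
    invokeRestart("muffleWarning")
  }
)

inherits(math_result, "error")  # TRUE
length(ops_warnings)            # one warning per factor column
```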

One "clean" radical solution here would be for the  Ops method
to also give an error directly, as the Math one does.
But that may be undesirable.
Assume people have data frame variables of classes for which an
Ops method is defined.  Then  the corresponding "op" is applied
everywhere and the result may be useful and as desired.
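
As an illustration of such a case (a sketch with a made-up data frame,
assuming Ops.data.frame keeps applying the op column by column as it
does now): "Date" has a "+" method, so the arithmetic goes through,
whereas the Math method refuses the same data frame.

```r
df <- data.frame(d = as.Date(c("2017-08-08", "2017-08-09")))

# "+" dispatches to the Date method for the column
shifted <- df + 1
class(shifted$d)

# sqrt() is rejected for the same data frame
math_try <- tryCatch(sqrt(df), error = function(e) e)
inherits(math_try, "error")
```

Making Ops give an error for every non-numeric variable would break
uses like 'df + 1' above.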

So, I'm much less sure what's desirable here.
Should we just document this latter inconsistency?




Martin Maechler
ETH Zurich and R Core


