[Rd] '==' operator: inconsistency in data.frame(...) == NULL

Martin Maechler maechler at stat.math.ethz.ch
Wed Sep 11 09:56:37 CEST 2019


>>>>> Hilmar Berger 
>>>>>     on Wed, 4 Sep 2019 15:25:46 +0200 writes:

    > Dear all,

    > I just stumbled upon some behavior of the == operator which is at least 
    > somewhat inconsistent.

    > R version 3.6.1 (2019-07-05) -- "Action of the Toes"
    > Copyright (C) 2019 The R Foundation for Statistical Computing
    > Platform: x86_64-w64-mingw32/x64 (64-bit)

    >> list(a=1:3, b=LETTERS[1:3]) == NULL
    > logical(0)
    >> matrix(1:6, 2,3) == NULL
    > logical(0)
    >> data.frame(a=1:3, b=LETTERS[1:3]) == NULL # same for == logical(0)
    > Error in matrix(if (is.null(value)) logical() else value, nrow = nr, 
    > dimnames = list(rn,  :
    >   length of 'dimnames' [2] not equal to array extent

    >> data.frame(NULL) == 1
    > <0 x 0 matrix>
    >> data.frame(NULL) == NULL
    > <0 x 0 matrix>
    >> data.frame(NULL) == logical(0)
    > <0 x 0 matrix>

    > I wonder if data.frame(<some non-empty data>) == NULL should also return 
    > a value instead of an error. R help reads:

    > "At least one of 'x' and 'y' must be an atomic vector, but
    >  if the other is a list R attempts to coerce it to the
    >  type of the atomic vector: this will succeed if the list
    >  is made up of elements of length one that can be coerced
    >  to the correct type.

    >  If the two arguments are atomic vectors of different
    >  types, one is coerced to the type of the other, the
    >  (decreasing) order of precedence being character, complex,
    >  numeric, integer, logical and raw."

    > It is not clear from the help what to expect for NULL or
    > empty atomic vectors. 

Well, strictly speaking an error would be expected for NULL,
as it is *not* an atomic vector, and your main issue

 " data.frame(..) == NULL "

would already be settled by the first half-sentence of the
documentation. Strictly speaking, even  data.frame(NULL) == NULL
"should" return an error. (Note: I'm not saying it really
should, only that the reference does not say it should work at all.)

Now,  logical(0)  on the other hand *is* an atomic vector ... 
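
A minimal sketch of that distinction, using only base R. The variable
names are illustrative, and the result of is.atomic(NULL) is
deliberately not relied on, since it depends on the R version:

```r
# NULL has its own type; logical(0) is a zero-length atomic vector.
null_type       <- typeof(NULL)          # "NULL"
empty_type      <- typeof(logical(0))    # "logical"
empty_is_atomic <- is.atomic(logical(0)) # TRUE

# Both have length zero, which is why NULL is sometimes treated
# like an empty atomic vector.
lengths_match <- length(NULL) == length(logical(0))

# Comparing an atomic vector with NULL yields a zero-length logical.
cmp <- 1:3 == NULL

# Note: is.atomic(NULL) was TRUE before R 4.4.0 and is FALSE from
# R 4.4.0 on, so it is not a reliable test across versions.
```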


    > It is also strange that for list()
    > there is no error but for data.frame() with the same data
    > an error is thrown. I can see that there might be reasons
    > to return logical(0) instead of FALSE, but I do not fully
    > understand why there should be differences between
    > e.g. matrix() and data.frame().

Well, a [regular base R] matrix() is atomic  and a data frame is not.
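
This can be checked directly. The following is a small sketch with
made-up example objects (m and df are not from the original report):

```r
m  <- matrix(1:6, 2, 3)
df <- data.frame(a = 1:3, b = 4:6)

# A base R matrix is an atomic vector with a 'dim' attribute.
m_atomic <- is.atomic(m)   # TRUE
m_type   <- typeof(m)      # "integer"

# A data frame is a list of columns with class "data.frame".
df_atomic <- is.atomic(df) # FALSE
df_list   <- is.list(df)   # TRUE
df_type   <- typeof(df)    # "list"
```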

    > Also, it is at least somewhat strange that
    > data.frame(NULL) == NULL and similar expressions return an
    > empty matrix, while comparing a normal filled matrix to
    > NULL returns logical(0).

    > Even if this behavior is expected, the error message shown
    > by data.frame(...) == NULL is not very informative.

I'm not at all sure there's any need for a change here.

I would say the following general thinking should be applied

1. The general rule that '==' should be used only for comparing 
  atomic objects (as it returns an atomic object, a 'logical' with
  corresponding attributes) is the governing principle,
  and using '==' for anything else has never been "the idea".

2. There are (two) "semi-exceptions" to the above:
2a) Sometimes it has been convenient to treat NULL as if it was
     a zero-length atomic object (of "arbitrary" type/mode).
2b) data.frame()s "should typically" behave like matrices in
    many situations, notably when indexed {and that rule is
    violated (on purpose) by tibbles .. ("drop=FALSE" etc, but
    that's another story)} 

So because of these exceptions, you and possibly others may
think  '=='  should "work" with data.frame()s and/or NULL, but
I would not tend to agree.
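
For code that needs to test for NULL or compare whole objects, a hedged
sketch of alternatives that avoid '==' on non-atomic operands follows.
The object df and the variable names are illustrative only; is.null()
and identical() are standard base R functions:

```r
df <- data.frame(a = 1:3, b = 4:6)

# Test for NULL explicitly instead of writing 'df == NULL'.
df_is_null <- is.null(df)          # FALSE

# Compare whole objects with identical(), which always returns
# a single TRUE or FALSE.
same_as_null <- identical(df, NULL) # FALSE

# Comparing a data frame with a length-one atomic value is the
# matrix-like case (2b): the result is a logical matrix.
eq2 <- df == 2
```

Here eq2 has one row per row of df and one column per column of df.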

    > Thanks and best regards,
    > Hilmar

You are welcome!
Martin


