[Rd] na.omit inconsistent with is.na on list

Gabriel Becker gabembecker using gmail.com
Fri Aug 13 08:46:10 CEST 2021


On Thu, Aug 12, 2021 at 4:30 PM Toby Hocking <tdhock5 using gmail.com> wrote:

> Hi Gabe thanks for the feedback.
>
> On Thu, Aug 12, 2021 at 1:19 PM Gabriel Becker <gabembecker using gmail.com>
> wrote:
>
>> Hi Toby,
>>
>> This definitely appears intentional; the first expression of
>> stats:::na.omit.default is
>>
>>    if (!is.atomic(object))
>>        return(object)
>>
> Based on this code it does seem that the documentation could be clarified
> to say atomic vectors.
>
>>
>> So it is explicitly just returning the object in non-atomic cases, which
>> includes lists. I was not involved in this decision (obviously) but my
>> guess is that it is due to the fact that what constitutes an observation
>> "being complete" is unclear in the list case. What should
>>
>> na.omit(list(5, NA, c(NA, 5)))
>>
>> return? Just the first element, or the first and the last? It seems, at
>> least to me, unclear.
>>
> I agree in principle/theory that it is unclear, but in practice is.na has
> an unambiguous answer (if a list element is a scalar NA then it is
> considered missing, otherwise not).
>

Well, yes, it's unambiguous, but I would argue it is less likely than the
other option to be correct. Remember what na.omit is supposed to do: "remove
observations which are not complete".

Now for data.frames, this means it removes any row (i.e. observation,
regardless of the internal structure) where *any* column contains an NA. The
most analogous interpretation of na.omit on a list, in the well-behaved
(i.e. list of atomic vectors) case, I think, is that we consider it a ragged
collection of "observations", in which case x[!is.na(x)] with x a list
would do the wrong thing because it is not checking these "observations"
for completeness.
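
To make the two readings concrete, here is a rough sketch. The helper
is_complete is a name I am making up for illustration, and treating
non-atomic elements as "incomplete" is just one assumption among several
possible ones; none of this is what na.omit currently does for lists.

```r
# For a data.frame, na.omit drops every row that has an NA in any column.
df <- data.frame(a = c(1, NA, 3), b = c("x", "y", NA))
n_complete_rows <- nrow(na.omit(df))  # only the first row is complete

x <- list(5, NA, c(NA, 5))

# Reading 1: drop the elements that is.na() flags, i.e. the
# length-one atomic elements that are NA.
by_is_na <- x[!is.na(x)]

# Reading 2 (hypothetical): treat each element as an observation and
# keep it only if it is atomic and contains no NA at all.
is_complete <- function(e) is.atomic(e) && !anyNA(e)
by_completeness <- Filter(is_complete, x)

length(by_is_na)         # 2: keeps 5 and c(NA, 5)
length(by_completeness)  # 1: keeps only 5
```

The two readings disagree on c(NA, 5), which is exactly the element whose
"completeness" is in question.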

Perhaps others disagree with me about that. In any case, this only works
when you can check the elements of the list for "completeness" at all. A
list can have anything for elements, and then checking for completeness
becomes impossible...

As is, I do also wonder if a warning should be thrown letting the user know
that their call isn't doing ANY of the possible things it could mean...
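
As a sketch of what such a warning might look like from user code, one could
wrap na.omit. The name na_omit_warn and the message text are hypothetical, and
the condition (non-atomic and not a data.frame) is only my assumption about
when the warning would be wanted; it ignores other classes that have their own
na.omit methods.

```r
# Hypothetical wrapper, not part of stats: warn when na.omit would
# return a non-atomic, non-data.frame object unchanged.
na_omit_warn <- function(object, ...) {
  if (!is.atomic(object) && !is.data.frame(object)) {
    warning("na.omit() returns non-atomic objects such as lists unchanged")
  }
  na.omit(object, ...)
}

L <- list(NULL, NA, 0)
res <- suppressWarnings(na_omit_warn(L))
identical(res, L)  # the list comes back untouched, NA element included
```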

Best,
~G


>> A small change to the documentation to add "atomic (in the sense of
>> is.atomic returning \code{TRUE})" in front of "vectors", or similar, where
>> the supported types of objects are listed seems justified, though, imho,
>> as the current documentation is either ambiguous or technically incorrect,
>> depending on what we take "vector" to mean.
>>
>> Best,
>> ~G
>>
>> On Wed, Aug 11, 2021 at 10:16 PM Toby Hocking <tdhock5 using gmail.com> wrote:
>>
>>> Also, the na.omit method for data.frame with list column seems to be
>>> inconsistent with is.na,
>>>
>>> > L <- list(NULL, NA, 0)
>>> > str(f <- data.frame(I(L)))
>>> 'data.frame': 3 obs. of  1 variable:
>>>  $ L:List of 3
>>>   ..$ : NULL
>>>   ..$ : logi NA
>>>   ..$ : num 0
>>>   ..- attr(*, "class")= chr "AsIs"
>>> > is.na(f)
>>>          L
>>> [1,] FALSE
>>> [2,]  TRUE
>>> [3,] FALSE
>>> > na.omit(f)
>>>    L
>>> 1
>>> 2 NA
>>> 3  0
>>>
>>> On Wed, Aug 11, 2021 at 9:58 PM Toby Hocking <tdhock5 using gmail.com> wrote:
>>>
>>> > na.omit is documented as "na.omit returns the object with incomplete
>>> > cases removed." and "At present these will handle vectors," so I expected
>>> > that when it is used on a list, it should return the same thing as if we
>>> > subset via is.na; however I observed the following,
>>> >
>>> > > L <- list(NULL, NA, 0)
>>> > > str(L[!is.na(L)])
>>> > List of 2
>>> >  $ : NULL
>>> >  $ : num 0
>>> > > str(na.omit(L))
>>> > List of 3
>>> >  $ : NULL
>>> >  $ : logi NA
>>> >  $ : num 0
>>> >
>>> > Should na.omit be fixed so that it returns a result that is consistent
>>> > with is.na? I assume that is.na is the canonical definition of what
>>> > should be considered a missing value in R.
>>> >
>>>
>>>         [[alternative HTML version deleted]]
>>>
>>> ______________________________________________
>>> R-devel using r-project.org mailing list
>>> https://stat.ethz.ch/mailman/listinfo/r-devel
>>>
>>
