[Rd] na.omit inconsistent with is.na on list

Toby Hocking tdhock5 at gmail.com
Mon Aug 16 19:54:15 CEST 2021


To clarify, the ?na.omit docs say that 'na.omit' returns the object with
incomplete cases removed.
If we take is.na to be the definition of "incomplete cases" then a list
element with scalar NA is incomplete.
About the data.frame method, in my opinion it is highly
confusing/inconsistent for na.omit to keep rows with incomplete cases in
list columns, but not in columns which are atomic vectors:

> (f.num <- data.frame(num=c(1,NA,2)))
  num
1   1
2  NA
3   2
> is.na(f.num)
       num
[1,] FALSE
[2,]  TRUE
[3,] FALSE
> na.omit(f.num)
  num
1   1
3   2

> (f.list <- data.frame(list=I(list(1,NA,2))))
  list
1    1
2   NA
3    2
> is.na(f.list)
      list
[1,] FALSE
[2,]  TRUE
[3,] FALSE
> na.omit(f.list)
  list
1    1
2   NA
3    2
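
For illustration, here is a minimal sketch of a row filter that treats list
columns the same way is.na does. The name na_omit_list_aware is hypothetical
(it is not part of base R or stats), and unlike na.omit it does not record an
"na.action" attribute. It relies only on is.na applied to the data.frame,
which already flags scalar NA elements in list columns as shown above.

```r
# Hypothetical helper (not in base R or stats): drop every row that
# is.na() flags in any column, including list columns.
na_omit_list_aware <- function(df) {
  # is.na() on a data.frame returns a logical matrix, one column per variable.
  keep <- rowSums(is.na(df)) == 0
  df[keep, , drop = FALSE]
}

f.num  <- data.frame(num = c(1, NA, 2))
f.list <- data.frame(list = I(list(1, NA, 2)))

nrow(na.omit(f.list))             # 3: na.omit keeps the NA row
nrow(na_omit_list_aware(f.list))  # 2: the NA row is dropped
nrow(na_omit_list_aware(f.num))   # 2: same result as na.omit(f.num)
```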

On Sat, Aug 14, 2021 at 5:15 PM Gabriel Becker <gabembecker using gmail.com>
wrote:

> I understand what is.na does, the issue I have is that its task is not
> equivalent to the conceptual task na.omit is doing, in my opinion, as
> illustrated by what the data.frame method does.
>
> That is what I was getting at above: it is not clear that
> lst[!is.na(lst)] is the correct thing for na.omit to do.
>
> ~G
>
> On Sat, Aug 14, 2021, 1:49 PM Toby Hocking <tdhock5 using gmail.com> wrote:
>
>> Some relevant information from ?is.na: the behavior for lists is documented,
>>
>>      For is.na, elementwise the result is false unless that element
>>      is a length-one atomic vector and the single element of that
>>      vector is regarded as NA or NaN (note that any is.na method
>>      for the class of the element is ignored).
>>
>> Also there are other functions anyNA and is.na<- which are consistent with
>> is.na. That is, anyNA only returns TRUE if the list has an element which is
>> a scalar NA. And is.na<- sets list elements to logical NA to indicate
>> missingness.
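
A short sketch of the two functions mentioned above, using only base R. The
variable names L and M are just for illustration.

```r
L <- list(NULL, NA, 0)
anyNA(L)               # TRUE: the second element is a scalar NA
anyNA(list(NULL, 0))   # FALSE: no scalar NA element

M <- list("a", 2, 3)
is.na(M) <- 2          # mark the second element as missing
str(M[[2]])            # the element is now a logical NA
is.na(M)               # FALSE TRUE FALSE
```
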
>>
>> On Fri, Aug 13, 2021 at 1:10 AM Hugh Parsonage <hugh.parsonage using gmail.com> wrote:
>>
>> > The data.frame method deliberately skips non-atomic columns before
>> > invoking is.na(x) so I think it is fair to assume this behaviour is
>> > intentional and assumed.
>> >
>> > Not so clear to me that there is a sensible answer for list columns.
>> > (List columns seem to collide with the expectation that in each
>> > variable every observation will be of the same type)
>> >
>> > Consider your list L as
>> >
>> > L <- list(NULL, NA, c(NA, NA))
>> >
>> > Seems like every observation could have a claim to be 'missing' here.
>> > Concretely, if a data.frame had a list column representing the lat-lon
>> > of an observation, we might only be able to represent missing values
>> > like c(NA, NA).
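
To make the competing definitions concrete, here is a small sketch comparing
is.na with a stricter predicate. The function all_missing is hypothetical
and is only one of several possible definitions of a "missing" list element.

```r
L <- list(NULL, NA, c(NA, NA))

# is.na() flags only the length-one atomic NA.
is.na(L)   # FALSE TRUE FALSE

# Hypothetical alternative: an element is "missing" if it is empty
# or if every value in it is NA.
all_missing <- function(x) length(x) == 0L || all(is.na(x))
vapply(L, all_missing, logical(1))   # TRUE TRUE TRUE
```
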
>> >
>> > On Fri, 13 Aug 2021 at 17:27, Iñaki Ucar <iucar using fedoraproject.org> wrote:
>> > >
>> > > On Thu, 12 Aug 2021 at 22:20, Gabriel Becker <gabembecker using gmail.com> wrote:
>> > > >
>> > > > Hi Toby,
>> > > >
>> > > > This definitely appears intentional: the first expression of
>> > > > stats:::na.omit.default is
>> > > >
>> > > >    if (!is.atomic(object))
>> > > >
>> > > >         return(object)
>> > >
>> > > I don't follow your point. This only means that the *default* method
>> > > is not intended for non-atomic cases, but it doesn't mean there
>> > > shouldn't be a method for lists.
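
As a sketch of what such a method could look like if is.na were taken as the
definition of an incomplete case: the method na.omit.list below is
hypothetical (stats does not define one), and it mirrors the default method
only in recording the dropped positions in an "na.action" attribute.

```r
# Hypothetical S3 method; stats does not provide na.omit.list.
na.omit.list <- function(object, ...) {
  omit <- which(is.na(object))      # positions of scalar NA elements
  if (length(omit) == 0L) return(object)
  kept <- object[-omit]
  # Record what was dropped, as na.omit.default does for atomic vectors.
  attr(omit, "class") <- "omit"
  attr(kept, "na.action") <- omit
  kept
}

res <- na.omit(list(5, NA, c(NA, 5)))
length(res)   # 2: the first and the last elements are kept
```
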
>> > >
>> > > > So it is explicitly just returning the object in non-atomic cases, which
>> > > > includes lists. I was not involved in this decision (obviously) but my
>> > > > guess is that it is due to the fact that what constitutes an observation
>> > > > "being complete" is unclear in the list case. What should
>> > > >
>> > > > na.omit(list(5, NA, c(NA, 5)))
>> > > >
>> > > > return? Just the first element, or the first and the last? It seems, at
>> > > > least to me, unclear. A small change to the documentation to add "atomic
>> > >
>> > > > is.na(list(5, NA, c(NA, 5)))
>> > > [1] FALSE  TRUE FALSE
>> > >
>> > > Following Toby's argument, it's clear to me: the first and the last.
>> > >
>> > > Iñaki
>> > >
>> > > > (in the sense of is.atomic returning \code{TRUE})" in front of "vectors"
>> > > > or similar where what types of objects are supported seems justified,
>> > > > though, imho, as the current documentation is either ambiguous or
>> > > > technically incorrect, depending on what we take "vector" to mean.
>> > > >
>> > > > Best,
>> > > > ~G
>> > > >
>> > > > On Wed, Aug 11, 2021 at 10:16 PM Toby Hocking <tdhock5 using gmail.com> wrote:
>> > > >
>> > > > > Also, the na.omit method for data.frame with list column seems to be
>> > > > > inconsistent with is.na,
>> > > > >
>> > > > > > L <- list(NULL, NA, 0)
>> > > > > > str(f <- data.frame(I(L)))
>> > > > > 'data.frame': 3 obs. of  1 variable:
>> > > > >  $ L:List of 3
>> > > > >   ..$ : NULL
>> > > > >   ..$ : logi NA
>> > > > >   ..$ : num 0
>> > > > >   ..- attr(*, "class")= chr "AsIs"
>> > > > > > is.na(f)
>> > > > >          L
>> > > > > [1,] FALSE
>> > > > > [2,]  TRUE
>> > > > > [3,] FALSE
>> > > > > > na.omit(f)
>> > > > >    L
>> > > > > 1
>> > > > > 2 NA
>> > > > > 3  0
>> > > > >
>> > > > > On Wed, Aug 11, 2021 at 9:58 PM Toby Hocking <tdhock5 using gmail.com> wrote:
>> > > > >
>> > > > > > na.omit is documented as "na.omit returns the object with incomplete
>> > > > > > cases removed." and "At present these will handle vectors," so I
>> > > > > > expected that when it is used on a list, it should return the same
>> > > > > > thing as if we subset via is.na; however I observed the following,
>> > > > > >
>> > > > > > > L <- list(NULL, NA, 0)
>> > > > > > > str(L[!is.na(L)])
>> > > > > > List of 2
>> > > > > >  $ : NULL
>> > > > > >  $ : num 0
>> > > > > > > str(na.omit(L))
>> > > > > > List of 3
>> > > > > >  $ : NULL
>> > > > > >  $ : logi NA
>> > > > > >  $ : num 0
>> > > > > >
>> > > > > > Should na.omit be fixed so that it returns a result that is consistent
>> > > > > > with is.na? I assume that is.na is the canonical definition of what
>> > > > > > should be considered a missing value in R.
>> > > > > >
>> > > > >
>> > > > >         [[alternative HTML version deleted]]
>> > > > >
>> > > > > ______________________________________________
>> > > > > R-devel using r-project.org mailing list
>> > > > > https://stat.ethz.ch/mailman/listinfo/r-devel
>> > > > >
>> > >
>> > >
>> > >
>> > > --
>> > > Iñaki Úcar
