[Rd] [R] Why does R replace all row values with NAs

Hervé Pagès hpages at fredhutch.org
Tue Mar 3 23:26:25 CET 2015



On 03/03/2015 02:17 PM, Gabriel Becker wrote:
> Stephanie,
>
> Actually, it's as.logical that isn't preserving matrix dimensions,
> because it coerces to a logical vector:
>
>  > x <- matrix(sample(c(NA_integer_, 1:100), 500, replace=TRUE), nrow=50)
>  > dim(as.logical(x))

It's true that as.logical() doesn't help here, but Stephanie is right that
%in% does not preserve the dimensions either:

 > dim(x %in% 1:5)
NULL

That's because match() itself doesn't preserve the dimensions:

 > dim(match(x, 1:5))
NULL
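
For the record, a minimal sketch of why the two behave alike. In base R,
%in% is defined in terms of match() (essentially
match(x, table, nomatch = 0L) > 0L), so it returns the same kind of plain
vector. The seed and the names via_match and res below are only there for
illustration:

```r
# Hypothetical example matrix (same shape as the one used in this thread)
set.seed(1)
x <- matrix(sample(c(NA_integer_, 1:100), 500, replace = TRUE), nrow = 50)

# %in% gives the same result as this match()-based expression,
# and neither result carries a "dim" attribute
via_match <- match(x, 1:5, nomatch = 0L) > 0L
identical(x %in% 1:5, via_match)
is.null(dim(via_match))

# Copying dim(x) back onto the result restores the matrix shape
res <- x %in% 1:5
dim(res) <- dim(x)
dim(res)
```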

So maybe my fast is.true() should be:

is.true <- function(x)
{
   ans <- as.logical(x) %in% TRUE
   if (is.null(dim(x))) {
     names(ans) <- names(x)
   } else {
     dim(ans) <- dim(x)
     dimnames(ans) <- dimnames(x)
   }
   ans
}

or something like that...
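
A quick sanity check of the above, as a sketch only. The matrix m and the
vector v are made-up test data, not from this thread, and the function body
is repeated so that the snippet runs on its own:

```r
# Dimension-preserving is.true(), as proposed above
is.true <- function(x)
{
  ans <- as.logical(x) %in% TRUE
  if (is.null(dim(x))) {
    names(ans) <- names(x)
  } else {
    dim(ans) <- dim(x)
    dimnames(ans) <- dimnames(x)
  }
  ans
}

# Hypothetical test data: a small integer matrix with NAs and dimnames
m <- matrix(c(1L, NA, 0L, 2L, NA, 0L), nrow = 2,
            dimnames = list(c("r1", "r2"), c("a", "b", "c")))
res <- is.true(m)
dim(res)                          # same dimensions as m
identical(res, !is.na(m) & m)     # compare with the slower idiom

# Hypothetical named vector: names should be carried over
v <- c(a = 1L, b = NA, c = 0L)
names(is.true(v))
```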

H.

> NULL
>
> ~G
>
> On Tue, Mar 3, 2015 at 2:09 PM, Stephanie M. Gogarten
> <sdmorris at u.washington.edu> wrote:
>
>
>
>     On 3/3/15 1:26 PM, Hervé Pagès wrote:
>
>
>
>         On 03/03/2015 02:28 AM, Martin Maechler wrote:
>
>             Diverted from R-help :
>             .... as it gets into musing about new R language "primitives"
>
>                                 William Dunlap <wdunlap at tibco.com>
>                                       on Fri, 27 Feb 2015 08:04:36 -0800
>                                 writes:
>
>
>                   > You could define functions like
>
>                   > is.true <- function(x) !is.na(x) & x
>                   > is.false <- function(x) !is.na(x) & !x
>
>                   > and use them in your selections.  E.g.,
>                   >> x <- data.frame(a=1:10,b=2:11,c=c(1,NA,3,NA,5,NA,7,NA,NA,10))
>                   >> x[is.true(x$c >= 6), ]
>                   > a  b  c
>                   > 7   7  8  7
>                   > 10 10 11 10
>
>                   > Bill Dunlap
>                   > TIBCO Software
>                   > wdunlap tibco.com
>
>             Yes; the Matrix package has had these
>
>             is0  <- function(x) !is.na(x) & x == 0
>             isN0 <- function(x) is.na(x) | x != 0
>             is1  <- function(x) !is.na(x) & x   # also == "isTRUE componentwise"
>
>
>         Note that using %in% to block propagation of NAs is about 2x faster:
>
>           > x <- sample(c(NA_integer_, 1:10000), 500000, replace=TRUE)
>           > microbenchmark(as.logical(x) %in% TRUE, !is.na(x) & x)
>         Unit: milliseconds
>                               expr       min        lq      mean    median        uq      max neval
>            as.logical(x) %in% TRUE  6.034744  6.264382  6.999083   6.29488  6.346028 40.36472   100
>                      !is.na(x) & x 11.202808 11.402437 11.469101  11.44848 11.517576 11.90916   100
>
>
>     Unfortunately %in% does not preserve matrix dimensions:
>
>      > x <- matrix(sample(c(NA_integer_, 1:100), 500, replace=TRUE), nrow=50)
>      > dim(x)
>     [1] 50 10
>      > dim(!is.na(x) & x)
>     [1] 50 10
>      > dim(as.logical(x) %in% TRUE)
>     NULL
>
>     Stephanie
>
>
>
>
>
>
>             namespace hidden for a while  [note the comment of the last one!]
>             and using them for readability in its own code.
>
>             Maybe we should (again) consider providing some versions of
>             these with R ?
>
>             The Matrix package also has had fast
>
>             allFalse <- all0 <- function(x) .Call(R_all0, x)
>             anyFalse <- any0 <- function(x) .Call(R_any0, x)
>             ##
>             ## anyFalse <- function(x) isTRUE(any(!x))         ## ~= any0
>             ## any0 <- function(x) isTRUE(any(x == 0))          ## ~= anyFalse
>
>             namespace hidden as well, already, which probably could also be
>             brought to base R.
>
>             One big reason to *not* go there (to internal C code) at all
>             with R is that S3 and S4 dispatch for '==' ('!=', etc., the
>             'Compare' group generics) and 'is.na()' have been known and
>             package writers have programmed methods for these.
>             To ensure that S3 and S4 dispatch works "correctly" also inside
>             such new internals is much less easily achieved, and so
>             such a C-based internal function  is0() would no longer be
>             equivalent with    !is.na(x) & x == 0
>             as soon as 'x' is an "object" with a '==', 'Compare' and/or
>             an is.na() method.
>
>
>         Excellent point. Thank you! It really makes a big difference for
>         developers who maintain a complex hierarchy of S4 classes and
>         methods when functions like is.true, anyFalse, etc., which can be
>         expressed in terms of more basic operations like ==, !=, !, is.na,
>         etc., just work out-of-the-box on objects for which these basic
>         operations are defined.
>
>         There is conceptually a small set of "building blocks", at least for
>         objects with vector-like or list-like semantics, that can be used
>         to formally describe the semantics of many functions in base R. This
>         is what the man page for anyNA does by saying:
>
>             anyNA implements any(is.na(x))
>
>         even though the actual implementation differs, but that's ok, as
>         long as anyNA is equivalent to doing any(is.na(x)) on any object
>         for which building block is.na() is implemented.
>
>         Unfortunately there is no clearly identified set of building blocks
>         in base R. For example, if I want the comparison operations to work
>         on my object, I need to implement ==, >, <, !=, <=, and >= (the
>         'Compare' group generics) even though it should be enough to
>         implement == and >=, because all the others can be described in
>         terms of these 2 building blocks. unique/duplicated is another
>         example (unique(x) is conceptually x[!duplicated(x)]). And so on...
>
>         Cheers,
>         H.
>
>
>             OTOH, simple R versions such as your  'is.true',  called 'is1'
>             inside Matrix, may be optimizable a bit by the byte compiler (and
>             jit and other such tricks) and still keep the full
>             semantic including correct method dispatch.
>
>             Martin Maechler, ETH Zurich
>
>
>                   > On Fri, Feb 27, 2015 at 7:27 AM, Dimitri Liakhovitski <
>                   > dimitri.liakhovitski at gmail.com> wrote:
>
>                   >> Thank you very much, Duncan.
>                   >> All this being said:
>                   >>
>                   >> What would you say is the most elegant and most safe way to solve such
>                   >> a seemingly simple task?
>                   >>
>                   >> Thank you!
>                   >>
>                   >> On Fri, Feb 27, 2015 at 10:02 AM, Duncan Murdoch
>                   >> <murdoch.duncan at gmail.com> wrote:
>                   >> > On 27/02/2015 9:49 AM, Dimitri Liakhovitski wrote:
>                   >> >> So, Duncan, do I understand you correctly:
>                   >> >>
>                   >> >> When I use x$x<6, R doesn't know if it's TRUE or FALSE, so it returns
>                   >> >> a logical value of NA.
>                   >> >
>                   >> > Yes, when x$x is NA.  (Though I think you meant x$c.)
>                   >> >
>                   >> >> When this logical value is applied to a row, R says: hell, I don't
>                   >> >> know if I should keep it or not, so, just in case, I am going to keep
>                   >> >> it, but I'll replace all the values in this row with NAs?
>                   >> >
>                   >> > Yes.  Indexing with a logical NA is probably a mistake, and this is one
>                   >> > way to signal it without actually triggering a warning or error.
>                   >> >
>                   >> > BTW, I should have mentioned that the example where you indexed using
>                   >> > -which(x$c>=6) is a bad idea:  if none of the entries were 6 or more,
>                   >> > this would be indexing with an empty vector, and you'd get nothing, not
>                   >> > everything.
>                   >> >
>                   >> > Duncan Murdoch
>                   >> >
>                   >> >
>                   >> >>
>                   >> >> On Fri, Feb 27, 2015 at 9:13 AM, Duncan Murdoch
>                   >> >> <murdoch.duncan at gmail.com> wrote:
>                   >> >>> On 27/02/2015 9:04 AM, Dimitri Liakhovitski wrote:
>                   >> >>>> I know how to get the output I need, but I would benefit from an
>                   >> >>>> explanation why R behaves the way it does.
>                   >> >>>>
>                   >> >>>> # I have a data frame x:
>                   >> >>>> x = data.frame(a=1:10,b=2:11,c=c(1,NA,3,NA,5,NA,7,NA,NA,10))
>                   >> >>>> x
>                   >> >>>> # I want to toss rows in x that contain values >=6. But I don't want
>                   >> >>>> # to toss my NAs there.
>                   >> >>>>
>                   >> >>>> subset(x,c<6) # Works correctly, but removes NAs in c, understand why
>                   >> >>>> x[which(x$c<6),] # Works correctly, but removes NAs in c, understand why
>                   >> >>>> x[-which(x$c>=6),] # output I need
>                   >> >>>>
>                   >> >>>> # Here is my question: why does the following line replace the values
>                   >> >>>> # of all rows that contain an NA in x$c with NAs?
>                   >> >>>>
>                   >> >>>> x[x$c<6,]  # Leaves rows with c=NA, but makes the whole row an NA. Why???
>                   >> >>>> x[(x$c<6) | is.na(x$c),] # output I need - I have to be super-explicit
>                   >> >>>> Thank you very much!
>                   >> >>>
>                   >> >>> Most of your examples (except the ones using which()) are doing logical
>                   >> >>> indexing.  In logical indexing, TRUE keeps a line, FALSE drops the line,
>                   >> >>> and NA returns NA.  Since "x$c < 6" is NA if x$c is NA, you get the
>                   >> >>> third kind of indexing.
>                   >> >>>
>                   >> >>> Your last example works because in the cases where x$c is NA, it
>                   >> >>> evaluates NA | TRUE, and that evaluates to TRUE.  In the cases where x$c
>                   >> >>> is not NA, you get x$c < 6 | FALSE, and that's the same as x$c < 6,
>                   >> >>> which will be either TRUE or FALSE.
>                   >> >>>
>                   >> >>> Duncan Murdoch
>                   >> >>>
>                   >> >>
>                   >> >>
>                   >> >>
>                   >> >
>                   >>
>                   >>
>                   >>
>                   >> --
>                   >> Dimitri Liakhovitski
>                   >>
>                   >> ________________________________________________
>                   >> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
>                   >> https://stat.ethz.ch/mailman/listinfo/r-help
>                   >> PLEASE do read the posting guide
>                   >> http://www.R-project.org/posting-guide.html
>                   >> and provide commented, minimal, self-contained, reproducible code.
>                   >>
>
>
>             ________________________________________________
>             R-devel at r-project.org mailing list
>             https://stat.ethz.ch/mailman/listinfo/r-devel
>
>
>
>
>
>
>
> --
> Gabriel Becker, PhD
> Computational Biologist
> Bioinformatics and Computational Biology
> Genentech, Inc.

-- 
Hervé Pagès

Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M1-B514
P.O. Box 19024
Seattle, WA 98109-1024

E-mail: hpages at fredhutch.org
Phone:  (206) 667-5791
Fax:    (206) 667-1319


