[R] [FORGED] Q re: logical indexing with is.na

PIKAL Petr petr@p|k@| @end|ng |rom prechez@@cz
Mon Mar 11 07:58:14 CET 2019


Hi

Do you want something like this?
> x <- c(1,2,NA, 3, 4, 5, NA, 6,7,8, NA, NA, 9,10)
> y <- c(1,2,NA, NA, 3, 4, 5, 6, NA, 7,8, NA, NA, 9,10)
> identical(x[which(!is.na(x))], y[which(!is.na(y))])
[1] TRUE
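
Note that the check above returns TRUE even though x and y differ in length and in where their NAs sit, because dropping the NAs discards absolute positions. If positions matter, something like the following might do. This is only a sketch: same_with_na and same_with_na_early are hypothetical helper names, not existing R functions, and the second one uses an explicit for loop so that it can stop at the first mismatch, as asked in the follow-up question quoted below.

```r
x <- c(1, 2, NA, 3, 4, 5, NA, 6, 7, 8, NA, NA, 9, 10)
y <- c(1, 2, NA, NA, 3, 4, 5, 6, NA, 7, 8, NA, NA, 9, 10)

# Hypothetical helper (vectorised): TRUE only if the lengths, the NA
# positions and the non-NA values all agree.
same_with_na <- function(a, b) {
  length(a) == length(b) &&
    identical(is.na(a), is.na(b)) &&
    all(a[!is.na(a)] == b[!is.na(b)])
}

# Hypothetical helper (explicit loop): returns FALSE at the first
# position where the two vectors disagree, without checking the rest.
same_with_na_early <- function(a, b) {
  if (length(a) != length(b)) return(FALSE)
  for (i in seq_along(a)) {
    na_a <- is.na(a[i])
    na_b <- is.na(b[i])
    if (na_a != na_b) return(FALSE)             # NA in one vector only
    if (!na_a && a[i] != b[i]) return(FALSE)    # both present but different
  }
  TRUE
}

same_with_na(x, y)        # FALSE: lengths differ
same_with_na(x, x)        # TRUE
same_with_na_early(x, y)  # FALSE
same_with_na_early(x, x)  # TRUE
```

Whether the loop is actually faster than the vectorised version depends on how early the first difference occurs; an R-level loop is slow per element, so for vectors that are mostly equal the vectorised form is likely to win.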

When I expect NAs and want to extract or compare values, I tend to use which() to select only the non-NA elements.
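
A small illustration of why which() can be the safer habit (the vector v below is made up for the example). With !is.na() the logical index never contains NA, so both forms give the same result; the difference shows up when the condition itself can evaluate to NA.

```r
v <- c(5, NA, 7, 2)

# The condition v > 4 is TRUE, NA, TRUE, FALSE.
v[v > 4]          # the NA in the logical index yields an NA in the result
v[which(v > 4)]   # which() keeps only the positions that are TRUE

# For !is.na() the index has no NA, so the two forms agree.
identical(v[!is.na(v)], v[which(!is.na(v))])
```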

Cheers
Petr

> -----Original Message-----
> From: R-help <r-help-bounces using r-project.org> On Behalf Of David Goldsmith
> Sent: Sunday, March 10, 2019 7:16 AM
> Cc: r-help using r-project.org
> Subject: Re: [R] [FORGED] Q re: logical indexing with is.na
>
> Thanks, all.  I had read about recycling, but I guess I didn't fully appreciate all
> the "weirdness" it might produce. :/
>
> With this explained, I'm going to ask a follow-up, which is only contextually
> related: the impetus for this discovery was checking "corner cases" to
> determine if all(x[!is.na(x)]==y[!is.na(y)]) would suffice to determine equality of
> two vectors containing NA's.  Between the above result; my related discovery
> that this indexing preserves relative positional info but not absolute positional
> info; and the performance penalty when comparing long vectors that may be
> unequal "early on";  I've concluded that--if it (can be made to) "short circuit"--it
> would probably be better to use an implicit loop.  So that's my Q: will (or can)
> an implicit loop (be made to) "exit early" if a specified condition is met before
> all indices have been checked?
>
> Thanks again!
>
> DLG
>
> On Sat, Mar 9, 2019 at 9:07 PM Jeff Newmiller <jdnewmil using dcn.davis.ca.us>
> wrote:
>
> > Regarding the mention of logical indexing, under ?Extract I see:
> >
> > For [-indexing only: i, j, ... can be logical vectors, indicating
> > elements/slices to select. Such vectors are recycled if necessary to
> > match the corresponding extent. i, j, ... can also be negative
> > integers, indicating elements/slices to leave out of the selection.
> >
> > On March 9, 2019 6:57:05 PM PST, Rolf Turner <r.turner using auckland.ac.nz>
> > wrote:
> > >On 3/10/19 2:36 PM, David Goldsmith wrote:
> > >> Hi!  Newbie (self-)learning R using P. Dalgaard's "Intro Stats w/ R"; not
> > >> new to statistics (have had grad-level courses and work experience in
> > >> statistics) or vectorized programming syntax (have extensive experience
> > >> with MatLab, Python/NumPy, and IDL, and even a smidgen--a long time ago--of
> > >> experience w/ S-plus).
> > >>
> > >> In exploring the use of is.na in the context of logical indexing, I've come
> > >> across the following puzzling-to-me result:
> > >>
> > >>> y; !is.na(y[1:3]); y[!is.na(y[1:3])]
> > >> [1]  0.3534253 -1.6731597         NA -0.2079209
> > >> [1]  TRUE  TRUE FALSE
> > >> [1]  0.3534253 -1.6731597 -0.2079209
> > >>
> > >> As you can see, y is a four element vector, the third element of which is
> > >> NA; the next line gives what I would expect--T T F--because the first two
> > >> elements are not NA but the third element is.  The third line is what
> > >> confuses me: why is the result not the two element vector consisting of
> > >> simply the first two elements of the vector (or, if vectorized indexing in
> > >> R is implemented to return a vector the same length as the logical index
> > >> vector, which appears to be the case, at least the first two elements and
> > >> then either NA or NaN in the third slot, where the logical indexing vector
> > >> is FALSE): why does the implementation "go looking" for an element whose
> > >> index in the "original" vector, 4, is larger than BOTH the largest index
> > >> specified in the inner-most subsetting index AND the size of the resulting
> > >> indexing vector?  (Note: at first I didn't even understand why the result
> > >> wasn't simply
> > >>
> > >> 0.3534253 -1.6731597         NA
> > >>
> > >> but then I realized that the third logical index being FALSE, there was no
> > >> reason for *any* element to be there; but if there is, due to some
> > >> overriding rule regarding the length of the result relative to the length
> > >> of the indexer, shouldn't it revert back to *something* that indicates the
> > >> "FALSE"ness of that indexing element?)
> > >>
> > >> Thanks!
> > >
> > >It happens because R is eco-conscious and re-cycles. :-)
> > >
> > >Try:
> > >
> > >ok <- c(TRUE,TRUE,FALSE)
> > >(1:4)[ok]
> > >
> > >In general in R if there is an operation involving two vectors then
> > >the shorter one gets recycled to provide sufficiently many entries to
> > >match those of the longer vector.
> > >
> > >Thus in the foregoing example the first entry of "ok" gets used
> > >again, to make a length 4 vector to match up with 1:4.  The result is
> > >the same as (1:4)[c(TRUE,TRUE,FALSE,TRUE)].
> > >
> > >If you did (1:7)[ok] you'd get the same result as that from
> > >(1:7)[c(TRUE,TRUE,FALSE,TRUE,TRUE,FALSE,TRUE)] i.e. "ok" gets
> > >recycled 2 and 1/3 times.
> > >
> > >Try 10*(1:3) + 1:4, 10*(1:3) + 1:5, 10*(1:3) + 1:6 .
> > >
> > >Note that in the first two instances you get warnings, but in the
> > >third you don't, since 6 is an integer multiple of 3.
> > >
> > >Why aren't there warnings when logical indexing is used?  I guess
> > >because it would be annoying.  Maybe.
> > >
> > >Note that integer indices get recycled too, but the recycling is
> > >limited so as not to produce redundancies.  So
> > >
> > >(1:4)[1:3] just (sensibly) gives
> > >
> > >[1] 1 2 3
> > >
> > >and *not*
> > >
> > >[1] 1 2 3 1
> > >
> > >Perhaps a bit subtle, but it gives what you'd actually *want* rather
> > >than being pedantic about rules with a result that you wouldn't want.
> > >
> > >cheers,
> > >
> > >Rolf Turner
> > >
> > >P.S.  If you do
> > >
> > >y[1:3][!is.na(y[1:3])]
> > >
> > >i.e. if you're careful to match the length of the vector and that
> > >of the indices, you get what you initially expected.
> > >
> > >R. T.
> > >
> > >P^2.S.  To the younger and wiser heads on this list:  the help on "["
> > >does not mention that the index vectors can be logical.  I couldn't
> > >find anything about logical indexing in the R help files.  Is
> > >something missing here, or am I just not looking in the right place?
> > >
> > >R. T.
> >
> > --
> > Sent from my phone. Please excuse my brevity.
> >
>