[R] [FORGED] Q re: logical indexing with is.na

David Goldsmith
Sun Mar 10 07:15:54 CET 2019


Thanks, all.  I had read about recycling, but I guess I didn't fully
appreciate all the "weirdness" it might produce. :/

With this explained, I'm going to ask a follow-up, which is only
contextually related: the impetus for this discovery was checking "corner
cases" to determine if all(x[!is.na(x)]==y[!is.na(y)]) would suffice to
determine equality of two vectors containing NAs.  Given the above result,
my related discovery that this indexing preserves relative positional info
but not absolute positional info, and the performance penalty when
comparing long vectors that may be unequal "early on", I've concluded that
it would probably be better to use an implicit loop, provided it "short
circuits" (or can be made to).  So that's my Q: will an implicit loop
"exit early" (or can it be made to) if a specified condition is met before
all indices have been checked?
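
To make the question concrete, here is a minimal sketch of the intended
"exit early" behaviour, written as an explicit for loop rather than an
implicit one (whether an implicit loop can do the same is exactly the open
question).  The function name equal_with_na is made up for illustration,
and treating an NA as equal only to an NA at the same index is an
assumption about what "equality" should mean here.

```r
# Hypothetical helper (not part of base R): compare two vectors position by
# position, treating NA as equal only to an NA at the same index, and stop
# at the first mismatch.
equal_with_na <- function(x, y) {
  if (length(x) != length(y)) return(FALSE)
  for (i in seq_along(x)) {
    x_na <- is.na(x[i])
    y_na <- is.na(y[i])
    # Mismatch if exactly one side is NA, or both are present and differ.
    if (x_na != y_na || (!x_na && x[i] != y[i])) {
      return(FALSE)  # early exit: later indices are never examined
    }
  }
  TRUE
}

x <- c(1, NA, 3)
y <- c(1, 3, NA)

# Dropping the NAs keeps only relative order, so this comparison reports
# TRUE even though the NAs sit at different positions.
all(x[!is.na(x)] == y[!is.na(y)])

equal_with_na(x, y)  # FALSE: the mismatch is found at index 2
equal_with_na(x, x)  # TRUE
```

The corner case above is the "relative but not absolute positional info"
problem: both filtered vectors are c(1, 3), so the NA-dropping test cannot
tell x and y apart.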

Thanks again!

DLG

On Sat, Mar 9, 2019 at 9:07 PM Jeff Newmiller <jdnewmil using dcn.davis.ca.us>
wrote:

> Regarding the mention of logical indexing, under ?Extract I see:
>
> For [-indexing only: i, j, ... can be logical vectors, indicating
> elements/slices to select. Such vectors are recycled if necessary to match
> the corresponding extent. i, j, ... can also be negative integers,
> indicating elements/slices to leave out of the selection.
>
> On March 9, 2019 6:57:05 PM PST, Rolf Turner <r.turner using auckland.ac.nz>
> wrote:
> >On 3/10/19 2:36 PM, David Goldsmith wrote:
> >> Hi!  Newbie (self-)learning R using P. Dalgaard's "Intro Stats w/ R"; not
> >> new to statistics (have had grad-level courses and work experience in
> >> statistics) or vectorized programming syntax (have extensive experience
> >> with MatLab, Python/NumPy, and IDL, and even a smidgen--a long time ago--of
> >> experience w/ S-plus).
> >>
> >> In exploring the use of is.na in the context of logical indexing, I've come
> >> across the following puzzling-to-me result:
> >>
> >>> y; !is.na(y[1:3]); y[!is.na(y[1:3])]
> >> [1]  0.3534253 -1.6731597         NA -0.2079209
> >> [1]  TRUE  TRUE FALSE
> >> [1]  0.3534253 -1.6731597 -0.2079209
> >>
> >> As you can see, y is a four element vector, the third element of which is
> >> NA; the next line gives what I would expect--T T F--because the first two
> >> elements are not NA but the third element is.  The third line is what
> >> confuses me: why is the result not the two element vector consisting of
> >> simply the first two elements of the vector (or, if vectorized indexing in
> >> R is implemented to return a vector the same length as the logical index
> >> vector, which appears to be the case, at least the first two elements and
> >> then either NA or NaN in the third slot, where the logical indexing vector
> >> is FALSE): why does the implementation "go looking" for an element whose
> >> index in the "original" vector, 4, is larger than BOTH the largest index
> >> specified in the inner-most subsetting index AND the size of the resulting
> >> indexing vector?  (Note: at first I didn't even understand why the result
> >> wasn't simply
> >>
> >> 0.3534253 -1.6731597         NA
> >>
> >> but then I realized that the third logical index being FALSE, there was no
> >> reason for *any* element to be there; but if there is, due to some
> >> overriding rule regarding the length of the result relative to the length
> >> of the indexer, shouldn't it revert back to *something* that indicates the
> >> "FALSE"ness of that indexing element?)
> >>
> >> Thanks!
> >
> >It happens because R is eco-conscious and re-cycles. :-)
> >
> >Try:
> >
> >ok <- c(TRUE,TRUE,FALSE)
> >(1:4)[ok]
> >
> >In general in R if there is an operation involving two vectors then
> >the shorter one gets recycled to provide sufficiently many entries to
> >match those of the longer vector.
> >
> >Thus in the foregoing example the first entry of "ok" gets used again,
> >to make a length 4 vector to match up with 1:4.  The result is the same
> >as (1:4)[c(TRUE,TRUE,FALSE,TRUE)].
> >
> >If you did (1:7)[ok] you'd get the same result as that from
> >(1:7)[c(TRUE,TRUE,FALSE,TRUE,TRUE,FALSE,TRUE)] i.e. "ok" gets
> >recycled 2 and 1/3 times.
> >
> >Try 10*(1:3) + 1:4, 10*(1:3) + 1:5, 10*(1:3) + 1:6 .
> >
> >Note that in the first two instances you get warnings, but in the third
> >you don't, since 6 is an integer multiple of 3.
> >
> >Why aren't there warnings when logical indexing is used?  I guess
> >because it would be annoying.  Maybe.
> >
> >Note that integer indices get recycled too, but the recycling is limited
> >so as not to produce redundancies.  So
> >
> >(1:4)[1:3] just (sensibly) gives
> >
> >[1] 1 2 3
> >
> >and *not*
> >
> >[1] 1 2 3 1
> >
> >Perhaps a bit subtle, but it gives what you'd actually *want* rather
> >than being pedantic about rules with a result that you wouldn't want.
> >
> >cheers,
> >
> >Rolf Turner
> >
> >P.S.  If you do
> >
> >y[1:3][!is.na(y[1:3])]
> >
> >i.e. if you're careful to match the length of the vector and that of
> >the indices, you get what you initially expected.
> >
> >R. T.
> >
> >P^2.S.  To the younger and wiser heads on this list:  the help on "["
> >does not mention that the index vectors can be logical.  I couldn't find
> >anything about logical indexing in the R help files.  Is something
> >missing here, or am I just not looking in the right place?
> >
> >R. T.
>
> --
> Sent from my phone. Please excuse my brevity.
>
