| findInterval {base} | R Documentation | 
Find Interval Numbers or Indices
Description
Given a vector of non-decreasing breakpoints in vec, find the
interval containing each element of x; i.e., if
i <- findInterval(x,v), for each index j in x
v_{i_j} \le x_j < v_{i_j + 1}
where v_0 := -\infty,
v_{N+1} := +\infty, and N <- length(v).
At the two boundaries, the returned index may differ by 1, depending
on the optional arguments rightmost.closed and all.inside.
Usage
findInterval(x, vec, rightmost.closed = FALSE, all.inside = FALSE,
             left.open = FALSE, checkSorted = TRUE, checkNA = TRUE)
Arguments
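| x | numeric. | 
| vec | numeric, sorted (weakly) increasingly, of length N, say. | 
| rightmost.closed | logical; if true, the rightmost interval,
vec[N-1] .. vec[N], is treated as closed, see below. | 
| all.inside | logical; if true, the returned indices are coerced
into 1,...,N-1, i.e., 0 is mapped to 1 and N to N-1. | 
| left.open | logical; if true all the intervals are open at left
and closed at right; in the formulas below, \le should be swapped with < (and > with \ge), and rightmost.closed means "leftmost is closed". | 
| checkSorted | logical indicating if vec should be checked for being sorted, i.e., is.unsorted(vec) is asserted to be false. Setting this to FALSE skips the check and gains speed, but may return nonsense results if vec is not sorted. | 
| checkNA | logical indicating if each x[i] should be checked as with is.na(.). Setting this to FALSE when x contains NAs may give platform-dependent nonsense. | 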
Details
The function findInterval finds the index of one vector x in
another, vec, where the latter must be non-decreasing.  Although
this is conceptually trivial, being equivalent to
apply( outer(x, vec, `>=`), 1, sum), the internal algorithm uses
interval search, ensuring O(n \log N) complexity where
n <- length(x) (and N <- length(vec)).  For (almost)
sorted x, it will be even faster, basically O(n).
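The equivalence with the outer()-based count can be checked directly. The following is a minimal sketch; the values of x and vec are illustrative assumptions, not taken from this page.

```r
# Naive O(n * N) reference: for each x[j], count the breakpoints that are <= x[j].
x   <- c(1.5, 3.2, 0, 7, 3)   # illustrative values (assumption)
vec <- c(1, 3, 5)             # must be non-decreasing

naive <- apply(outer(x, vec, `>=`), 1, sum)
fast  <- findInterval(x, vec)

print(rbind(naive, fast))
stopifnot(all(naive == fast))
```

For short vectors the two approaches are interchangeable; the naive version builds an n-by-N logical matrix, so it is only useful as a reference for checking.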
This is the same computation as for the empirical distribution
function, and indeed, findInterval(t, sort(X)) is
identical to n F_n(t; X_1,\dots,X_n) where F_n is the empirical distribution
function of X_1,\dots,X_n.
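As a small check of this relationship, the sketch below compares findInterval() on the sorted sample with n times the value of stats::ecdf(). The sample X and the evaluation points tt are made-up illustrative values.

```r
X  <- c(3, 1, 4, 1, 5, 9, 2, 6)   # illustrative sample (assumption)
tt <- c(0, 1, 2.5, 4, 10)         # points at which to evaluate
n  <- length(X)

Fn     <- stats::ecdf(X)             # empirical distribution function of X
counts <- findInterval(tt, sort(X))  # number of observations <= tt[j]

# all.equal() is used because n * Fn(tt) is double while counts is integer
stopifnot(isTRUE(all.equal(counts, n * Fn(tt))))
```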
When rightmost.closed = TRUE, the result for x[j] = vec[N]
( = \max vec) is N - 1, as for all other
values in the last interval.
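A brief sketch of the boundary behaviour, using the breakpoints from the Examples section; the variable names are chosen here for illustration only.

```r
v <- c(5, 10, 15)
N <- length(v)

default <- findInterval(15, v)                           # max(v) lies past the last interval
closed  <- findInterval(15, v, rightmost.closed = TRUE)  # max(v) joins the last interval
inner   <- findInterval(12, v, rightmost.closed = TRUE)  # ordinary point in [10, 15]
beyond  <- findInterval(16, v, rightmost.closed = TRUE)  # values above max(v) are unaffected

stopifnot(default == N, closed == N - 1, inner == N - 1, beyond == N)
```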
left.open = TRUE is occasionally useful, e.g., for survival data.
For (anti-)symmetry reasons, it is equivalent to using
“mirrored” data, i.e., the following is always true:
    identical(
          findInterval( x,  v,      left.open= TRUE, ...) ,
      N - findInterval(-x, -v[N:1], left.open=FALSE, ...) )
  
where N <- length(vec) as above.
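To make the effect of left.open concrete, the sketch below evaluates points that sit exactly on the breakpoints; the x values are illustrative assumptions.

```r
v <- c(5, 10, 15)
x <- c(5, 7, 10, 15, 16)   # includes points exactly on the breakpoints

right_open <- findInterval(x, v)                    # intervals [5,10), [10,15), ...
left_open  <- findInterval(x, v, left.open = TRUE)  # intervals (5,10], (10,15], ...

print(cbind(x, right_open, left_open))
stopifnot(identical(right_open, c(1L, 1L, 2L, 3L, 3L)),
          identical(left_open,  c(0L, 1L, 1L, 2L, 3L)))
```

Only the points equal to a breakpoint change their index; interior points such as 7 and 16 are assigned the same interval either way.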
Value
vector of length length(x) with values in 0:N (and
NA) where N <- length(vec), or values coerced to
1:(N-1) if and only if all.inside = TRUE (equivalently coercing all
x values inside the intervals).  Note that NAs are
propagated from x, and Inf values are allowed in
both x and vec.
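The range of returned values, and the handling of NA and infinite inputs, can be illustrated as follows. This is a sketch with made-up inputs, not part of the original examples.

```r
v <- c(5, 10, 15)
N <- length(v)
x <- c(NA, -Inf, 2, 7, 20, Inf)   # illustrative inputs (assumption)

plain  <- findInterval(x, v)                     # values in 0:N, NA propagated
inside <- findInterval(x, v, all.inside = TRUE)  # values forced into 1:(N-1)

print(cbind(x, plain, inside))
stopifnot(length(plain) == length(x),
          is.na(plain[1]), is.na(inside[1]),
          all(plain[-1]  %in% 0:N),
          all(inside[-1] %in% 1:(N - 1)))
```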
Author(s)
Martin Maechler
See Also
approx(*, method = "constant"), which is a
generalization of findInterval(); ecdf for
computing the empirical distribution function, which is (up to a factor
of n) also basically the same as findInterval(.).
Examples
x <- 2:18
v <- c(5, 10, 15) # create two bins [5,10) and [10,15)
cbind(x, findInterval(x, v))
N <- 100
X <- sort(round(stats::rt(N, df = 2), 2))
tt <- c(-100, seq(-2, 2, length.out = 201), +100)
it <- findInterval(tt, X)
tt[it < 1 | it >= N] # only first and last are outside range(X)
stopifnot(identical(it, ## suppressing the checks is faster *BUT* dangerous, unless
                    ##     you *know* that X is sorted   and   tt contains no NA's
                    findInterval(tt, X, checkSorted=FALSE, checkNA=FALSE)))
##  'left.open = TRUE' means  "mirroring" :
N <- length(v)
stopifnot(identical(
                  findInterval( x,  v,  left.open=TRUE) ,
              N - findInterval(-x, -v[N:1])))