Type: | Package |
Title: | Smoothed Empirical Likelihood |
Version: | 0.0.14 |
Maintainer: | Andreï Victorovitch Kostyrka <andrei.kostyrka@gmail.com> |
Description: | Empirical likelihood methods for asymptotically efficient estimation of models based on conditional or unconditional moment restrictions; see Kitamura, Tripathi & Ahn (2004) <doi:10.1111/j.1468-0262.2004.00550.x> and Owen (2013) <doi:10.1002/cjs.11183>. Kernel-based non-parametric methods for density/regression estimation and numerical routines for empirical likelihood maximisation are implemented in 'Rcpp' for speed. |
License: | EUPL version 1.1 | EUPL version 1.2 [expanded from: EUPL] |
Encoding: | UTF-8 |
URL: | https://github.com/Fifis/smoothemplik |
BugReports: | https://github.com/Fifis/smoothemplik/issues |
Depends: | R (≥ 3.0.0) |
Imports: | parallel, Rcpp, RcppParallel, Rdpack, Matrix, data.table |
Suggests: | knitr, rmarkdown, testthat (≥ 3.0.0), xml2 |
RdMacros: | Rdpack |
RoxygenNote: | 7.3.2 |
NeedsCompilation: | yes |
LinkingTo: | Rcpp, RcppArmadillo, RcppParallel, testthat |
SystemRequirements: | GNU make |
VignetteBuilder: | knitr |
Config/testthat/edition: | 3 |
Packaged: | 2025-07-19 16:59:38 UTC; avk |
Author: | Andreï Victorovitch Kostyrka [aut, cre] |
Repository: | CRAN |
Date/Publication: | 2025-07-22 10:20:02 UTC |
Density cross-validation
Description
Density cross-validation
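For orientation, the usual least-squares cross-validation criterion for density estimation, which this function appears to implement up to observation weights and de-duplication (governed by the arguments below), is
CV(b) = \int \hat f_b(t)^2 \, dt - \frac{2}{n} \sum_{i=1}^{n} \hat f_{b,-i}(x_i),
where \hat f_{b,-i} is the leave-one-out density estimate; the first term is typically computed with the convolution kernel.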
Usage
DCV(
x,
bw,
weights = NULL,
same = FALSE,
kernel = "gaussian",
order = 2,
PIT = FALSE,
chunks = 0,
no.dedup = FALSE
)
Arguments
x |
A numeric vector, matrix, or data frame containing observations. For density, the points used to compute the density. For kernel regression, the points corresponding to explanatory variables. |
bw |
Candidate bandwidth values: scalar, vector, or a matrix (with columns corresponding to columns of |
weights |
A numeric vector of observation weights (typically counts) to
perform weighted operations. If null, |
same |
Logical: use the same bandwidth for all columns of x. Note: since DCV requires computing the leave-one-out estimator, repeated observations are combined first; the de-duplication is therefore forced in cross-validation. The only situation where de-duplication can be skipped is passing de-duplicated data sets from outside (e.g. inside optimisers). |
kernel |
Character describing the desired kernel type. NB: due to limited machine precision, even Gaussian has finite support. |
order |
An integer: 2, 4, or 6. Order-2 kernels are the standard kernels that are positive everywhere. Orders 4 and 6 produce some negative values, which reduces bias but may hamper density estimation. |
PIT |
If TRUE, the Probability Integral Transform (PIT) is applied to all columns
of |
chunks |
Integer: the number of chunks to split the task into (limits
RAM usage but increases overhead). |
no.dedup |
Logical: if TRUE, sets |
Value
A numeric vector of the same length as bw
or nrow(bw)
.
Examples
set.seed(1)
x <- rlnorm(100)
bws <- exp(seq(-2, 1.5, 0.1))
plot(bws, DCV(x, bws), log = "x", bty = "n", main = "Density CV")
Least-squares cross-validation function for the Nadaraya-Watson estimator
Description
Least-squares cross-validation function for the Nadaraya-Watson estimator
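For orientation, the criterion is the standard weighted leave-one-out least-squares objective for kernel regression (the exact weighting, chunking, and de-duplication details are governed by the arguments below):
CV(b) = \sum_{i=1}^{n} w_i (y_i - \hat m_{b,-i}(x_i))^2,
where \hat m_{b,-i} is the kernel regression estimate of E(Y | X = x_i) computed without observation i.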
Usage
LSCV(
x,
y,
bw,
weights = NULL,
same = FALSE,
degree = 0,
kernel = "gaussian",
order = 2,
PIT = FALSE,
chunks = 0,
robust.iterations = 0,
cores = 1
)
Arguments
x |
A numeric vector, matrix, or data frame containing observations. For density, the points used to compute the density. For kernel regression, the points corresponding to explanatory variables. |
y |
A numeric vector of dependent variable values. |
bw |
Candidate bandwidth values: scalar, vector, or a matrix (with columns corresponding to columns of |
weights |
A numeric vector of observation weights (typically counts) to
perform weighted operations. If null, |
same |
Logical: use the same bandwidth for all columns of |
degree |
Integer: 0 for locally constant estimator (Nadaraya–Watson), 1 for locally linear (Cleveland's LOESS), 2 for locally quadratic (use with care, less stable, requires larger bandwidths) |
kernel |
Character describing the desired kernel type. NB: due to limited machine precision, even Gaussian has finite support. |
order |
An integer: 2, 4, or 6. Order-2 kernels are the standard kernels that are positive everywhere. Orders 4 and 6 produce some negative values, which reduces bias but may hamper density estimation. |
PIT |
If TRUE, the Probability Integral Transform (PIT) is applied to all columns
of |
chunks |
Integer: the number of chunks to split the task into (limits
RAM usage but increases overhead). |
robust.iterations |
The number of robustifying iterations (due to Cleveland, 1979). If greater than 0, |
cores |
Integer: the number of CPU cores to use. High core count = high RAM usage. Note: since LSCV requires zeroing out the diagonals of the weight matrix, repeated observations are combined first; the de-duplication is therefore forced in cross-validation. The only situation where de-duplication can be skipped is passing de-duplicated data sets from outside (e.g. inside optimisers). |
Value
A numeric vector of the same length as bw
or nrow(bw)
.
Examples
set.seed(1) # Creating a data set with many duplicates
n.uniq <- 1000
n <- 4000
inds <- sort(ceiling(runif(n, 0, n.uniq)))
x.uniq <- sort(rnorm(n.uniq))
y.uniq <- 1 + 0.2*x.uniq + 0.3*sin(x.uniq) + rnorm(n.uniq)
x <- x.uniq[inds]
y <- y.uniq[inds]
w <- 1 + runif(n, 0, 2) # Relative importance
data.table::setDTthreads(1) # For measuring pure gains and overhead
RcppParallel::setThreadOptions(numThreads = 1)
bw.grid <- seq(0.1, 1.2, 0.05)
ncores <- if (.Platform$OS.type == "windows") 1 else 2
CV <- LSCV(x, y, bw.grid, weights = w, cores = ncores) # Parallel capabilities
bw.opt <- bw.grid[which.min(CV)]
g <- seq(-3.5, 3.5, 0.05)
yhat <- kernelSmooth(x, y, xout = g, weights = w,
bw = bw.opt, deduplicate.xout = FALSE)
oldpar <- par(mfrow = c(2, 1), mar = c(2, 2, 2, 0)+.1)
plot(bw.grid, CV, bty = "n", xlab = "", ylab = "", main = "Cross-validation")
plot(x.uniq, y.uniq, bty = "n", xlab = "", ylab = "", main = "Optimal fit")
points(g, yhat, pch = 16, col = 2, cex = 0.5)
par(oldpar)
Brent's local minimisation
Description
Brent's local minimisation
Usage
brentMin(
f,
interval,
lower = NA_real_,
upper = NA_real_,
tol = 1e-08,
maxiter = 200L,
trace = 0L
)
Arguments
f |
A function to be minimised on an interval. |
interval |
A length-2 vector containing the end-points of the search interval. |
lower |
Scalar: the lower end point of the search interval. Not necessary if |
upper |
Scalar: the upper end point of the search interval. Not necessary if |
tol |
Small positive scalar: stopping criterion. The search stops when the
distance between the current candidate and the midpoint of the bracket is smaller than
the dynamic threshold |
maxiter |
Positive integer: the maximum number of iterations. |
trace |
Integer: 0, 1, or 2. Amount of tracing information on the optimisation progress
printed. |
Details
This is an adaptation of the implementation by John Burkardt (currently available at [https://people.math.sc.edu/Burkardt/m_src/brent/brent.html](https://people.math.sc.edu/Burkardt/m_src/brent/brent.html)).
This function is similar to local_min
or R_zeroin2
-style logic, but with the
following additions: the number of iterations is tracked, and the algorithm stops when the
standard Brent criterion is met or if the maximum iteration count is reached.
The code stores the approximate final bracket width in estim.prec
, like in [uniroot()].
If the minimiser is pinned to an end point, estim.prec = NA
.
There are no preliminary iterations, unlike [brentZero()].
TODO: add preliminary iterations.
Value
A list with the following elements:
- root
Location of the minimum.
- f.root
Function value at the minimum location.
- iter
Total iteration count used.
- estim.prec
Estimate of the final bracket size.
Examples
f <- function (x) (x - 1/3)^2
brentMin(f, c(0, 1), tol = 0.0001)
brentMin(function(x) x^2*(x-1), lower = 0, upper = 10, trace = 1)
Brent's local root search
Description
Brent's local root search
Usage
brentZero(
f,
interval,
lower = NA_real_,
upper = NA_real_,
f_lower = NULL,
f_upper = NULL,
extendInt = "no",
tol = 1e-08,
maxiter = 500L,
trace = 0L
)
Arguments
f |
The function for which the root is sought. |
interval |
A length-2 vector containing the end-points of the search interval |
lower |
Scalar: the lower end point of the search interval. Not necessary if |
upper |
Scalar: the upper end point of the search interval. Not necessary if |
f_lower |
Scalar: same as f(lower). Passing this value saves time if f(lower) is slow to compute and is known. |
f_upper |
Scalar: same as f(upper). |
extendInt |
Character:
This behaviour mirrors that of [uniroot()]. |
tol |
Small positive scalar: convergence tolerance. The search stops when the bracket size is smaller than
|
maxiter |
Positive integer: the maximum number of iterations before stopping. |
trace |
Integer: 0, 1, or 2. Controls the verbosity of the output.
|
Value
A list with the following elements:
- root
Location of the root.
- f.root
Function value at the root.
- iter
Total iteration count used.
- init.it
Number of initial
extendInt
iterations if there were any; NA otherwise.
- estim.prec
Estimate of the final bracket size.
Examples
f <- function (x, a) x - a
str(uniroot(f, c(0, 1), tol = 0.0001, a = 1/3))
uniroot(function(x) cos(x) - x, lower = -pi, upper = pi, tol = 1e-9)$root
# This function is faster than the base R uniroot, and this is the primary
# reason why it was written in C++
system.time(replicate(1000, { shift <- runif(1, 0, 2*pi)
uniroot(function(x) cos(x+shift) - x, lower = -pi, upper = pi)
}))
system.time(replicate(1000, { shift <- runif(1, 0, 2*pi)
brentZero(function(x) cos(x+shift) - x, lower = -pi, upper = pi)
}))
# Roughly twice as fast
Bandwidth Selectors for Kernel Density Estimation
Description
Finds the optimal bandwidth by minimising the density cross-validation (DCV) or least-squares cross-validation (LSCV) criterion.
Since the CV function is usually highly non-linear, the return value should be taken with a grain of salt.
With non-smooth kernels (such as uniform), the optimiser will often return a local minimum even when started from a reasonable value.
The user might want to standardise the input matrix x
by column (divide by some estimator of scale, like sd
or IQR
) and examine the behaviour of the CV criterion as a function of a single common bandwidth (same
argument).
If the optimum appears to be unique, they may then proceed by multiplying the bandwidth by the scale measure
and starting the search for the optimal bandwidth in multiple dimensions.
Usage
bw.CV(
x,
y = NULL,
weights = NULL,
kernel = "gaussian",
order = 2,
PIT = FALSE,
chunks = 0,
robust.iterations = 0,
degree = 0,
start.bw = NULL,
same = FALSE,
tol = 1e-04,
try.grid = TRUE,
ndeps = 1e-05,
verbose = FALSE,
attach.attributes = FALSE,
control = list(),
...
)
Arguments
x |
A numeric vector, matrix, or data frame containing observations. For density, the points used to compute the density. For kernel regression, the points corresponding to explanatory variables. |
y |
A numeric vector of responses (dependent variable) if the user wants least-squares cross-validation. |
weights |
A numeric vector of observation weights (typically counts) to
perform weighted operations. If null, |
kernel |
Character describing the desired kernel type. NB: due to limited machine precision, even Gaussian has finite support. |
order |
An integer: 2, 4, or 6. Order-2 kernels are the standard kernels that are positive everywhere. Orders 4 and 6 produce some negative values, which reduces bias but may hamper density estimation. |
PIT |
If TRUE, the Probability Integral Transform (PIT) is applied to all columns
of |
chunks |
Integer: the number of chunks to split the task into (limits
RAM usage but increases overhead). |
robust.iterations |
Passed to |
degree |
Passed to |
start.bw |
Numeric vector: initial value for bandwidth search. |
same |
Logical: use the same bandwidth for all columns of |
tol |
Relative tolerance used by the optimiser as the stopping criterion. |
try.grid |
Logical: if true, 10 different bandwidths around the rule-of-thumb
one are tried with multiplier |
ndeps |
Numerical-difference epsilon. Puts a lower bound on the result: the estimated optimal bw cannot be less than this value. |
verbose |
Logical: print out the optimiser return code for diagnostics? |
attach.attributes |
Logical: if TRUE, returns the output of 'optim()' for diagnostics. |
control |
List: extra arguments to pass to the control-argument list of 'optim'. |
... |
Other parameters passed to the optimiser (e.g. 'lower' for '"L-BFGS-B"'). |
Details
If y
is NULL and only x
is supplied, returns the density-cross-validated bandwidth (DCV).
If y
is supplied, returns the least-squares-cross-validated bandwidth (LSCV).
Value
Numeric vector or scalar of the optimal bandwidth.
Examples
set.seed(1) # Creating a data set with many duplicates
n.uniq <- 200
n <- 500
inds <- sort(ceiling(runif(n, 0, n.uniq)))
x.uniq <- sort(rnorm(n.uniq))
y.uniq <- 1 + 0.1*x.uniq + sin(x.uniq) + rnorm(n.uniq)
x <- x.uniq[inds]
y <- y.uniq[inds]
w <- 1 + runif(n, 0, 2) # Relative importance
data.table::setDTthreads(1) # For measuring the pure gains and overhead
RcppParallel::setThreadOptions(numThreads = 1)
bw.grid <- seq(0.1, 1.3, 0.2)
CV <- LSCV(x, y, bw.grid, weights = w)
bw.init <- bw.grid[which.min(CV)]
bw.opt <- bw.CV(x, y, w) # 0.49, very close
g <- seq(-3.5, 3.5, 0.05)
yhat <- kernelSmooth(x, y, g, w, bw.opt, deduplicate.xout = FALSE)
oldpar <- par(mfrow = c(2, 1), mar = c(2, 2, 2, 0)+.1)
plot(bw.grid, CV, bty = "n", xlab = "", ylab = "", main = "Cross-validation")
points(bw.opt, LSCV(x, y, bw.opt, w), col = 2, pch = 15)
plot(x.uniq, y.uniq, bty = "n", xlab = "", ylab = "", main = "Optimal fit")
points(g, yhat, pch = 16, col = 2, cex = 0.5)
par(oldpar)
Silverman's rule-of-thumb bandwidth
Description
A fail-safe function that returns a reasonable Silverman-like bandwidth suggestion even for data whose standard deviation might be NA or 0.
Usage
bw.rot(
x,
kernel = c("gaussian", "uniform", "triangular", "epanechnikov", "quartic"),
na.rm = FALSE,
robust = TRUE,
discontinuous = FALSE
)
Arguments
x |
A numeric vector without non-finite values. |
kernel |
A string character: |
na.rm |
Logical: should missing values be removed? Setting it to TRUE may cause issues because variable-wise removal of NAs may return a bandwidth that is inappropriate for the final data set for which it is suggested. |
robust |
Logical: safeguard against extreme observations? If TRUE, uses
|
discontinuous |
Logical: if the true density is discontinuous (i.e. has jumps), then, the formula for the optimal bandwidth for density estimation changes. |
Details
The rule-of-thumb bandwidth is obtained under the assumption that the true density is multivariate normal with zero covariances, i.e. a diagonal variance-covariance matrix
\Sigma = \mathrm{\mathop{diag}}(\sigma^2_k)
with
\det\Sigma = \prod_k \sigma^2_k
and
\Sigma^{-1} = \mathrm{\mathop{diag}}(1/\sigma^{2}_k)
.
Then, formula 4.12 in Silverman (1986) depends only on the constants \alpha
and \beta
(which depend only on the kernel and are fixed for a multivariate normal) and on the L2-norm of the
second derivative of the density. The (i, i)th element of the Hessian of the multivariate normal density
(\phi(x_1, \ldots, x_d) = \phi(X)
) is
\phi(X)(x_i^2 - \sigma^2_i)/\sigma_i^4
.
For details, see (Silverman 1986).
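For a univariate sample and a Gaussian kernel, the classic reference rule has a familiar closed form; the snippet below is an illustration only (bw.rot may use a different constant, a robust scale estimate, or a kernel-specific adjustment) and merely shows the order of magnitude to expect.
set.seed(1)
x <- stats::rnorm(100)
n <- length(x)
h.naive <- 1.06 * stats::sd(x) * n^(-1/5) # Silverman's Gaussian reference rule
h.robust <- 0.9 * min(stats::sd(x), stats::IQR(x)/1.34) * n^(-1/5) # Robust variant
c(naive = h.naive, robust = h.robust, bw.rot = bw.rot(x))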
Value
A numeric vector of bandwidths that provide a reasonable starting point for optimal non-parametric density estimation of x
.
References
Silverman BW (1986). Density estimation for statistics and data analysis. New York: Chapman and Hall.
Examples
set.seed(1); bw.rot(stats::rnorm(100)) # Should be 0.3787568 in R version 4.0.4
set.seed(1); bw.rot(matrix(stats::rnorm(500), ncol = 10)) # 0.4737872 ... 0.7089850
Compute empirical likelihood on a trajectory
Description
Compute empirical likelihood on a trajectory
Usage
ctracelr(
z,
ct = NULL,
mu0,
mu1,
N = 5,
verbose = FALSE,
verbose.solver = FALSE,
...
)
Arguments
z |
Passed to |
ct |
Passed to |
mu0 |
Starting point of trajectory |
mu1 |
End point of trajectory |
N |
Number of segments into which the path is split (i.e. |
verbose |
Logical: report iteration data? |
verbose.solver |
Logical: report internal iteration data from the optimiser? Very verbose. |
... |
Passed to This function does not accept the starting lambda because it is much faster (3–5 times) to reuse the lambda from the previous iteration. |
Value
A matrix with one row for each mean value between mu0 and mu1 and a column for each EL return value (except EL weights).
Examples
# Plot 2.5 from Owen (2001)
earth <- c(
5.5, 5.61, 4.88, 5.07, 5.26, 5.55, 5.36, 5.29, 5.58, 5.65, 5.57, 5.53, 5.62, 5.29,
5.44, 5.34, 5.79, 5.1, 5.27, 5.39, 5.42, 5.47, 5.63, 5.34, 5.46, 5.3, 5.75, 5.68, 5.85
)
weightedEL(earth, mu = 5.1, verbose = TRUE)
logELR <- ctracelr(earth, mu0 = 5.1, mu1 = 5.65, N = 55, verbose = TRUE)
hist(earth, breaks = seq(4.75, 6, 1/8))
plot(logELR[, 1], exp(logELR[, 2]), bty = "n", type = "l",
xlab = "Earth density", ylab = "ELR")
# TODO: why is there non-convergence in row 0?
# Two-dimensional trajectory
set.seed(1)
xy <- matrix(rexp(200), ncol = 2)
logELR2 <- ctracelr(xy, mu0 = c(0.5, 0.5), mu1 = c(1.5, 1.5), N = 100)
Damped Newton optimiser
Description
Damped Newton optimiser
Usage
dampedNewton(
fn,
par,
thresh = 1e-30,
itermax = 100,
verbose = FALSE,
alpha = 0.3,
beta = 0.8,
backeps = 0
)
Arguments
fn |
A function that returns a list: f, f', f''.
If the function takes vector arguments, the dimensions of the list components must
be 1, |
par |
Numeric vector: starting point. |
thresh |
A small scalar: stop when the Newton decrement squared falls below |
itermax |
Maximum iterations. Consider optimisation failed if the maximum is reached. |
verbose |
Logical: if true, prints the tracing information (iteration log). This is a translation of Algorithm 9.5 from (Boyd and Vandenberghe 2004) into C++; a rough sketch of the backtracking logic is given after this argument list. |
alpha |
Back-tracking parameter strictly between 0 and 0.5: acceptance of a decrease in function value by alpha*f of the prediction. |
beta |
Back-tracking parameter strictly between 0 and 1: reduction of the step size until the stopping criterion is met. 0.1 corresponds to a very crude search, 0.8 corresponds to a less crude search. |
backeps |
Back-tracking threshold: the search can miss by this much. Consider setting it to 1e-10 if backtracking seems to be failing due to round-off. |
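The following is a rough R sketch of the damped Newton iteration with backtracking line search (Algorithm 9.5 in Boyd and Vandenberghe 2004); it mirrors the role of alpha, beta, thresh, and itermax, but it is an illustration, not the package's C++ implementation.
damped_newton_sketch <- function(fn, par, alpha = 0.3, beta = 0.8,
                                 thresh = 1e-30, itermax = 100) {
  for (iter in seq_len(itermax)) {
    o <- fn(par)
    step <- -drop(solve(o$Hessian, o$gradient))  # Newton direction
    decr2 <- sum(-o$gradient * step)             # Squared Newton decrement
    if (decr2 / 2 < thresh) break                # Stopping criterion
    t <- 1                                       # Back-tracking line search
    while (fn(par + t * step)$fn > o$fn - alpha * t * decr2) t <- beta * t
    par <- par + t * step
  }
  list(par = par, value = fn(par)$fn, iter = iter)
}
f1 <- function(x) list(fn = x - log(x), gradient = 1 - 1/x, Hessian = matrix(1/x^2, 1, 1))
damped_newton_sketch(f1, 2)$par  # Should be close to 1, the minimum of x - log(x)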
Value
A list:
References
Boyd S, Vandenberghe L (2004). Convex Optimization. Cambridge University Press.
Examples
f1 <- function(x)
list(fn = x - log(x), gradient = 1 - 1/x, Hessian = matrix(1/x^2, 1, 1))
optim(2, function(x) f1(x)[["fn"]], gr = function(x) f1(x)[["gradient"]], method = "BFGS")
dampedNewton(f1, 2, verbose = TRUE)
# The minimum of f3 should be roughly at -0.57
f3 <- function(x)
list(fn = sum(exp(x) + 0.5 * x^2), gradient = exp(x) + x, Hessian = diag(exp(x) + 1))
dampedNewton(f3, seq(0.1, 5, length.out = 11), verbose = TRUE)
Construct memory-efficient weights for estimation
Description
This function constructs SEL weights with appropriate trimming for numerical stability and optional renormalisation so that the weights sum to unity.
Usage
getSELWeights(x, bw = NULL, ..., trim = NULL, renormalise = TRUE)
Arguments
x |
A numeric vector (with many close-to-zero elements). |
bw |
A numeric scalar or a vector passed to 'kernelWeights'. |
... |
Other arguments passed to |
trim |
A trimming function that returns a threshold value below which the weights are ignored. In common applications, this function should tend to 0 as the length of |
renormalise |
Logical; passed to 'sparseVectorToList'. |
Value
A list with indices of large enough elements.
Examples
getSELWeights(1:5, bw = 2, kernel = "triangular")
Monotone interpolation between a function and a reference parabola
Description
Create *piece-wise monotone* splines that smoothly join an arbitrary function 'f' to the
quadratic reference curve (x-\mathrm{mean})^{2}/\mathrm{var}
at a user-chosen abscissa 'at'. The join occurs over a finite interval
of length 'gap', guaranteeing a C1-continuous transition (function and first
derivative are continuous) without violating monotonicity.
Usage
interpTwo(x, f, mean, var, at, gap)
Arguments
x |
A numeric vector of evaluation points. |
f |
Function: the original curve to be spliced into the parabola. It must be vectorised (i.e.\ accept a numeric vector and return a numeric vector of the same length). |
mean |
Numeric scalar defining the shift of the reference parabola. |
var |
Numeric scalar defining the vertical scaling of the reference parabola. |
at |
Numeric scalar: beginning of the transition zone, i.e.\ the boundary where 'f' stops being evaluated and merging into the parabola begins. |
gap |
Positive numeric scalar. Width of the transition window; the spline is constructed on '[at, at+gap]' (or '[at-gap, at]' when 'at < mean') when the reference parabola is higher. If the reference parabola is lower, it is the distance from the point 'z' at which 'f(z) = parabola(z)' to allow some growth and ensure monotonicity. |
Details
This function calls 'interpToHigher()' when the reference parabola is *above* 'f(at)', in which case the spline climbs from 'f' up to the parabola, and 'interpToLower()' when the parabola is *below* 'f(at)', in which case the transition interval is extended to ensure that the spline does not descend.
Internally, the helpers build a **monotone Hermite cubic spline** via Fritsch-Carlson tangents. Anchor points on each side of the transition window are chosen so that one edge of the spline matches 'f' while the other edge matches the reference parabola, ensuring strict monotonicity between the two curves.
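Base R exposes the same Fritsch-Carlson construction through splinefun(method = "monoH.FC"); the toy snippet below only illustrates the monotone Hermite idea and does not use the package's internal helpers.
x.knots <- c(0, 0.5, 1)      # Anchors on both sides of a hypothetical transition window
y.knots <- c(0.2, 0.6, 1.0)  # Increasing values, so the interpolant must be monotone
s <- stats::splinefun(x.knots, y.knots, method = "monoH.FC")
xx <- seq(0, 1, 0.01)
all(diff(s(xx)) >= 0)        # TRUE: Fritsch-Carlson tangents preserve monotonicity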
Value
A numeric vector of length length(x)
containing the smoothly
interpolated values.
See Also
[splinefun()]
Examples
xx <- -4:5 # Global data for EL evaluation
w <- 10:1
w <- w / sum(w)
f <- Vectorize(function(m) -2*weightedEL0(xx, mu = m, ct = w, chull.fail = "none")$logelr)
museq <- seq(-6, 6, 0.1)
LRseq <- f(museq)
plot(museq, LRseq, bty = "n")
rug(xx, lwd = 4)
wm <- weighted.mean(xx, w)
wv <- weighted.mean((xx-wm)^2, w) / sum(w)
lines(museq, (museq - wm)^2 / wv, col = 2, lty = 2)
xr <- seq(4, 6, 0.1)
xl <- seq(-6, -3, 0.1)
lines(xl, interpTwo(xl, f, mean = wm, var = wv, at = -3.5, gap = 0.5), lwd = 2, col = 4)
lines(xr, interpTwo(xr, f, mean = wm, var = wv, at = 4.5, gap = 0.5), lwd = 2, col = 3)
abline(v = c(-3.5, -4, 4.5, 5), lty = 3)
Kernel density estimation
Description
Kernel density estimation
Usage
kernelDensity(
x,
xout = NULL,
weights = NULL,
bw = NULL,
kernel = c("gaussian", "uniform", "triangular", "epanechnikov", "quartic"),
order = 2,
convolution = FALSE,
chunks = 0,
PIT = FALSE,
deduplicate.x = TRUE,
deduplicate.xout = TRUE,
no.dedup = FALSE,
return.grid = FALSE
)
Arguments
x |
A numeric vector, matrix, or data frame containing observations. For density, the points used to compute the density. For kernel regression, the points corresponding to explanatory variables. |
xout |
A vector or a matrix of data points with |
weights |
A numeric vector of observation weights (typically counts) to
perform weighted operations. If null, |
bw |
Bandwidth for the kernel: a scalar or a vector of the same length as |
kernel |
Character describing the desired kernel type. NB: due to limited machine precision, even Gaussian has finite support. |
order |
An integer: 2, 4, or 6. Order-2 kernels are the standard kernels that are positive everywhere. Orders 4 and 6 produce some negative values, which reduces bias but may hamper density estimation. |
convolution |
Logical: if FALSE, returns the usual kernel. If TRUE, returns the convolution kernel that is used in density cross-validation. |
chunks |
Integer: the number of chunks to split the task into (limits
RAM usage but increases overhead). |
PIT |
If TRUE, the Probability Integral Transform (PIT) is applied to all columns
of |
deduplicate.x |
Logical: if TRUE, full duplicates in the input |
deduplicate.xout |
Logical: if TRUE, full duplicates in the input |
no.dedup |
Logical: if TRUE, sets |
return.grid |
Logical: if The number of chunks for kernel density and regression estimation is chosen
in such a manner that the number of elements in the internal weight matrix
should not exceed |
Value
A vector of density estimates evaluated at the grid points or, if return.grid
, a matrix with the density in the last column.
Examples
set.seed(1)
x <- sort(rt(10000, df = 5)) # Observed values
g <- seq(-6, 6, 0.05) # Grid for evaluation
d2 <- kernelDensity(x, g, bw = 0.3, kernel = "epanechnikov", no.dedup = TRUE)
d4 <- kernelDensity(x, g, bw = 0.4, kernel = "quartic", order = 4, no.dedup = TRUE)
plot(g, d2, ylim = range(0, d2, d4), type = "l"); lines(g, d4, col = 2)
# De-duplication facilities for faster operations
set.seed(1) # Creating a data set with many duplicates
n.uniq <- 1000
n <- 4000
inds <- ceiling(runif(n, 0, n.uniq))
x.uniq <- matrix(rnorm(n.uniq*10), ncol = 10)
x <- x.uniq[inds, ]
xout <- x.uniq[ceiling(runif(n.uniq*3, 0, n.uniq)), ]
w <- runif(n)
data.table::setDTthreads(1) # For measuring the pure gains and overhead
RcppParallel::setThreadOptions(numThreads = 1)
kd1 <- kernelDensity(x, xout, w, bw = 0.5)
kd2 <- kernelDensity(x, xout, w, bw = 0.5, no.dedup = TRUE)
stat1 <- attr(kd1, "duplicate.stats")
stat2 <- attr(kd2, "duplicate.stats")
print(stat1[3:5]) # De-duplication time -- worth it
print(stat2[3:5]) # Without de-duplication, slower
unname(prod((1 - stat1[1:2])) / (stat1[5] / stat2[5])) # > 1 = better time
# savings than expected, < 1 = worse time savings than expected
all.equal(as.numeric(kd1), as.numeric(kd2))
max(abs(kd1 - kd2)) # Should be around machine epsilon or less
Density and/or kernel regression estimator with conditioning on discrete variables
Description
Density and/or kernel regression estimator with conditioning on discrete variables
Usage
kernelDiscreteDensitySmooth(x, y = NULL, compact = FALSE, fun = mean)
Arguments
x |
A vector or a matrix/data frame of discrete explanatory variables (exogenous). Non-integer values are fine because the data are split into bins defined by interactions of these variables. |
y |
Optional: a vector of dependent variable values. |
compact |
Logical: return unique values instead of full data with repeated observations? |
fun |
A function that computes a statistic of |
Value
A list with x
, density estimator (fhat
) and, if y
was provided, regression estimate.
Examples
set.seed(1)
x <- sort(rnorm(1000))
p <- 0.5*pnorm(x) + 0.25 # Propensity score
d <- as.numeric(runif(1000) < p)
# g = discrete version of x for binning
g <- as.numeric(as.character(cut(x, -4:4, labels = -4:3+0.5)))
dhat.x <- kernelSmooth(x = x, y = d, bw = 0.4, no.dedup = TRUE)
dhat.g <- kernelDiscreteDensitySmooth(x = g, y = d)
dhat.comp <- kernelDiscreteDensitySmooth(g, d, compact = TRUE)
plot(x, p, ylim = c(0, 1), bty = "n", type = "l", lty = 2)
points(x, dhat.x, col = "#00000044")
points(dhat.comp, col = 2, pch = 16, cex = 2)
lines(dhat.comp$x, dhat.comp$fhat, col = 4, pch = 16, lty = 3)
Basic univariate kernel functions
Description
Computes the 5 most popular kernel functions of orders 2 and 4, optionally returning the analytical convolution kernel for density cross-validation. These kernels appear in (Silverman 1986).
Usage
kernelFun(
x,
kernel = c("gaussian", "uniform", "triangular", "epanechnikov", "quartic"),
order = c(2, 4),
convolution = FALSE
)
Arguments
x |
A numeric vector of values at which to compute the kernel function. |
kernel |
Kernel type: uniform, Epanechnikov, triangular, quartic, or Gaussian. |
order |
Kernel order. 2nd-order kernels are always non-negative.
*k*-th-order kernels have all moments from 1 to (k-1) equal to zero, which is
achieved by having some negative values.
|
convolution |
Logical: return the convolution kernel? (Useful for density cross-validation.) |
Details
The kernel functions take non-zero values on [-1, 1]
, except for
the Gaussian one, which is supposed to have full support, but due to the rapid
decay, is indistinguishable from machine epsilon outside
[-8.2924, 8.2924]
.
Value
A numeric vector of the same length as input.
References
Silverman BW (1986). Density estimation for statistics and data analysis. New York: Chapman and Hall.
Examples
ks <- c("uniform", "triangular", "epanechnikov", "quartic", "gaussian"); names(ks) <- ks
os <- c(2, 4); names(os) <- paste0("o", os)
cols <- c("#000000CC", "#0000CCCC", "#CC0000CC", "#00AA00CC", "#BB8800CC")
put.legend <- function() legend("topright", legend = ks, lty = 1, col = cols, bty = "n")
xout <- seq(-4, 4, length.out = 301)
plot(NULL, NULL, xlim = range(xout), ylim = c(0, 1.1),
xlab = "", ylab = "", main = "Unscaled kernels", bty = "n"); put.legend()
for (i in 1:5) lines(xout, kernelFun(xout, kernel = ks[i]), col = cols[i])
oldpar <- par(mfrow = c(1, 2))
plot(NULL, NULL, xlim = range(xout), ylim = c(-0.1, 0.8), xlab = "", ylab = "",
main = "4th-order kernels", bty = "n"); put.legend()
for (i in 1:5) lines(xout, kernelFun(xout, kernel = ks[i], order = 4), col = cols[i])
par(mfrow = c(1, 1))
plot(NULL, NULL, xlim = range(xout), ylim = c(-0.25, 1.4), xlab = "", ylab = "",
main = "Convolution kernels", bty = "n"); put.legend()
for (i in 1:5) {
for (j in 1:2) lines(xout, kernelFun(xout, kernel = ks[i], order = os[j],
convolution = TRUE), col = cols[i], lty = j)
}; legend("topleft", c("2nd order", "4th order"), lty = 1:2, bty = "n")
par(oldpar)
# All kernels integrate to correct values; we compute the moments
mom <- Vectorize(function(k, o, m, c) integrate(function(x) x^m * kernelFun(x, k, o,
convolution = c), lower = -Inf, upper = Inf)$value)
for (m in 0:4) {
cat("\nComputing integrals of x^", m, " * f(x). \nSimple unscaled kernel:\n", sep = "")
print(round(outer(os, ks, function(o, k) mom(k, o, m = m, c = FALSE)), 4))
cat("Convolution kernel:\n")
print(round(outer(os, ks, function(o, k) mom(k, o, m = m, c = TRUE)), 4))
}
Density with conditioning on discrete and continuous variables
Description
Density with conditioning on discrete and continuous variables
Usage
kernelMixedDensity(
x,
by,
xout = NULL,
byout = NULL,
weights = NULL,
parallel = FALSE,
cores = 1,
preschedule = TRUE,
...
)
Arguments
x |
A numeric vector, matrix, or data frame containing observations. For density, the points used to compute the density. For kernel regression, the points corresponding to explanatory variables. |
by |
A variable containing unique identifiers of discrete categories. |
xout |
A vector or a matrix of data points with |
byout |
A variable containing unique identifiers of discrete categories
for the output grid (same points as |
weights |
A numeric vector of observation weights (typically counts) to
perform weighted operations. If null, |
parallel |
Logical: if |
cores |
Integer: the number of CPU cores to use. High core count = high RAM usage.
If the number of unique values of 'by' is less than the number of cores requested,
then, only |
preschedule |
Logical: passed as |
... |
Passed to |
Value
A numeric vector of the density estimate of the same length as nrow(xout)
.
Examples
# Estimating 3 densities on something like a panel
set.seed(1)
n <- 200
x <- c(rnorm(n), rchisq(n, 4)/4, rexp(n, 1))
by <- rep(1:3, each = n)
xgrid <- seq(-3, 6, 0.1)
out <- expand.grid(x = xgrid, by = 1:3)
fhat <- kernelMixedDensity(x = x, xout = out$x, by = by, byout = out$by)
plot(xgrid, dnorm(xgrid)/3, type = "l", bty = "n", lty = 2, ylim = c(0, 0.35),
xlab = "", ylab = "Density")
lines(xgrid, dchisq(xgrid*4, 4)*4/3, lty = 2, col = 2)
lines(xgrid, dexp(xgrid, 1)/3, lty = 2, col = 3)
for (i in 1:3) {
lines(xgrid, fhat[out$by == i], col = i, lwd = 2)
rug(x[by == i], col = i)
}
legend("top", c("00", "10", "01", "11"), col = 2:5, lwd = 2)
Smoothing with conditioning on discrete and continuous variables
Description
Smoothing with conditioning on discrete and continuous variables
Usage
kernelMixedSmooth(
x,
y,
by,
xout = NULL,
byout = NULL,
weights = NULL,
parallel = FALSE,
cores = 1,
preschedule = TRUE,
...
)
Arguments
x |
A numeric vector, matrix, or data frame containing observations. For density, the points used to compute the density. For kernel regression, the points corresponding to explanatory variables. |
y |
A numeric vector of dependent variable values. |
by |
A variable containing unique identifiers of discrete categories. |
xout |
A vector or a matrix of data points with |
byout |
A variable containing unique identifiers of discrete categories
for the output grid (same points as |
weights |
A numeric vector of observation weights (typically counts) to
perform weighted operations. If null, |
parallel |
Logical: if |
cores |
Integer: the number of CPU cores to use. High core count = high RAM usage.
If the number of unique values of 'by' is less than the number of cores requested,
then, only |
preschedule |
Logical: passed as |
... |
Passed to |
Value
A numeric vector of the kernel estimate of the same length as nrow(xout)
.
Examples
set.seed(1)
n <- 1000
z1 <- rbinom(n, 1, 0.5)
z2 <- rbinom(n, 1, 0.5)
x <- rnorm(n)
u <- rnorm(n)
y <- 1 + x^2 + z1 + 2*z2 + z1*z2 + u
by <- as.integer(interaction(list(z1, z2)))
out <- expand.grid(x = seq(-4, 4, 0.25), by = 1:4)
yhat <- kernelMixedSmooth(x = x, y = y, by = by, bw = 1, degree = 1,
xout = out$x, byout = out$by)
plot(x, y)
for (i in 1:4) lines(out$x[out$by == i], yhat[out$by == i], col = i+1, lwd = 2)
legend("top", c("00", "10", "01", "11"), col = 2:5, lwd = 2)
# The function works faster if there are duplicated values of the
# conditioning variables in the prediction grid and there are many
# observations; this is illustrated by the following example
# without a custom grid
# In this example, ignore the fact that the conditioning variable is rounded
# and therefore contains measurement error (ruining consistency)
x <- rnorm(10000)
xout <- rnorm(5000)
xr <- round(x)
xrout <- round(xout)
w <- runif(10000, 1, 3)
y <- 1 + x^2 + rnorm(10000)
by <- rep(1:4, each = 2500)
byout <- rep(1:4, each = 1250)
system.time(kernelMixedSmooth(x = x, y = y, by = by, weights = w,
xout = xout, byout = byout, bw = 1))
# user system elapsed
# 0.144 0.000 0.144
system.time(km1 <- kernelMixedSmooth(x = xr, y = y, by = by, weights = w,
xout = xrout, byout = byout, bw = 1))
# user system elapsed
# 0.021 0.000 0.022
system.time(km2 <- kernelMixedSmooth(x = xr, y = y, by = by, weights = w,
xout = xrout, byout = byout, bw = 1, no.dedup = TRUE))
# user system elapsed
# 0.138 0.001 0.137
all.equal(km1, km2)
# Parallel capabilities shine in large data sets
if (.Platform$OS.type != "windows") {
# A function to carry out the same estimation in multiple cores
pFun <- function(n) kernelMixedSmooth(x = rep(x, 2), y = rep(y, 2),
weights = rep(w, 2), by = rep(by, 2),
bw = 1, degree = 0, parallel = TRUE, cores = n)
system.time(pFun(1)) # 0.6--0.7 s
system.time(pFun(2)) # 0.4--0.5 s
}
Local kernel smoother
Description
Local kernel smoother
Usage
kernelSmooth(
x,
y,
xout = NULL,
weights = NULL,
bw = NULL,
kernel = c("gaussian", "uniform", "triangular", "epanechnikov", "quartic"),
order = 2,
convolution = FALSE,
chunks = 0,
PIT = FALSE,
LOO = FALSE,
degree = 0,
trim = function(x) 0.01/length(x),
robust.iterations = 0,
robust = c("bisquare", "huber"),
deduplicate.x = TRUE,
deduplicate.xout = TRUE,
no.dedup = FALSE,
return.grid = FALSE
)
Arguments
x |
A numeric vector, matrix, or data frame containing observations. For density, the points used to compute the density. For kernel regression, the points corresponding to explanatory variables. |
y |
A numeric vector of dependent variable values. |
xout |
A vector or a matrix of data points with |
weights |
A numeric vector of observation weights (typically counts) to
perform weighted operations. If null, |
bw |
Bandwidth for the kernel: a scalar or a vector of the same length as |
kernel |
Character describing the desired kernel type. NB: due to limited machine precision, even Gaussian has finite support. |
order |
An integer: 2, 4, or 6. Order-2 kernels are the standard kernels that are positive everywhere. Orders 4 and 6 produce some negative values, which reduces bias but may hamper density estimation. |
convolution |
Logical: if FALSE, returns the usual kernel. If TRUE, returns the convolution kernel that is used in density cross-validation. |
chunks |
Integer: the number of chunks to split the task into (limits
RAM usage but increases overhead). |
PIT |
If TRUE, the Probability Integral Transform (PIT) is applied to all columns
of |
LOO |
Logical: If |
degree |
Integer: 0 for locally constant estimator (Nadaraya–Watson), 1 for locally linear (Cleveland's LOESS), 2 for locally quadratic (use with care, less stable, requires larger bandwidths) |
trim |
Trimming function for small weights to speed up locally weighted regression (if |
robust.iterations |
The number of robustifying iterations (due to Cleveland, 1979). If greater than 0, |
robust |
Character: "huber" for Huber's local regression weights, "bisquare" for more robust bi-square ones |
deduplicate.x |
Logical: if TRUE, full duplicates in the input |
deduplicate.xout |
Logical: if TRUE, full duplicates in the input |
no.dedup |
Logical: if TRUE, sets |
return.grid |
If Standardisation is recommended for the purposes of numerical stability (sometimes
The robust iterations are carried out, if requested, according to Cleveland (1979). Huber weights are never zero; bisquare weights create a more robust re-descending estimator. Note: if |
Value
A vector of predicted values or, if return.grid
is TRUE
,
a matrix with the predicted values in the last column.
Examples
set.seed(1)
n <- 300
x <- sort(rt(n, df = 6)) # Observed values
g <- seq(-4, 5, 0.1) # Grid for evaluation
f <- function(x) 1 + x + sin(x) # True E(Y | X) = f(X)
y <- f(x) + rt(n, df = 4)
# 3 estimators: locally constant + 2nd-order kernel,
# locally constant + 4th-order kernel, locally linear robust
b2lc <- suppressWarnings(bw.CV(x, y = y, kernel = "quartic", deduplicate.x = FALSE)
+ 0.8)
b4lc <- suppressWarnings(bw.CV(x, y = y, kernel = "quartic", order = 4,
try.grid = FALSE, start.bw = 3, deduplicate.x = FALSE) + 1)
b2ll <- bw.CV(x, y = y, kernel = "quartic", degree = 1, robust.iterations = 1,
try.grid = FALSE, start.bw = 3, verbose = TRUE, deduplicate.x = FALSE)
m2lc <- kernelSmooth(x, y, g, bw = b2lc, kernel = "quartic", no.dedup = TRUE)
m4lc <- kernelSmooth(x, y, g, bw = b4lc, kernel = "quartic", order = 4, no.dedup = TRUE)
m2ll <- kernelSmooth(x, y, g, bw = b2ll, kernel = "quartic",
degree = 1, robust.iterations = 1, no.dedup = TRUE)
plot(x, y, xlim = c(-6, 7), col = "#00000088", bty = "n")
lines(g, f(g), col = "white", lwd = 5); lines(g, f(g))
lines(g, m2lc, col = 2); lines(g, m4lc, col = 3); lines(g, m2ll, col = 4)
# De-duplication facilities for faster operations
set.seed(1) # Creating a data set with many duplicates
n.uniq <- 1000
n <- 4000
inds <- sort(ceiling(runif(n, 0, n.uniq)))
x.uniq <- sort(rnorm(n.uniq))
y.uniq <- 1 + x.uniq + sin(x.uniq*2) + rnorm(n.uniq)
x <- x.uniq[inds]
y <- y.uniq[inds]
xout <- x.uniq[sort(ceiling(runif(n.uniq*3, 0, n.uniq)))]
w <- runif(n)
data.table::setDTthreads(1) # For measuring the pure gains and overhead
RcppParallel::setThreadOptions(numThreads = 1)
kr1 <- kernelSmooth(x, y, xout, w, bw = 0.2)
kr2 <- kernelSmooth(x, y, xout, w, bw = 0.5, no.dedup = TRUE)
stat1 <- attr(kr1, "duplicate.stats")
stat2 <- attr(kr2, "duplicate.stats")
print(stat1[3:5]) # De-duplication time -- worth it
print(stat2[3:5]) # Without de-duplication, slower
unname(prod((1 - stat1[1:2])) / (stat1[5] / stat2[5])) # > 1 = better time
# savings than expected, < 1 = worse time savings than expected
all.equal(as.numeric(kr1), as.numeric(kr2))
max(abs(kr1 - kr2)) # Should be around machine epsilon or less
# Example in 2 dimensions
# TODO
Kernel-based weights
Description
Kernel-based weights
Usage
kernelWeights(
x,
xout = NULL,
bw = NULL,
kernel = c("gaussian", "uniform", "triangular", "epanechnikov", "quartic"),
order = 2,
convolution = FALSE,
sparse = FALSE,
PIT = FALSE,
deduplicate.x = FALSE,
deduplicate.xout = FALSE,
no.dedup = FALSE
)
Arguments
x |
A numeric vector, matrix, or data frame containing observations. For density, the points used to compute the density. For kernel regression, the points corresponding to explanatory variables. |
xout |
A vector or a matrix of data points with |
bw |
Bandwidth for the kernel: a scalar or a vector of the same length as |
kernel |
Character describing the desired kernel type. NB: due to limited machine precision, even Gaussian has finite support. |
order |
An integer: 2, 4, or 6. Order-2 kernels are the standard kernels that are positive everywhere. Orders 4 and 6 produce some negative values, which reduces bias but may hamper density estimation. |
convolution |
Logical: if FALSE, returns the usual kernel. If TRUE, returns the convolution kernel that is used in density cross-validation. |
sparse |
Logical: TODO (should be ignored?) Note that if Technical remark: if the kernel is Gaussian, then the ratio of the tail density to the maximum value (at 0) is less than mach.eps/2 when abs(x) > sqrt(106*log(2)) ~ 8.572. This has implications for the relative error of the calculation: even the kernel with full (theoretical) support may fail to produce numerically distinct values if the argument values are more than ~8.5 standard deviations away from the mean. |
PIT |
If TRUE, the Probability Integral Transform (PIT) is applied to all columns
of |
deduplicate.x |
Logical: if TRUE, full duplicates in the input |
deduplicate.xout |
Logical: if TRUE, full duplicates in the input |
no.dedup |
Logical: if TRUE, sets |
Value
A matrix of weights of dimensions nrow(xout) x nrow(x).
Examples
set.seed(1)
x <- sort(rnorm(1000)) # Observed values
g <- seq(-10, 10, 0.1) # Grid for evaluation
w <- kernelWeights(x, g, bw = 2, kernel = "triangular")
wsp <- kernelWeights(x, g, bw = 2, kernel = "triangular", sparse = TRUE)
print(c(object.size(w), object.size(wsp)) / 1024) # Kilobytes used
image(g, x, w)
all.equal(w[, 1], # Internal calculation for one column
kernelFun((g - x[1])/2, "triangular", 2, FALSE))
# Bare-bones interface to the C++ functions
# Example: 4th-order convolution kernels
x <- seq(-3, 5, length.out = 301)
ks <- c("uniform", "triangular", "epanechnikov", "quartic", "gaussian")
kmat <- sapply(ks, function(k) kernelFun(x, k, 4, TRUE))
matplot(x, kmat, type = "l", lty = 1, bty = "n", lwd = 2)
legend("topright", ks, col = 1:5, lwd = 2)
Modified logarithm with derivatives
Description
Modified logarithm with derivatives
Usage
logTaylor(x, lower = NULL, upper = NULL, der = 0, order = 4)
Arguments
x |
Numeric vector for which approximated logarithm is to be computed. |
lower |
Lower threshold below which approximation starts; can be a scalar of a vector of the same length as |
upper |
Upper threshold above which approximation starts; can be a scalar of a vector of the same length as |
der |
Non-negative integer: 0 yields the function, 1 and higher yields derivatives |
order |
Positive integer: Taylor approximation order. If |
Details
Provides a family of alternatives to -log() and derivatives thereof in order to attain self-concordance;
computes the modified negative logarithm and its first derivatives.
For lower <= x <= upper, returns just the logarithm. For x < lower or x > upper, returns the Taylor approximation of the given order
.
The 4th order is the lowest one that preserves self-concordance.
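As a quick illustration of the splicing (assuming, as the examples below suggest, that the expansion point is the threshold itself), the values should follow log() above the lower threshold and its 4th-order Taylor polynomial below it.
l <- 0.5
x <- c(0.1, 0.3, 0.7, 1.5)  # Two points below the threshold, two above
taylor4 <- function(x, a) log(a) + (x-a)/a - (x-a)^2/(2*a^2) + (x-a)^3/(3*a^3) - (x-a)^4/(4*a^4)
cbind(x, log = log(x), taylor4 = taylor4(x, l), modified = as.numeric(logTaylor(x, lower = l)))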
Value
A numeric matrix with (order+1)
columns containing the values of the modified log and its derivatives.
Examples
x <- seq(0.01^0.25, 2^0.25, length.out = 51)^4 - 0.11 # Denser where |f'| is higher
plot(x, log(x)); abline(v = 0, lty = 2) # Observe the warning
lines(x, logTaylor(x, lower = 0.2), col = 2)
lines(x, logTaylor(x, lower = 0.5), col = 3)
lines(x, logTaylor(x, lower = 1, upper = 1.2, order = 6), col = 4)
# Substitute log with its Taylor approx. around 1
x <- seq(0.1, 2, 0.05)
ae <- abs(sapply(2:6, function(o) log(x) - logTaylor(x, lower=1, upper=1, order=o)))
matplot(x[x!=1], ae[x!=1,], type = "l", log = "y", lwd = 2,
main = "Abs. trunc. err. of Taylor expansion at 1", ylab = "")
Probability integral transform
Description
Probability integral transform
Usage
pit(x, xout = NULL)
Arguments
x |
A numeric vector of data points. |
xout |
A numeric vector. If supplied, then the transformed function at the grid points different from |
Value
A numeric vector of values strictly between 0 and 1 of the same length as xout
(or x
, if xout
is NULL
).
Examples
set.seed(2)
x1 <- c(4, 3, 7, 10, 2, 2, 7, 2, 5, 6)
x2 <- sample(c(0, 0.5, 1, 2, 2.5, 3, 3.5, 10, 100), 25, replace = TRUE)
l <- length(x1)
pit(x1)
plot(pit(x1), ecdf(x1)(x1), xlim = c(0, 1), ylim = c(0, 1), asp = 1)
abline(v = seq(0.5 / l, 1 - 0.5 / l, length.out = l), col = "#00000044", lty = 2)
abline(v = c(0, 1))
points(pit(x1, x2), ecdf(x1)(x2), pch = 16, col = "#CC000088", cex = 0.9)
abline(v = pit(x1, x2), col = "#CC000044", lty = 2)
x1 <- c(1, 1, 3, 4, 6)
x2 <- c(0, 2, 2, 5.9, 7, 8)
pit(x1)
pit(x1, x2)
set.seed(1)
l <- 10
x1 <- rlnorm(l)
x2 <- sample(c(x1, rlnorm(10)))
plot(pit(x1), ecdf(x1)(x1), xlim = c(0, 1), ylim = c(0, 1), asp = 1)
abline(v = seq(0.5 / l, 1 - 0.5 / l, length.out = l), col = "#00000044", lty = 2)
abline(v = c(0, 1))
points(pit(x1, x2), ecdf(x1)(x2), pch = 16, col = "#CC000088", cex = 0.9)
Check the data for kernel estimation
Description
Checks if the order is 2, 4, or 6, transforms the objects into matrices, checks the dimensions, provides the bandwidth, creates default arguments to pass to the C++ functions, carries out de-duplication for speed-up etc.
Usage
prepareKernel(
x,
y = NULL,
xout = NULL,
weights = NULL,
bw = NULL,
kernel = c("gaussian", "uniform", "triangular", "epanechnikov", "quartic"),
order = 2,
convolution = FALSE,
sparse = FALSE,
deduplicate.x = TRUE,
deduplicate.xout = TRUE,
no.dedup = FALSE,
PIT = FALSE
)
Arguments
x |
A numeric vector, matrix, or data frame containing observations. For density, the points used to compute the density. For kernel regression, the points corresponding to explanatory variables. |
y |
Optional: a vector of dependent variable values. |
xout |
A vector or a matrix of data points with |
weights |
A numeric vector of observation weights (typically counts) to
perform weighted operations. If null, |
bw |
Bandwidth for the kernel: a scalar or a vector of the same length as |
kernel |
Character describing the desired kernel type. NB: due to limited machine precision, even Gaussian has finite support. |
order |
An integer: 2, 4, or 6. Order-2 kernels are the standard kernels that are positive everywhere. Orders 4 and 6 produce some negative values, which reduces bias but may hamper density estimation. |
convolution |
Logical: if FALSE, returns the usual kernel. If TRUE, returns the convolution kernel that is used in density cross-validation. |
sparse |
Logical: TODO (ignored) |
deduplicate.x |
Logical: if TRUE, full duplicates in the input |
deduplicate.xout |
Logical: if TRUE, full duplicates in the input |
no.dedup |
Logical: if TRUE, sets |
PIT |
If TRUE, the Probability Integral Transform (PIT) is applied to all columns
of |
Value
A list of arguments that are taken by [kernelDensity()] and [kernelSmooth()].
Examples
# De-duplication facilities
set.seed(1) # Creating a data set with many duplicates
n.uniq <- 10000
n <- 60000
inds <- ceiling(runif(n, 0, n.uniq))
x.uniq <- matrix(rnorm(n.uniq*10), ncol = 10)
x <- x.uniq[inds, ]
y <- runif(n.uniq)[inds]
xout <- x.uniq[ceiling(runif(n.uniq*3, 0, n.uniq)), ]
w <- runif(n)
print(system.time(a1 <- prepareKernel(x, y, xout, w, bw = 0.5)))
print(system.time(a2 <- prepareKernel(x, y, xout, w, bw = 0.5,
deduplicate.x = FALSE, deduplicate.xout = FALSE)))
print(c(object.size(a1), object.size(a2)) / 1024) # Kilobytes used
# Speed-memory trade-off: 4 times smaller, takes 0.2 s, but reduces the
# number of matrix operations by a factor of
1 - prod(1 - a1$duplicate.stats[1:2]) # 95% fewer operations
sum(a1$weights) - sum(a2$weights) # Should be 0 or near machine epsilon
Smoothed Empirical Likelihood function value
Description
Evaluates SEL function for a given moment function at a certain parameter value.
Usage
smoothEmplik(
rho,
theta,
data,
sel.weights = NULL,
type = c("EL", "EuL", "EL0"),
kernel.args = list(bw = NULL, kernel = "epanechnikov", order = 2, PIT = TRUE, sparse =
TRUE),
EL.args = list(chull.fail = "taylor", weight.tolerance = NULL),
minus = FALSE,
parallel = FALSE,
cores = 1,
chunks = NULL,
sparse = FALSE,
verbose = FALSE,
bad.value = -Inf,
attach.attributes = c("none", "all", "ELRs", "residuals", "lam", "nabla", "converged",
"exitcode", "probabilities"),
...
)
Arguments
rho |
The moment function depending on parameters and data (and potentially other parameters). Must return a numeric vector. |
theta |
A parameter at which the moment function is evaluated. |
data |
A data object on which the moment function is computed. |
sel.weights |
Either a matrix with valid kernel smoothing weights with rows adding up to 1,
or a function that computes the kernel weights based on the |
type |
Character: |
kernel.args |
A list of arguments passed to |
EL.args |
A list of arguments passed to |
minus |
If TRUE, returns SEL times -1 (for optimisation via minimisation). |
parallel |
If TRUE, uses |
cores |
The number of cores used by |
chunks |
The number of chunks into which the weight matrix is split for memory saving.
One chunk is good for sample sizes 2000 and below. If equal to the number of observations, then,
the smoothed likelihoods are computed in series, which saves memory but computes kernel weights at
every step of a loop, increasing CPU time.
If |
sparse |
Logical: convert the weight matrix to a sparse one? |
verbose |
If |
bad.value |
Replace non-finite individual SEL values with this value. May be useful if the optimiser does not allow specific non-finite values (like L-BFGS-B). |
attach.attributes |
If |
... |
Passed to |
Value
A scalar with the SEL value and, if requested, attributes containing the diagnostic information attached to it.
Examples
set.seed(1)
x <- sort(rlnorm(50))
# Heteroskedastic DGP
y <- abs(1 + 1*x + rnorm(50) * (1 + x + sin(x)))
mod.OLS <- lm(y ~ x)
rho <- function(theta, ...) y - theta[1] - theta[2]*x # Moment fn
w <- kernelWeights(x, PIT = TRUE, bw = 0.25, kernel = "epanechnikov")
w <- w / rowSums(w)
image(x, x, w, log = "xy")
theta.vals <- list(c(1, 1), coef(mod.OLS))
SEL <- function(b, ...) smoothEmplik(rho = rho, theta = b, sel.weights = w, ...)
sapply(theta.vals, SEL) # Smoothed empirical likelihood
# SEL maximisation
ctl <- list(fnscale = -1, reltol = 1e-6, ndeps = rep(1e-5, 2),
trace = 1, REPORT = 5)
b.init <- coef(mod.OLS)
b.init <- c(1.790207, 1.007491) # Only to speed up estimation
b.SEL <- optim(b.init, SEL, method = "BFGS", control = ctl)
print(b.SEL$par) # Closer to the true value (1, 1) than OLS
plot(x, y)
abline(1, 1, lty = 2)
abline(mod.OLS, col = 2)
abline(b.SEL$par, col = 4)
# Euclidean likelihood
SEuL <- function(b, ...) smoothEmplik(rho = rho, theta = b,
type = "EuL", sel.weights = w, ...)
b.SEuL <- optim(coef(mod.OLS), SEuL, method = "BFGS", control = ctl)
abline(b.SEuL$par, col = 3)
cbind(SEL = b.SEL$par, SEuL = b.SEuL$par)
# Now we start from (0, 0), for which the Taylor expansion is necessary
# because all residuals at this starting value are positive and the
# unmodified EL ratio for the test of equality to 0 is -Inf
smoothEmplik(rho=rho, theta=c(0, 0), sel.weights = w, EL.args = list(chull.fail = "none"))
smoothEmplik(rho=rho, theta=c(0, 0), sel.weights = w)
# The next example is very slow; approx. 1 minute
# Experiment: a small bandwidth so that the spanning condition should fail often
# It yields an appalling estimator
w <- kernelWeights(x, PIT = TRUE, bw = 0.15, kernel = "epanechnikov")
w <- w / rowSums(w)
# The first option is faster but may sometimes fail
b.SELt <- optim(c(0, 0), SEL, EL.args = list(chull.fail = "taylor"),
method = "BFGS", control = ctl)
b.SELw <- optim(c(0, 0), SEL, EL.args = list(chull.fail = "wald"),
method = "BFGS", control = ctl)
# In this sense, Euclidean likelihood is robust to convex hull violations
b.SELu <- optim(c(0, 0), SEuL, method = "BFGS", control = ctl)
b0grid <- seq(-1.5, 7, length.out = 51)
b1grid <- seq(-1.5, 4.5, length.out = 51)
bgrid <- as.matrix(expand.grid(b0grid, b1grid))
fi <- function(i) smoothEmplik(rho, bgrid[i, ], sel.weights = w, type = "EL0",
EL.args = list(chull.fail = "taylor"))
ncores <- max(floor(parallel::detectCores()/2 - 1), 1)
chk <- Sys.getenv("_R_CHECK_LIMIT_CORES_", "") # Limit to 2 cores for CRAN checks
if (nzchar(chk) && chk == "TRUE") ncores <- min(ncores, 2L)
selgrid <- unlist(parallel::mclapply(1:nrow(bgrid), fi, mc.cores = ncores))
selgrid <- matrix(selgrid, nrow = length(b0grid))
probs <- c(0.25, 0.5, 0.75, 0.8, 0.9, 0.95, 0.99, 1-10^seq(-4, -16, -2))
levs <- qchisq(probs, df = 2)
# levs <- c(1, 2, 5, 10, 20, 50, 100, 200, 500, 1000, 2000, 5000, 10000)
labs <- round(levs, 1)
cols <- rainbow(length(levs), end = 0.7, v = 0.7)
oldpar <- par(mar = c(4, 4, 2, 0) + .1)
selgrid2 <- -2*(selgrid - max(selgrid, na.rm = TRUE))
contour(b0grid, b1grid, selgrid2, levels = levs,
labels = labs, col = cols, lwd = 1.5, bty = "n",
main = "'Safe' likelihood contours", asp = 1)
image(b0grid, b1grid, log1p(selgrid2))
# The narrow lines are caused by the fact that if two observations are close together
# at the edge, the curvature at that point is extreme
# The same with Euclidean likelihood
seulgrid <- unlist(parallel::mclapply(1:nrow(bgrid), function(i)
smoothEmplik(rho, bgrid[i, ], sel.weights = w, type = "EuL"),
mc.cores = ncores))
seulgrid <- matrix(seulgrid, nrow = length(b0grid))
seulgrid2 <- -50*(seulgrid - max(seulgrid, na.rm = TRUE))
par(mar = c(4, 4, 2, 0) + .1)
contour(b0grid, b1grid, seulgrid2, levels = levs,
labels = labs, col = cols, lwd = 1.5, bty = "n",
main = "'Safe' likelihood contours", asp = 1)
image(b0grid, b1grid, log1p(seulgrid2))
par(oldpar)
Convert a weight vector to list
Description
This function saves memory (which is crucial in large samples) and allows one to speed up the code by minimising the number of
time-consuming subsetting operations and memory-consuming matrix multiplications. We do not want to rely on extra packages for
sparse matrix manipulation since the EL smoothing weights are usually fixed at the beginning, and need not be recomputed dynamically,
so we recommend applying this function to the rows of a matrix. In order to avoid numerical instability, the weights are trimmed
at 0.01 / length(x)
. Using too much trimming may cause the spanning condition to fail (the moment function values can have the same sign in some neighbourhoods).
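The storage idea can be illustrated with a hand-rolled list; the element names idx and ct below are made up for the illustration and need not match the actual return value of these functions.
set.seed(1)
w <- kernelWeights(rnorm(1000), seq(-3, 3, 0.5), bw = 0.3, kernel = "triangular")
w1 <- w[7, ] / sum(w[7, ])                 # One row of smoothing weights, mostly zeros
thr <- 0.01 / length(w1)                   # The trimming threshold mentioned above
lst <- list(idx = which(w1 > thr), ct = w1[w1 > thr])
y <- rnorm(1000)
c(full = sum(w1 * y), sparse = sum(lst$ct * y[lst$idx]))  # Nearly identical weighted sums
c(object.size(w1), object.size(lst))       # The list representation is smaller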
Usage
sparseVectorToList(x, trim = NULL, renormalise = FALSE)
sparseMatrixToList(x, trim = NULL, renormalise = FALSE)
Arguments
x |
A numeric vector or matrix (with many close-to-zero elements). |
trim |
A trimming function that returns a threshold value below which the weights are ignored. In common applications, this function should tend to 0 as the length of |
renormalise |
Logical: renormalise the sum of weights to one after trimming? |
Value
A list with indices and values of non-zero elements.
Examples
set.seed(1)
m <- round(matrix(rnorm(100), 10, 10), 2)
m[as.logical(rbinom(100, 1, 0.7))] <- 0
sparseVectorToList(m[, 3])
sparseMatrixToList(m)
Least-squares regression via SVD
Description
Least-squares regression via SVD
Usage
svdlm(x, y, rel.tol = 1e-09, abs.tol = 1e-100)
Arguments
x |
Model matrix. |
y |
Response vector. |
rel.tol |
Relative zero tolerance for generalised inverse via SVD. |
abs.tol |
Absolute zero tolerance for generalised inverse via SVD. Newton steps for many empirical likelihoods are of least-squares type (see the sketch after this argument list).
The tolerances are used to check the closeness of singular values to zero. The values of the
singular-value vector |
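A minimal sketch of the SVD-based least-squares idea, zeroing out near-zero singular values before inverting; it follows the same spirit as the tolerance checks described above but does not reproduce the exact internals.
svd_ls <- function(x, y, rel.tol = 1e-9) {
  s <- svd(x)                               # x = U D V'
  keep <- s$d > rel.tol * max(s$d)          # Treat tiny singular values as zero
  dinv <- ifelse(keep, 1/s$d, 0)            # Pseudo-inverse of D
  drop(s$v %*% (dinv * crossprod(s$u, y)))  # beta = V D^+ U' y
}
X <- cbind(1, as.matrix(mtcars[, -1]))
max(abs(svd_ls(X, mtcars$mpg) - coef(lm(mpg ~ ., data = mtcars))))  # Negligible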
Value
A vector of coefficients.
Examples
b.svd <- svdlm(x = cbind(1, as.matrix(mtcars[, -1])), y = mtcars[, 1])
b.lm <- coef(lm(mpg ~ ., data = mtcars))
b.lm - b.svd # Negligible differences
d-th derivative of the k-th-order Taylor expansion of log(x)
Description
d-th derivative of the k-th-order Taylor expansion of log(x)
Usage
tlog(x, a = as.numeric(c(1)), k = 4L, d = 0L)
Arguments
x |
Numeric: a vector of points for which the logarithm is to be evaluated |
a |
Scalar: the point at which the polynomial approximation is computed |
k |
Non-negative integer: maximum polynomial order in the Taylor expansion
of the original function. |
d |
Non-negative integer: derivative order. Note that this function returns the d-th derivative of the k-th-order Taylor expansion, not the
k-th-order approximation of the d-th derivative. Therefore, the degree of the resulting polynomial
is |
Value
The d-th derivative of the k-th-order Taylor polynomial of log(x) around a, evaluated at x: a polynomial of degree k - d.
Examples
cl <- rainbow(9, end = 0.8, v = 0.8, alpha = 0.8)
a <- 1.5
x <- seq(a*2, a/2, length.out = 101)
f <- function(x, d = 0) if (d == 0) log(x) else ((d%%2 == 1)*2-1) * 1/x^d * gamma(d)
oldpar <- par(mfrow = c(2, 3), mar = c(2, 2, 2.5, 0.2))
for (d in 0:5) {
y <- f(x, d = d)
plot(x, y, type = "l", lwd = 7, bty = "n", ylim = range(0, y),
main = paste0("d^", d, "/dx^", d, " Taylor(Log(x))"))
for (k in 0:8) lines(x, tlog(x, a = a, k = k, d = d), col = cl[k+1], lwd = 1.5)
points(a, f(a, d = d), pch = 16, cex = 1.5, col = "white")
}
legend("topright", as.character(0:8), title = "Order", col = cl, lwd = 1)
par(oldpar)
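# A quick numerical check of the behaviour described above (assuming the arguments documented
# here): at x = a, the d-th derivative of the expansion should match the d-th derivative of log.
a <- 1.5; d <- 2; k <- 5
tlog(a, a = a, k = k, d = d)  # d-th derivative of the Taylor polynomial, evaluated at a
-1 / a^2                      # Analytical value of d^2/dx^2 log(x) at x = a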
Weighted trimmed mean
Description
Compute a weighted trimmed mean, i.e. a mean that (1) assigns non-negative weights to the observations and (2) discards an equal share of total weight from each tail of the distribution before averaging.
Usage
trimmed.weighted.mean(x, trim = 0, w = NULL, na.rm = FALSE, ...)
Arguments
x |
Numeric vector of data values. |
trim |
Single number in [0, 0.5]: the share of total weight to remove from each tail of the distribution. |
w |
Numeric vector of non-negative weights of the same length as 'x'. If 'NULL' (default), equal weights are used. |
na.rm |
Logical: should 'NA' values in 'x' or 'w' be removed? |
... |
Further arguments passed to ['weighted.mean()'] (for compatibility). |
Details
For example, 'trim = 0.10' removes 10% of the total weight from the left tail and 10% from the right tail (20% in total). Setting 'trim = 0.5' returns the weighted median.
The algorithm follows these steps (see the sketch after this list):
1. Sort the data by 'x' and accumulate the corresponding weights.
2. Identify the lower and upper cut-points that mark the central share of the total weight.
3. Drop observations whose cumulative weight lies entirely outside the cut-points and proportionally down-weight the (at most) two remaining outermost observations.
4. Return the weighted mean of the retained mass. If 'trim == 0.5', only the 50% weighted quantile (the weighted median) is returned.
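A minimal base-R sketch of these steps (illustrative only, for 0 <= trim < 0.5; the package implementation may treat ties and the trim = 0.5 case differently):
wtm.sketch <- function(x, w, trim = 0.1) {
  o <- order(x); x <- x[o]; w <- w[o]
  cw <- cumsum(w)                                  # Cumulative weight after each observation
  lo <- trim * sum(w); hi <- (1 - trim) * sum(w)   # Cut-points of the central mass
  keep <- pmax(pmin(cw, hi) - pmax(cw - w, lo), 0) # Mass of each observation inside [lo, hi]
  sum(keep * x) / sum(keep)
}
set.seed(2)
x <- rnorm(20); w <- runif(20)
c(wtm.sketch(x, w, trim = 0.2), trimmed.weighted.mean(x, trim = 0.2, w = w))  # Compare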
Value
A single numeric value: the trimmed weighted mean of 'x'. Returns 'NA_real_' if no non-'NA' observations remain after optional 'na.rm' handling.
See Also
['mean()'] for the unweighted trimmed mean, ['weighted.mean()'] for the untrimmed weighted mean.
Examples
set.seed(1)
z <- rt(100, df = 3)
w <- pmin(1, 1 / abs(z)^2) # Far-away observations (tails) get lower weight
mean(z, trim = 0.20) # Ordinary trimmed mean
trimmed.weighted.mean(z, trim = 0.20) # Same
weighted.mean(z, w) # Ordinary weighted mean (no trimming)
trimmed.weighted.mean(z, w = w) # Same
trimmed.weighted.mean(z, trim = 0.20, w = w) # Weighted trimmed mean
trimmed.weighted.mean(z, trim = 0.5, w = w) # Weighted median
Self-concordant multi-variate empirical likelihood with counts
Description
Implements the empirical-likelihood-ratio test for the mean of the coordinates of z
(with the hypothesised value mu
). The counts need not be integer;
in the context of local likelihoods, they can be kernel observation weights.
Usage
weightedEL(
z,
mu = NULL,
ct = NULL,
lambda.init = NULL,
SEL = FALSE,
return.weights = FALSE,
lower = NULL,
upper = NULL,
order = 4L,
weight.tolerance = NULL,
thresh = 1e-16,
itermax = 100L,
verbose = FALSE,
alpha = 0.3,
beta = 0.8,
backeps = 0
)
Arguments
z |
A numeric vector or a matrix with one data vector per column. |
mu |
Hypothesised mean, default (0 ... 0) in R^ncol(z) |
ct |
A numeric vector of non-negative counts. |
lambda.init |
Starting lambda, default (0 ... 0) |
SEL |
If |
return.weights |
Logical: if TRUE, individual EL weights are computed and returned. Setting this to FALSE gives huge memory savings in large data sets, especially when smoothing is used. |
lower |
Lower cut-off for [logTaylor()], default |
upper |
Upper cut-off for [logTaylor()], default |
order |
Positive integer such that the Taylor approximation of this order to log(x) is self-concordant; usually 4 or higher. Passed to [logTaylor()]. |
weight.tolerance |
Weight tolerance for counts to improve numerical stability |
thresh |
Convergence threshold for log-likelihood (the default is aggressive) |
itermax |
Upper bound on number of Newton steps (seems ample) |
verbose |
Logical: print output diagnostics? |
alpha |
Backtracking line search parameter: a step is accepted when the decrease in the function value is at least alpha times the decrease predicted by linear extrapolation. |
beta |
Backtracking line search reduction factor. 0.1 corresponds to a very crude search, 0.8 corresponds to a less crude search. |
backeps |
Backtrack threshold: the search can miss by this much. Consider setting it to 1e-10 if backtracking seems to be failing due to round-off. |
Details
Negative weights are not allowed. They could be useful in some applications, but they can destroy convexity or even boundedness. They also make the Newton step fail to be of least squares type.
This function relies on the improved computational strategy for the empirical likelihood. The search for the lambda multipliers is carried out via a dampened Newton method with guaranteed convergence because the log-likelihood is replaced by its Taylor approximation of any desired order (default: 4, the minimum value that ensures self-concordance).
Tweak alpha and beta with extreme caution; see (Boyd and Vandenberghe 2004), pp. 464–466 for details. It is necessary that 0 < alpha < 1/2 and 0 < beta < 1. alpha = 0.3 seems better than 0.01 on some 2-dimensional test data (sometimes fewer iterations).
The argument names, except for lambda.init, match the original names in Art B. Owen's implementation.
The highly optimised one-dimensional counterpart, weightedEL0, is designed to return a faster and more accurate solution in the one-dimensional case.
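For intuition, the role of alpha, beta, and backeps can be illustrated with a generic backtracking line search in the style of Boyd and Vandenberghe; this is a sketch of the rule, not the package's internal C++ routine:
# Shrink the step by `beta` until the achieved decrease beats the fraction `alpha`
# of the decrease predicted by linear extrapolation (up to the slack `backeps`)
backtrack <- function(f, x, dx, grad, alpha = 0.3, beta = 0.8, backeps = 0) {
  t <- 1
  while (f(x + t * dx) > f(x) + alpha * t * sum(grad * dx) + backeps) t <- t * beta
  t
}
f <- function(x) x^2 - log(x)   # A toy convex objective
g <- 2*3 - 1/3                  # Gradient at x = 3
h <- 2 + 1/3^2                  # Hessian at x = 3
backtrack(f, 3, -g/h, g)        # Accepted step size for the Newton direction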
Value
A list with the following values:
- logelr
Log of empirical likelihood ratio (equal to 0 if the hypothesised mean is equal to the sample mean)
- lam
Vector of Lagrange multipliers
- wts
Observation weights/probabilities (vector of length n)
- converged
TRUE if the algorithm converged; FALSE usually means that mu is not in the convex hull of the data, in which case a very small likelihood is returned (instead of zero).
- iter
Number of iterations taken.
- ndec
Newton decrement (see Boyd & Vandenberghe).
- gradnorm
Norm of the gradient of log empirical likelihood.
Source
The original code was written for (Owen 2013) and [published online](https://artowen.su.domains/empirical/) by Art B. Owen (March 2015, February 2017). The present version was rewritten in Rcpp and slightly reworked to contain fewer inner functions and loops.
References
Boyd S, Vandenberghe L (2004).
Convex Optimization.
Cambridge University Press.
Owen AB (2013).
“Self-concordance for empirical likelihood.”
Canadian Journal of Statistics, 41(3), 387–397.
See Also
[logTaylor()], [weightedEL0()]
Examples
earth <- c(
5.5, 5.61, 4.88, 5.07, 5.26, 5.55, 5.36, 5.29, 5.58, 5.65, 5.57, 5.53, 5.62, 5.29,
5.44, 5.34, 5.79, 5.1, 5.27, 5.39, 5.42, 5.47, 5.63, 5.34, 5.46, 5.3, 5.75, 5.68, 5.85
)
weightedEL(earth, mu = 5.517, verbose = TRUE) # 5.517 is the modern accepted value
# Linear regression through empirical likelihood
coef.lm <- coef(lm(mpg ~ hp + am, data = mtcars))
xmat <- cbind(1, as.matrix(mtcars[, c("hp", "am")]))
yvec <- mtcars$mpg
foc.lm <- function(par, x, y) { # The sample average of this
resid <- y - drop(x %*% par) # must be 0
resid * x
}
minusEL <- function(par) -weightedEL(foc.lm(par, xmat, yvec), itermax = 10)$logelr
coef.el <- optim(c(mean(yvec), 0, 0), minusEL)$par
abs(coef.el - coef.lm) / coef.lm # Relative difference
# Likelihood ratio testing without any variance estimation
# Define the profile empirical likelihood for the coefficient on am
minusPEL <- function(par.free, par.am)
-weightedEL(foc.lm(c(par.free, par.am), xmat, yvec), itermax = 20)$logelr
# Constrained maximisation assuming that the coef on par.am is 3.14
coef.el.constr <- optim(coef.el[1:2], minusPEL, par.am = 3.14)$par
print(-2 * weightedEL(foc.lm(c(coef.el.constr, 3.14), xmat, yvec))$logelr)
# Exceeds the critical value qchisq(0.95, df = 1)
Uni-variate empirical likelihood via direct lambda search
Description
Empirical likelihood with counts to solve one-dimensional problems efficiently with Brent's root search algorithm.
Conducts an empirical likelihood ratio test of the hypothesis that the mean of z
is mu
.
The names of the elements in the returned list are consistent with the original R code in (Owen 2017).
Usage
weightedEL0(
z,
mu = NULL,
ct = NULL,
shift = NULL,
return.weights = FALSE,
SEL = FALSE,
weight.tolerance = NULL,
boundary.tolerance = 1e-09,
trunc.to = 0,
chull.fail = c("taylor", "wald", "adjusted", "balanced", "none"),
uniroot.control = list(),
verbose = FALSE
)
Arguments
z |
Numeric data vector. |
mu |
Hypothesized mean of z. |
ct |
Numeric count variable with non-negative values that indicates the multiplicity of observations.
Can be fractional. Very small counts below the threshold |
shift |
The value to add in the denominator (useful in case there are extra Lagrange multipliers): 1 + lambda'Z + shift. |
return.weights |
Logical: if TRUE, individual EL weights are computed and returned. Setting this to FALSE gives huge memory savings in large data sets, especially when smoothing is used. |
SEL |
If |
weight.tolerance |
Weight tolerance for counts to improve numerical stability (similar to the ones in Art B. Owen's 2017 code, but adapting to the sample size). |
boundary.tolerance |
Relative tolerance for determining when the lambda is not an interior solution because it is too close to the boundary. Corresponds to a fraction of the interval range length. |
trunc.to |
Counts under |
chull.fail |
A character: what to do if the convex hull of z does not contain mu (i.e. the spanning condition fails). |
uniroot.control |
A list passed to the brentZero root finder; its tol element controls the tolerance of the lambda search (see Details). |
verbose |
Logical: if TRUE, prints diagnostic messages. |
Details
This function provides the core functionality for univariate empirical likelihood. The technical details are given in (Cosma et al. 2019), although the algorithm used in that paper is slower than the one provided by this function.
Since we know that the EL probabilities belong to (0, 1), the interval (bracket) for the \lambda search can be determined in the spirit of formula (2.9) from (Owen 2001). Let z^*_i := z_i - \mu be the recentred observations. The probabilities are
p_i = c_i/N \cdot (1 + \lambda z^*_i + s)^{-1}
and are bounded from above: p_i < 1 for all i. Therefore,
c_i/N \cdot (1 + \lambda z^*_i + s)^{-1} < 1, i.e. c_i/N - 1 - s < \lambda z^*_i.
Two cases arise: either z^*_i < 0 or z^*_i > 0 (cases with z^*_i = 0 are trivially excluded because they do not affect the EL). Then,
(c_i/N - 1 - s)/z^*_i > \lambda, \forall i: z^*_i < 0,
(c_i/N - 1 - s)/z^*_i < \lambda, \forall i: z^*_i > 0,
which defines the search bracket:
\lambda_{\min} := \max_{i: z^*_i > 0} (c_i/N - 1 - s)/z^*_i,
\lambda_{\max} := \min_{i: z^*_i < 0} (c_i/N - 1 - s)/z^*_i,
\lambda_{\min} < \lambda < \lambda_{\max}.
(This derivation contains s, which is the extra shift that extends the function to allow mixed conditional and unconditional estimation; Owen's textbook formula corresponds to s = 0.)
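The bracket can be computed directly from the data; a small sketch with unit counts, s = 0, and made-up numbers:
z <- c(5.5, 5.61, 4.88, 5.07, 5.26); mu <- 5.3
zs <- z - mu                         # Recentred observations z*
N <- length(z); ci <- rep(1, N)
lam.min <- max((ci/N - 1)[zs > 0] / zs[zs > 0])
lam.max <- min((ci/N - 1)[zs < 0] / zs[zs < 0])
c(lam.min, lam.max)                  # Admissible interval for lambda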
The actual tolerance of the lambda search in brentZero is 2 |\lambda_{\max}| \epsilon_m + \mathrm{tol}/2, where tol can be set in uniroot.control and \epsilon_m is .Machine$double.eps.
The sum of log-weights is maximised without Taylor expansion, forcing mu to be inside the convex hull of z. If a violation occurs, consider using the chull.fail argument or switching to Euclidean likelihood via [weightedEuL()].
Value
A list with the following elements:
- logelr
Logarithm of the empirical likelihood ratio.
- lam
The Lagrange multiplier.
- wts
Observation weights/probabilities (of the same length as z).
- converged
TRUE if the algorithm converged, FALSE otherwise (usually means that mu is not within the range of z, i.e. the one-dimensional convex hull of z).
- iter
The number of iterations used (from brentZero).
- bracket
The admissible interval for lambda (that is, yielding weights between 0 and 1).
- estim.prec
The approximate estimated precision of lambda (from brentZero).
- f.root
The value of the derivative of the objective function w.r.t. lambda at the root (from brentZero). Values > sqrt(.Machine$double.eps) indicate convergence problems.
- exitcode
An integer indicating the reason of termination.
- message
Character string describing the optimisation termination status.
References
Chen J, Variyath AM, Abraham B (2008).
“Adjusted empirical likelihood and its properties.”
Journal of Computational and Graphical Statistics, 17(2), 426–443.
doi:10.1198/106186008x321068.
Cosma A, Kostyrka AV, Tripathi G (2019).
“Inference in conditional moment restriction models when there is selection due to stratification.”
In Huynh KP, Jacho-Chavez DT, Tripathi G (eds.), The Econometrics of Complex Survey Data: Theory and Applications, 137–171.
Emerald Publishing Limited.
ISBN 978-1-78756-726-9.
Emerson SC, Owen AB (2009).
“Calibration of the empirical likelihood method for a vector mean.”
Electronic Journal of Statistics, 3, 1161–1192.
ISSN 1935-7524, doi:10.1214/09-ejs518.
Owen AB (2001).
Empirical Likelihood.
Chapman and Hall/CRC, New York, USA.
Owen AB (2017).
A weighted self-concordant optimization for empirical likelihood.
https://artowen.su.domains/empirical/countnotes.pdf.
See Also
[weightedEL()]
Examples
# Figure 2.4 from Owen (2001) -- with a slightly different data point
earth <- c(
5.5, 5.61, 4.88, 5.07, 5.26, 5.55, 5.36, 5.29, 5.58, 5.65, 5.57, 5.53, 5.62, 5.29,
5.44, 5.34, 5.79, 5.1, 5.27, 5.39, 5.42, 5.47, 5.63, 5.34, 5.46, 5.3, 5.75, 5.68, 5.85
)
set.seed(1)
system.time(r1 <- replicate(40, weightedEL(sample(earth, replace = TRUE), mu = 5.517)))
set.seed(1)
system.time(r2 <- replicate(40, weightedEL0(sample(earth, replace = TRUE), mu = 5.517)))
plot(apply(r1, 2, "[[", "logelr"), apply(r1, 2, "[[", "logelr") - apply(r2, 2, "[[", "logelr"),
bty = "n", xlab = "log(ELR) computed via dampened Newthon method",
main = "Discrepancy between weightedEL and weightedEL0", ylab = "")
abline(h = 0, lty = 2)
# Handling the convex hull violation differently
weightedEL0(1:9, chull.fail = "none")
weightedEL0(1:9, chull.fail = "taylor")
weightedEL0(1:9, chull.fail = "wald")
# Interpolation to well-defined branches outside the convex hull
mu.seq <- seq(-1, 7, 0.1)
wEL1 <- -2*sapply(mu.seq, function(m) weightedEL0(1:9, mu = m, chull.fail = "none")$logelr)
wEL2 <- -2*sapply(mu.seq, function(m) weightedEL0(1:9, mu = m, chull.fail = "taylor")$logelr)
wEL3 <- -2*sapply(mu.seq, function(m) weightedEL0(1:9, mu = m, chull.fail = "wald")$logelr)
plot(mu.seq, wEL1)
lines(mu.seq, wEL2, col = 2)
lines(mu.seq, wEL3, col = 4)
# Warning: depending on the compiler, the discrepancy between weightedEL and weightedEL0
# can be one million (!) times larger than the machine epsilon despite both of them
# being written in pure R
# The results from Apple clang-1400.0.29.202 and Fortran GCC 12.2.0 are different from
# those obtained under Ubuntu 22.04.4 + GCC 11.4.0-1ubuntu1~22.04,
# Arch Linux 6.6.21 + GCC 14.1.1, and Windows Server 2022 + GCC 13.2.0
out1 <- weightedEL(earth, mu = 5.517)[1:4]
out2 <- weightedEL0(earth, mu = 5.517, return.weights = TRUE)[1:4]
print(c(out1$lam, out2$lam), 16)
# Value of lambda weightedEL weightedEL0
# aarch64-apple-darwin20 -1.5631313955?????? -1.5631313957?????
# Windows, Ubuntu, Arch -1.563131395492627 -1.563131395492627
Multi-variate Euclidean likelihood with analytical solution
Description
Multi-variate Euclidean likelihood with analytical solution
Usage
weightedEuL(
z,
mu = NULL,
ct = NULL,
vt = NULL,
shift = NULL,
SEL = TRUE,
weight.tolerance = NULL,
trunc.to = 0,
return.weights = FALSE,
verbose = FALSE,
chull.diag = FALSE
)
Arguments
z |
Numeric data vector. |
mu |
Hypothesized mean of z. |
ct |
Numeric count variable with non-negative values that indicates the multiplicity of observations.
Can be fractional. Very small counts below the threshold |
vt |
Numeric vector: non-negative variance weights for estimating the conditional
variance of |
shift |
The value to add in the denominator (useful in case there are extra Lagrange multipliers): 1 + lambda'Z + shift. |
SEL |
If |
weight.tolerance |
Weight tolerance for counts to improve numerical stability (similar to the ones in Art B. Owen's 2017 code, but adapting to the sample size). |
trunc.to |
Counts under |
return.weights |
Logical: if TRUE, individual EL weights are computed and returned. Setting this to FALSE gives huge memory savings in large data sets, especially when smoothing is used. |
verbose |
Logical: if TRUE, prints diagnostic messages. |
chull.diag |
Logical: if TRUE, runs a diagnostic check of whether the hypothesised mean lies inside the convex hull of z. |
Details
This estimator is numerically equivalent to the Sieve Minimum Distance estimator of (Ai and Chen 2003) with kernel sieves, but this interface provides more flexibility through the two sets of weights.
Value
A list with the same structure as that in [weightedEL()].
References
Ai C, Chen X (2003).
“Efficient Estimation of Models with Conditional Moment Restrictions Containing Unknown Functions.”
Econometrica, 71(6), 1795–1843.
ISSN 1468-0262, doi:10.1111/1468-0262.00470.
Owen AB (2001).
Empirical Likelihood.
Chapman and Hall/CRC, New York, USA.
See Also
[weightedEL()]
Examples
set.seed(1)
z <- cbind(rnorm(10), runif(10))
colMeans(z)
a <- weightedEuL(z, return.weights = TRUE)
a$wts
sum(a$wts) # Unity
colSums(a$wts * z) # Zero