| Type: | Package |
| Title: | Targeted Inference |
| Version: | 0.6 |
| Date: | 2025-10-29 |
| Author: | Klaus K. Holst [aut, cre], Benedikt Sommer [aut], Andreas Nordland [aut] |
| Maintainer: | Klaus K. Holst <klaus@holst.it> |
| Description: | Various methods for targeted and semiparametric inference including augmented inverse probability weighted (AIPW) estimators for missing data and causal inference (Bang and Robins (2005) <doi:10.1111/j.1541-0420.2005.00377.x>), variable importance and conditional average treatment effects (CATE) (van der Laan (2006) <doi:10.2202/1557-4679.1008>), estimators for risk differences and relative risks (Richardson et al. (2017) <doi:10.1080/01621459.2016.1192546>), assumption lean inference for generalized linear model parameters (Vansteelandt et al. (2022) <doi:10.1111/rssb.12504>). |
| Depends: | R (≥ 4.1) |
| Imports: | R6, Rcpp (≥ 1.0.0), abind, cli, data.table, future.apply, lava (≥ 1.8.0), methods, mets, optimx, quadprog, progressr, rlang, survival |
| Suggests: | SuperLearner (≥ 2.0-28), cmprsk, MASS, e1071, earth, glmnet, grf, hal9001, knitr, mgcv, nnls, polle (≥ 1.5), pracma, randomForestSRC, ranger, riskRegression, rmarkdown, scatterplot3d, tinytest, viridisLite, xgboost |
| BugReports: | https://github.com/kkholst/targeted/issues |
| URL: | https://kkholst.github.io/targeted/ |
| License: | Apache License (== 2.0) |
| LinkingTo: | Rcpp, RcppArmadillo |
| LazyLoad: | yes |
| NeedsCompilation: | yes |
| ByteCompile: | yes |
| RcppModules: | riskregmodel |
| Encoding: | UTF-8 |
| RoxygenNote: | 7.3.3 |
| VignetteBuilder: | knitr |
| Packaged: | 2025-10-29 22:30:56 UTC; kkzh |
| Repository: | CRAN |
| Date/Publication: | 2025-10-30 07:50:09 UTC |
targeted: Targeted Inference
Description
Various methods for targeted and semiparametric inference including augmented inverse probability weighted (AIPW) estimators for missing data and causal inference (Bang and Robins (2005) doi:10.1111/j.1541-0420.2005.00377.x), variable importance and conditional average treatment effects (CATE) (van der Laan (2006) doi:10.2202/1557-4679.1008), estimators for risk differences and relative risks (Richardson et al. (2017) doi:10.1080/01621459.2016.1192546), assumption lean inference for generalized linear model parameters (Vansteelandt et al. (2022) doi:10.1111/rssb.12504).
Author(s)
Maintainer: Klaus K. Holst klaus@holst.it
Authors:
Benedikt Sommer benediktsommer92@gmail.com
Andreas Nordland andreasnordland@gmail.com
Klaus K. Holst (Maintainer) klaus@holst.it
References
Bang & Robins (2005) Doubly Robust Estimation in Missing Data and Causal Inference Models, Biometrics.
Vansteelandt & Dukes (2022) Assumption-lean inference for generalised linear model parameters, Journal of the Royal Statistical Society: Series B (Statistical Methodology).
Thomas S. Richardson, James M. Robins & Linbo Wang (2017) On Modeling and Estimation for the Relative Risk and Risk Difference, Journal of the American Statistical Association.
Mark J. van der Laan (2006) Statistical Inference for Variable Importance, The International Journal of Biostatistics.
See Also
Useful links:
https://kkholst.github.io/targeted/
Report bugs at https://github.com/kkholst/targeted/issues
Examples
## Not run:
example(riskreg)
example(cate)
example(ate)
example(calibration)
## End(Not run)
ML model
Description
Wrapper for ml_model
Usage
ML(formula, model = "glm", ...)
Arguments
formula |
formula |
model |
model (sl, rf, pf, glm, ...) |
... |
additional arguments to model object |
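Examples
A minimal sketch, assuming a data.frame d with columns y and x, and that the returned object provides the learner methods $estimate() and $predict():
d <- data.frame(x = rnorm(100))
d$y <- d$x + rnorm(100)
lr <- ML(y ~ x, model = "glm")
lr$estimate(d)
lr$predict(head(d))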
Responder Average Treatment Effect
Description
Estimation of the Average Treatment Effect among Responders
Usage
RATE(
response,
post.treatment,
treatment,
data,
family = gaussian(),
M = 5,
pr.treatment,
treatment.level,
SL.args.response = list(family = gaussian(), SL.library = c("SL.mean", "SL.glm")),
SL.args.post.treatment = list(family = binomial(), SL.library = c("SL.mean", "SL.glm")),
preprocess = NULL,
efficient = TRUE,
...
)
Arguments
response |
Response formula (e.g., Y ~ D*A) |
post.treatment |
Post treatment marker formula (e.g., D ~ W) |
treatment |
Treatment formula (e.g., A ~ 1) |
data |
data.frame |
family |
Exponential family for response (default gaussian) |
M |
Number of folds in cross-fitting (M=1 is no cross-fitting) |
pr.treatment |
(optional) Randomization probability of treatment. |
treatment.level |
Treatment level in binary treatment (default 1) |
SL.args.response |
Arguments to SuperLearner for the response model |
SL.args.post.treatment |
Arguments to SuperLearner for the post treatment indicator |
preprocess |
(optional) Data preprocessing function |
efficient |
If TRUE, the estimate will be efficient. If FALSE, the estimate will be a simple plug-in estimate. |
... |
Additional arguments to lower level functions |
Value
estimate object
Author(s)
Andreas Nordland, Klaus K. Holst
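Examples
A minimal sketch with simulated data (the variable names are illustrative; the default nuisance models require the SuperLearner package):
if (requireNamespace("SuperLearner", quietly = TRUE)) {
  n <- 500
  w <- rnorm(n)                # baseline covariate
  a <- rbinom(n, 1, 0.5)       # randomized treatment
  d <- rbinom(n, 1, plogis(w)) # post-treatment responder marker
  y <- d * a + w + rnorm(n)    # clinical outcome
  dat <- data.frame(y, a, d, w)
  RATE(response = y ~ d * a,
       post.treatment = d ~ w,
       treatment = a ~ 1,
       data = dat, M = 2)
}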
Responder Average Treatment Effect
Description
Estimation of the Average Treatment Effect among Responders for Survival Outcomes
Usage
RATE.surv(
response,
post.treatment,
treatment,
censoring,
tau,
data,
M = 5,
pr.treatment,
call.response,
args.response = list(),
SL.args.post.treatment = list(family = binomial(), SL.library = c("SL.mean", "SL.glm")),
call.censoring,
args.censoring = list(),
preprocess = NULL,
...
)
Arguments
response |
Response formula (e.g., Surv(time, event) ~ D + W). |
post.treatment |
Post treatment marker formula (e.g., D ~ W) |
treatment |
Treatment formula (e.g., A ~ 1) |
censoring |
Censoring formula (e.g., Surv(time, event == 0) ~ D + A + W). |
tau |
Time-point of interest, see Details. |
data |
data.frame |
M |
Number of folds in cross-fitting (M=1 is no cross-fitting) |
pr.treatment |
(optional) Randomization probability of treatment. |
call.response |
Model call for the response model (e.g. "mets::phreg"). |
args.response |
Additional arguments to the response model. |
SL.args.post.treatment |
Arguments to SuperLearner for the post treatment indicator |
call.censoring |
Similar to call.response. |
args.censoring |
Similar to args.response. |
preprocess |
(optional) Data preprocessing function |
... |
Additional arguments to lower level functions |
Details
Estimation of
\frac{P(T \leq \tau \mid A=1) - P(T \leq \tau \mid A=0)}{E[D \mid A=1]}
under right censoring based on plug-in estimates of P(T \leq \tau|A=a)
and E[D|A=1].
An efficient one-step estimator of P(T \leq \tau|A=a) is constructed
using the efficient influence function
\frac{I\{A=a\}}{P(A = a)} \Big(\frac{\Delta}{S^c_{0}(\tilde T \mid X)} I\{\tilde T \leq \tau\} + \int_0^\tau \frac{S_0(u \mid X)-S_0(\tau \mid X)}{S_0(u \mid X)S^c_0(u \mid X)} \, dM^c_0(u \mid X)\Big) + \Big(1 - \frac{I\{A=a\}}{P(A = a)}\Big)F_0(\tau \mid A=a, W) - P(T \leq \tau \mid A=a).
An efficient one-step estimator of E[D|A=1] is constructed using the
efficient influence function
\frac{A}{P(A = 1)}\left(D-E[D|A=1, W]\right) + E[D|A=1, W] -E[D|A=1].
Value
estimate object
Author(s)
Andreas Nordland, Klaus K. Holst
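Examples
A minimal sketch with simulated right-censored data (illustrative variable names; the response and censoring models are fitted with mets::phreg as in the call.response argument description):
if (requireNamespace("mets", quietly = TRUE) &&
    requireNamespace("SuperLearner", quietly = TRUE)) {
  n <- 500
  w <- rnorm(n)
  a <- rbinom(n, 1, 0.5)
  d <- rbinom(n, 1, plogis(w))            # post-treatment responder marker
  t0 <- rexp(n, rate = exp(-1 + 0.5 * d)) # failure time
  cens <- rexp(n, rate = 0.2)             # censoring time
  dat <- data.frame(time = pmin(t0, cens), event = as.numeric(t0 <= cens),
                    a, d, w)
  RATE.surv(response = survival::Surv(time, event) ~ d + w,
            post.treatment = d ~ w,
            treatment = a ~ 1,
            censoring = survival::Surv(time, event == 0) ~ 1,
            tau = 2, data = dat, M = 2,
            call.response = "mets::phreg",
            call.censoring = "mets::phreg")
}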
SuperLearner wrapper for learner
Description
SuperLearner wrapper for learner
Usage
SL(
formula = ~.,
...,
SL.library = c("SL.mean", "SL.glm"),
binomial = FALSE,
data = NULL,
info = "SuperLearner"
)
Arguments
formula |
Model design |
... |
Additional arguments for SuperLearner::SuperLearner |
SL.library |
character vector of prediction algorithms |
binomial |
boolean specifying binomial or gaussian family (default FALSE) |
data |
Optional data.frame |
info |
model information (optional) |
Value
learner object
Author(s)
Klaus Kähler Holst
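Examples
A minimal sketch (the returned learner object provides the $estimate() and $predict() methods):
if (requireNamespace("SuperLearner", quietly = TRUE)) {
  d <- data.frame(x1 = rnorm(200), x2 = rnorm(200))
  d$y <- d$x1 + rnorm(200)
  lr <- SL(y ~ x1 + x2, SL.library = c("SL.mean", "SL.glm"))
  lr$estimate(d)
  lr$predict(head(d))
}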
AIPW estimator
Description
AIPW for the mean (and linear projections of the EIF) with missing observations
Usage
aipw(response_model, propensity_model, formula = ~1, data, ...)
Arguments
response_model |
Model for the response given covariates (learner or formula) |
propensity_model |
Optional missing data mechanism model (propensity model) (learner or formula) |
formula |
design specifying the OLS estimator with outcome given by the EIF |
data |
data.frame |
... |
additional arguments to lower level functions |
Examples
m <- lava::lvm(y ~ x+z, r ~ x) |>
lava::distribution(~ r, value = lava::binomial.lvm()) |>
transform(y0~r+y, value = \(x) { x[x[,1]==0,2] <- NA; x[,2] })
d <- lava::sim(m,1e3,seed=1)
aipw(y0 ~ x, data=d)
Assumption Lean inference for generalized linear model parameters
Description
Assumption-lean inference via cross-fitting (Double ML). See doi:10.1111/rssb.12504.
Usage
alean(
response_model,
exposure_model,
data,
link = "identity",
g_model,
nfolds = 1,
silent = FALSE,
mc.cores,
...
)
Arguments
response_model |
formula or learner object (formula => glm) |
exposure_model |
model for the exposure |
data |
data.frame |
link |
Link function (g) |
g_model |
Model for |
nfolds |
Number of folds |
silent |
suppress all messages and progress bars |
mc.cores |
mc.cores Optional number of cores. parallel::mcmapply used instead of future |
... |
additional arguments to future.apply::future_mapply |
Details
Let Y be the response variable, A the exposure and W
covariates. The target parameter is:
\Psi(P) = \frac{E(\operatorname{Cov}[A, g\{E(Y \mid A,W)\} \mid W])}{E\{\operatorname{Var}(A \mid W)\}}
The response_model is the model for E(Y|A,W), and
exposure_model is the model for E(A|W).
link specifies g.
Value
alean.targeted object
Author(s)
Klaus Kähler Holst
Examples
sim1 <- function(n, family=gaussian(), ...) {
m <- lava::lvm() |>
lava::distribution(~y, value=lava::binomial.lvm()) |>
lava::regression('a', value=function(l) l) |>
lava::regression('y', value=function(a,l) a + l)
if (family$family=="binomial")
lava::distribution(m, ~a) <- lava::binomial.lvm()
lava::sim(m, n)
}
library(splines)
f <- binomial()
d <- sim1(1e4, family=f)
e <- alean(
response_model=learner_glm(y ~ a + bs(l, df=3), family=binomial),
exposure_model=learner_glm(a ~ bs(l, df=3), family=f),
data=d,
link = "logit", mc.cores=1, nfolds=1
)
e
e <- alean(response_model=learner_glm(y ~ a + l, family=binomial),
exposure_model=learner_glm(a ~ l),
data=d,
link = "logit", mc.cores=1, nfolds=1)
e
AIPW (doubly-robust) estimator for Average Treatment Effect
Description
Augmented Inverse Probability Weighting estimator for the Average (Causal)
Treatment Effect. All nuisance models are here parametric (glm). For a more
general approach see the cate implementation. In this implementation
the standard errors are correct even when the nuisance models are
mis-specified (the influence curve is calculated including the term coming
from the parametric nuisance models). The estimate is consistent if either
the propensity model or the outcome model / Q-model is correctly specified.
Usage
ate(
formula,
data = parent.frame(),
weights,
offset,
family = stats::gaussian(identity),
nuisance = NULL,
propensity = nuisance,
all,
labels = NULL,
adjust.nuisance = TRUE,
adjust.propensity = TRUE,
...
)
Arguments
formula |
formula (see details below) |
data |
data.frame |
weights |
optional frequency weights |
offset |
optional offset (character or vector). can also be specified in the formula. |
family |
Exponential family argument for outcome model |
nuisance |
outcome regression formula (Q-model) |
propensity |
propensity model formula |
all |
when TRUE all standard errors are calculated (default TRUE when exposure only has two levels) |
labels |
optional treatment labels |
adjust.nuisance |
adjust for uncertainty due to estimation of outcome regression model parameters |
adjust.propensity |
adjust for uncertainty due to estimation of propensity regression model parameters |
... |
additional arguments to lower level functions |
Details
The formula may either be specified as: response ~ treatment | nuisance-formula | propensity-formula
For example: ate(y~a | x+z+a | x*z, data=...)
Alternatively, as a list: ate(list(y~a, ~x+z, ~x*z), data=...)
Or using the nuisance (and propensity argument):
ate(y~a, nuisance=~x+z, ...)
Value
An object of class 'ate.targeted' is returned. See
targeted-class for more details about this class and its
generic functions.
Author(s)
Klaus K. Holst
See Also
cate
Examples
m <- lava::lvm(y ~ a+x, a~x) |>
lava::distribution(~y, value = lava::binomial.lvm()) |>
lava::ordinal(K=4, ~a) |>
transform(~a, value = factor)
d <- lava::sim(m, 1e3, seed=1)
# (a <- ate(y~a|a*x|x, data=d))
(a <- ate(y~a, nuisance=~a*x, propensity=~x, data = d))
# Comparison with randomized experiment
m0 <- lava::cancel(m, a~x)
lm(y~a-1, lava::sim(m0,2e4))
# Choosing a different contrast for the association measures
summary(a, contrast=c(2,4))
Calibration (training)
Description
Calibration for multi-class classification methods
Usage
calibration(
pr,
cl,
weights = NULL,
threshold = 10,
method = "bin",
breaks = nclass.Sturges,
df = 3,
...
)
Arguments
pr |
matrix with probabilities for each class |
cl |
class variable |
weights |
counts |
threshold |
do not calibrate if fewer than 'threshold' events |
method |
either 'isotonic' (pava), 'logistic', 'mspline' (monotone spline), 'bin' (local constant) |
breaks |
optional number of bins (only for method 'bin') |
df |
degrees of freedom (only for spline methods) |
... |
additional arguments to lower level functions |
Details
...
Value
An object of class 'calibration' is returned.
See calibration-class
for more details about this class and its generic functions.
Author(s)
Klaus K. Holst
Examples
sim1 <- function(n, beta=c(-3, rep(.5,10)), rho=.5) {
p <- length(beta)-1
xx <- lava::rmvn0(n,sigma=diag(nrow=p)*(1-rho)+rho)
y <- rbinom(n, 1, lava::expit(cbind(1,xx)%*%beta))
d <- data.frame(y=y, xx)
names(d) <- c("y",paste0("x",1:p))
return(d)
}
set.seed(1)
beta <- c(-2,rep(1,10))
d <- sim1(1e4, beta=beta)
a1 <- naivebayes(y ~ ., data=d)
a2 <- glm(y ~ ., data=d, family=binomial)
## a3 <- randomForest(factor(y) ~ ., data=d, family=binomial)
d0 <- sim1(1e4, beta=beta)
p1 <- predict(a1, newdata=d0)
p2 <- predict(a2, newdata=d0, type="response")
## p3 <- predict(a3, newdata=d0, type="prob")
c2 <- calibration(p2, d0$y, method="isotonic")
c1 <- calibration(p1, d0$y, breaks=100)
if (interactive()) {
plot(c1)
plot(c2,col="red",add=TRUE)
abline(a=0, b=1)
with(c1$xy[[1]], points(pred,freq,type="b", col="red"))
}
set.seed(1)
beta <- c(-2,rep(1,10))
dd <- lava::csplit(sim1(1e4, beta=beta), k=3)
mod <- naivebayes(y ~ ., data=dd[[1]])
p1 <- predict(mod, newdata=dd[[2]])
cal <- calibration(p1, dd[[2]]$y)
p2 <- predict(mod, newdata=dd[[3]])
pp <- predict(cal, p2)
cc <- calibration(pp, dd[[3]]$y)
if (interactive()) {
plot(cal)
plot(cc, add=TRUE, col="blue")
}
calibration class object
Description
The functions calibration returns an object
of the class calibration.
An object of class 'calibration' is a list with at least the
following components:
- stepfun
estimated step-functions (see stepfun) for each class
- classes
the unique classes
- model
model/method type (string)
- xy
list of data.frame's with predictions (pr) and estimated probabilities of success (only for 'bin' method)
Value
objects of the S3 class 'calibration'
S3 generics
The following S3 generic functions are available for an object
of class calibration:
predict: Apply calibration to new data.
plot: Plot the calibration curves (reliability plot).
print: Basic print method.
Examples
## See example(calibration) for examples
Conditional Average Treatment Effect estimation
Description
Conditional Average Treatment Effect estimation with cross-fitting.
Usage
cate(
response.model,
propensity.model,
cate.model = ~1,
calibration.model = NULL,
data,
contrast,
nfolds = 1,
rep = 1,
silent = FALSE,
stratify = FALSE,
mc.cores = NULL,
rep.type = c("nuisance", "average"),
var.type = "IC",
second.order = TRUE,
response_model = deprecated,
cate_model = deprecated,
propensity_model = deprecated,
treatment = deprecated,
...
)
Arguments
response.model |
formula or learner object (formula => learner_glm) |
propensity.model |
formula or learner object (formula => learner_glm) |
cate.model |
formula specifying regression design for conditional average treatment effects |
calibration.model |
linear calibration model. Specify covariates in addition to predicted potential outcomes to include in the calibration. |
data |
data.frame |
contrast |
treatment contrast (default 1 vs 0) |
nfolds |
number of folds |
rep |
number of replications of cross-fitting procedure |
silent |
suppress all messages and progressbars |
stratify |
if TRUE the response.model will be stratified by treatment |
mc.cores |
(optional) number of cores. parallel::mcmapply used instead of future |
rep.type |
repeated cross-fitting applied by averaging the nuisance models ("nuisance") or by averaging the estimates of each replication ("average") |
var.type |
when equal to "IC" the asymptotic variance is derived from
the influence function. Otherwise, based on expressions in Bannick et al.
(2025) valid under different covariate-adaptive randomization schemes (only
available for ATE and when |
second.order |
add second-order term to the influence function to handle misspecification of the outcome models |
response_model |
Deprecated. Use response.model instead. |
cate_model |
Deprecated. Use cate.model instead. |
propensity_model |
Deprecated. Use propensity.model instead. |
treatment |
Deprecated. Use cate.model instead. |
... |
additional arguments to future.apply::future_mapply |
Details
We have observed data (Y,A,W) where Y is the response variable,
A the binary treatment, and W covariates. We further let V
be a subset of the covariates. Define the conditional potential mean outcome
\psi_{a}(P)(V) = E_{P}[E_{P}(Y \mid A=a, W) \mid V]
and let m(V; \beta) denote a parametric working model. The target parameter is then the minimizer of the mean-squared error
\beta(P) = \operatorname{argmin}_{\beta} E_{P}\big[\{\psi_{1}(P)(V)-\psi_{0}(P)(V)\} - m(V; \beta)\big]^{2}
Value
cate.targeted object
Author(s)
Klaus Kähler Holst, Andreas Nordland
References
Mark J. van der Laan (2006) Statistical Inference for Variable Importance, The International Journal of Biostatistics.
Examples
sim1 <- function(n=1000, ...) {
w1 <- rnorm(n)
w2 <- rnorm(n)
a <- rbinom(n, 1, plogis(-1 + w1))
y <- cos(w1) + w2*a + 0.2*w2^2 + a + rnorm(n)
data.frame(y, a, w1, w2)
}
d <- sim1(5000)
## ATE
cate(cate.model=~1,
response.model=y~a*(w1+w2),
propensity.model=a~w1+w2,
data=d)
## CATE
cate(cate.model=~1+w2,
response.model=y~a*(w1+w2),
propensity.model=a~w1+w2,
data=d)
## Not run: ## superlearner example
mod1 <- list(
glm = learner_glm(y~w1+w2),
gam = learner_gam(y~s(w1) + s(w2))
)
s1 <- learner_sl(mod1, nfolds=5)
cate(cate.model=~1,
response.model=s1,
propensity.model=learner_glm(a~w1+w2, family=binomial),
data=d,
stratify=TRUE)
## End(Not run)
Conditional Relative Risk estimation
Description
Conditional average treatment effect estimation via Double Machine Learning
Usage
cate_link(
treatment,
link = "identity",
response_model,
propensity_model,
importance_model,
contrast = c(1, 0),
data,
nfolds = 5,
type = "dml1",
...
)
Arguments
treatment |
formula specifying treatment and variables to condition on |
link |
Link function |
response_model |
SL object |
propensity_model |
SL object |
importance_model |
SL object |
contrast |
treatment contrast (default 1 vs 0) |
data |
data.frame |
nfolds |
Number of folds |
type |
'dml1' or 'dml2' |
... |
additional arguments to SuperLearner |
Value
cate.targeted object
Author(s)
Klaus Kähler Holst & Andreas Nordland
Examples
# Example 1:
sim1 <- function(n=1e4,
seed=NULL,
return_model=FALSE, ...){
suppressPackageStartupMessages(require("lava"))
if (!is.null(seed)) set.seed(seed)
m <- lava::lvm()
distribution(m, ~x) <- gaussian.lvm()
distribution(m, ~v) <- gaussian.lvm(mean = 10)
distribution(m, ~a) <- binomial.lvm("logit")
regression(m, "a") <- function(v, x){.1*v + x}
distribution(m, "y") <- gaussian.lvm()
regression(m, "y") <- function(a, v, x){v+x+a*x+a*v*v}
if (return_model) return(m)
lava::sim(m, n = n)
}
if (require("SuperLearner",quietly=TRUE)) {
d <- sim1(n = 1e3, seed = 1)
e <- cate_link(data=d,
type = "dml2",
treatment = a ~ v,
response_model = y~ a*(x + v + I(v^2)),
importance_model = SL(D_ ~ v + I(v^2)),
nfolds = 10)
summary(e) # the true parameters are c(1,1)
}
Construct a learner
Description
Construct a learner
Arguments
info |
(character) Optional information to describe the instantiated learner object. |
formula |
(formula) Formula specifying response and design matrix. |
learner.args |
(list) Additional arguments to learner$new(). |
Value
learner object.
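Examples
These arguments are shared by the learner_* constructors; a minimal sketch using learner_glm:
d <- data.frame(x = rnorm(100))
d$y <- d$x + rnorm(100)
lr <- learner_glm(y ~ x, info = "simple linear model")
lr$estimate(d)
lr$predict(head(d))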
cross_validated class object
Description
The functions cv returns an object of the type
cross_validated.
An object of class 'cross_validated' is a list with at least the
following components:
- cv
An array with the model score(s) evaluated for each fold, repetition, and model estimates (see estimate.default)
- names
Names (character vector) of the models
- rep
number of repetitions of the CV
- folds
Number of folds of the CV
Value
objects of the S3 class 'cross_validated'
S3 generics
The following S3 generic functions are available for an object of
class cross_validated:
coef: Extract average model scores from the cross-validation procedure.
print: Basic print method.
summary: Summary of the cross-validation procedure.
Examples
# See example(cv) for examples
Conditional Relative Risk estimation
Description
Conditional Relative Risk estimation via Double Machine Learning
Usage
crr(
treatment,
response_model,
propensity_model,
importance_model,
contrast = c(1, 0),
data,
nfolds = 5,
type = "dml1",
...
)
Arguments
treatment |
formula specifying treatment and variables to condition on |
response_model |
SL object |
propensity_model |
SL object |
importance_model |
SL object |
contrast |
treatment contrast (default 1 vs 0) |
data |
data.frame |
nfolds |
Number of folds |
type |
'dml1' or 'dml2' |
... |
additional arguments to SuperLearner |
Value
cate.targeted object
Author(s)
Klaus Kähler Holst & Andreas Nordland
Examples
sim1 <- function(n=1e4,
seed=NULL,
return_model=FALSE, ...){
suppressPackageStartupMessages(require("lava"))
if (!is.null(seed)) set.seed(seed)
m <- lava::lvm()
distribution(m, ~x) <- gaussian.lvm()
distribution(m, ~v) <- gaussian.lvm(mean = 10)
distribution(m, ~a) <- binomial.lvm("logit")
regression(m, "a") <- function(v, x){.1*v + x}
distribution(m, "y") <- gaussian.lvm()
regression(m, "y") <- function(a, v, x){v+x+a*x+a*v*v}
if (return_model) return(m)
lava::sim(m, n = n)
}
d <- sim1(n = 2e3, seed = 1)
if (require("SuperLearner",quietly=TRUE)) {
e <- crr(data=d,
type = "dml2",
treatment = a ~ v,
response_model = learner_glm(y~ a*(x + v + I(v^2))),
importance_model = learner_glm(D_ ~ v + I(v^2)),
propensity_model = learner_glm(a ~ x + v + I(v^2), family=binomial),
nfolds = 2)
summary(e) # the true parameters are c(1,1)
}
Predict the cumulative hazard/survival function for a survival model
Description
Predict the cumulative hazard/survival function for a survival model
Usage
cumhaz(
object,
newdata,
times = NULL,
individual.time = FALSE,
extend = FALSE,
...
)
Arguments
object |
Survival model object: phreg, coxph, rfsrc, ranger |
newdata |
data.frame |
times |
numeric vector: Time points at which the survival model is evaluated. If NULL, the time points associated with the survival model is used. |
individual.time |
logical: If TRUE the survival object is evaluated at different time points for each row in newdata. The number of rows in newdata and the length of times must be the same. |
extend |
if TRUE, prints information for all specified 'times', even if there are no subjects left at the end of the specified 'times' (see survival::summary.survfit). |
... |
Additional arguments. |
Value
List with elements:
- time: numeric vector
- chf: cumulative hazard function. If individual.time = FALSE, a matrix with dimension (nrow(newdata), length(times)). If individual.time = TRUE, a vector of length length(times).
- surv: survival function, exp(-chf).
- dchf: t(diff(rbind(0, t(chf))))
Author(s)
Klaus K. Holst, Andreas Nordland
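Examples
A minimal sketch using a Cox model fitted with survival::coxph (one of the supported object classes; the survival::lung data is used for illustration):
library(survival)
fit <- coxph(Surv(time, status) ~ age, data = lung, x = TRUE)
ch <- cumhaz(fit, newdata = head(lung), times = c(100, 200, 365))
ch$chf  # cumulative hazard: one row per observation in newdata
ch$surv # corresponding survival probabilities, exp(-chf)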
Cross-validation
Description
Generic cross-validation function
Usage
## Default S3 method:
cv(
object,
data,
response = NULL,
nfolds = 5,
rep = 1,
weights = NULL,
model.score = scoring,
seed = NULL,
shared = NULL,
args.pred = NULL,
args.future = list(),
mc.cores,
silent = FALSE,
...
)
Arguments
object |
List of learner objects |
data |
data.frame or matrix |
response |
Response variable (vector or name of column in data) |
nfolds |
Number of folds (nfolds=0: simple test/train split into two folds 1:(n/2) and (n/2+1):n, with the last part used for testing) |
rep |
Number of repetitions (default 1) |
weights |
Optional frequency weights |
model.score |
Model scoring metric (default: MSE / Brier score). Must be a function with arguments response and prediction, and may optionally include weights, object and newdata arguments |
seed |
Random seed (argument passed to future.apply::future_lapply) |
shared |
Function applied to each fold with results sent to each model |
args.pred |
Optional arguments to prediction function (see details below) |
args.future |
Arguments to future.apply::future_mapply |
mc.cores |
Optional number of cores. parallel::mcmapply used instead of future |
silent |
suppress all messages and progressbars |
... |
Additional arguments passed to the elements in object |
Details
object should be a list of objects of class learner.
Alternatively, each element of object may be a list with a fitting
function and a prediction function.
function and a prediction function.
The response argument can optionally be a named list where the name is
then used as the name of the response argument in models. Similarly, if data
is a named list with a single data.frame/matrix then this name will be used
as the name of the data/design matrix argument in models.
Value
An object of class 'cross_validated' is returned. See
cross_validated-class for more details about this class and
its generic functions.
Author(s)
Klaus K. Holst
Examples
m <- list(learner_glm(Sepal.Length~1),
learner_glm(Sepal.Length~Species),
learner_glm(Sepal.Length~Species + Petal.Length))
x <- cv(m, rep=10, data=iris)
x
Cross-validation for learner_sl
Description
Cross-validation estimation of the generalization error of the super learner and each of the separate models in the ensemble. Both the chosen model scoring metrics as well as the model weights of the stacked ensemble.
Usage
## S3 method for class 'learner_sl'
cv(object, data, nfolds = 5, rep = 1, model.score = scoring, ...)
Arguments
object |
(learner_sl) Instantiated learner_sl object. |
data |
data.frame or matrix |
nfolds |
Number of folds (nfolds=0: simple test/train split into two folds 1:(n/2) and (n/2+1):n, with the last part used for testing) |
rep |
Number of repetitions (default 1) |
model.score |
Model scoring metric (default: MSE / Brier score). Must be a function with arguments response and prediction, and may optionally include weights, object and newdata arguments |
... |
Additional arguments passed to the elements in object |
Examples
sim1 <- function(n = 5e2) {
x1 <- rnorm(n, sd = 2)
x2 <- rnorm(n)
y <- x1 + cos(x1) + rnorm(n, sd = 0.5**.5)
data.frame(y, x1, x2)
}
sl <- learner_sl(list(
"mean" = learner_glm(y ~ 1),
"glm" = learner_glm(y ~ x1),
"glm2" = learner_glm(y ~ x1 + x2)
))
cv(sl, data = sim1(), rep = 2)
Cast warning for deprecated function argument names
Description
Cast warning for deprecated function argument names
Usage
deprecate_arg_warn(old, new, fun, vers)
Arguments
old |
deprecated argument name |
new |
argument that should be used instead |
fun |
function name where arguments are deprecated |
vers |
version when argument is deprecated |
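Examples
A minimal sketch (illustrative values; the response_model argument of cate is deprecated in favor of response.model):
deprecate_arg_warn(old = "response_model", new = "response.model",
                   fun = "cate", vers = "0.6") # version number is illustrative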
Deprecated argument names
Description
Deprecated argument names
Arguments
response_model |
Deprecated. Use response.model instead. |
propensity_model |
Deprecated. Use propensity.model instead. |
cate_model |
Deprecated. Use cate.model instead. |
treatment |
Deprecated. Use cate.model instead. |
Extract design matrix
Description
Extract design matrix from data.frame and formula
Usage
design(
formula,
data,
...,
intercept = FALSE,
response = FALSE,
rm_envir = FALSE,
specials = NULL,
specials.call = NULL,
xlev = NULL,
design.matrix = TRUE
)
Arguments
formula |
formula |
data |
data.frame |
... |
additional arguments (e.g., specials such as weights, offsets, ...) |
intercept |
(logical) If FALSE an intercept is not included in the design matrix |
response |
(logical) if FALSE the response variable is dropped |
rm_envir |
Remove environment |
specials |
character vector specifying functions in the formula that should be marked as special in the terms object |
specials.call |
(call) specials optionally defined as a call-type |
xlev |
a named list of character vectors giving the full set of levels to be assumed for each factor |
design.matrix |
(logical) if FALSE then only the response and specials are
returned. Otherwise, the design matrix is also returned. |
Value
An object of class 'design'
Author(s)
Klaus Kähler Holst
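Examples
A minimal sketch extracting a design from a formula and data.frame:
d <- data.frame(y = rnorm(5), a = rnorm(5), w = rnorm(5))
des <- design(y ~ a + w, data = d, intercept = TRUE)
class(des) # "design"
str(des, max.level = 1)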
Estimation of mean clinical outcome truncated by event process
Description
Let Y denote the clinical outcome, A the binary treatment
variable, X baseline covariates, T the failure time,
and \epsilon \in \{1,2\} the cause of failure.
The following are our two target parameters
E(Y \mid T>t, A=1) - E(Y \mid T>t, A=0)
P(T<t, \epsilon=1 \mid A=1) - P(T<t, \epsilon=1 \mid A=0)
Usage
estimate_truncatedscore(
data,
mod.y,
mod.r,
mod.a,
mod.event,
time,
cause = NULL,
cens.code = 0,
naive = FALSE,
control = list(),
...
)
Arguments
data |
(data.frame) |
mod.y |
(formula or learner) Model for clinical outcome given T>time. Using a formula specifies a glm with an identity link (see example). |
mod.r |
(formula or learner) Model for missing data mechanism for clinical outcome at T=time. Using a formula specifies a glm with a log link. |
mod.a |
(formula or learner) Treatment model (in RCT should just be 'a ~ 1'). Using a formula specifies a glm with a log link. |
mod.event |
(formula) Model for time-to-event process ('Event(time,status) ~ x'). |
time |
(numeric) Landmark time. |
cause |
(integer) Primary event (in the 'status' variable of the 'Event' statement). |
cens.code |
(integer) Censoring code. |
naive |
(logical) If TRUE, the unadjusted estimates ignoring baseline covariates are returned as the attribute 'naive'. |
control |
(list) optimization routine parameters. |
... |
Additional arguments passed to mets::binregATE. |
Value
lava::estimate.default object
Author(s)
Klaus Kähler Holst
Examples
data(truncatedscore)
mod1 <- learner_glm(y ~ a * (x1 + x2))
mod2 <- learner_glm(r ~ a * (x1 + x2), family = binomial)
a <- estimate_truncatedscore(
data = truncatedscore,
mod.y = mod1,
mod.r = mod2,
mod.a = a ~ 1,
mod.event = mets::Event(time, status) ~ x1+x2,
time = 2
)
s <- summary(a, noninf.t = -0.1)
print(s)
parameter(s)
# the above is equivalent to
# a <- estimate_truncatedscore(
# data = truncatedscore,
# mod.y = y ~ a * (x1 + x2),
# mod.r = r ~ a * (x1 + x2),
# mod.a = a ~ 1,
# mod.event = mets::Event(time, status) ~ x1+x2,
# time = 2
# )
Create a list from all combination of input variables
Description
Similar to expand.grid function, this function creates all combinations
of the input arguments but returns the result as a list.
Usage
expand.list(..., INPUT = NULL, envir = NULL)
Arguments
... |
input variables |
INPUT |
optional list of variables |
envir |
environment to evaluate formulas in |
Value
list
Author(s)
Klaus Kähler Holst
Examples
expand.list(x = 2:4, z = c("a", "b"))
Integral approximation of a time-dependent function
Description
Computes an approximation of \int_{start}^{stop} S(t) \, dt, where S(t) is a survival function, for a selection of start and stop time points.
Usage
int_surv(times, surv, start = 0, stop = max(times), extend = FALSE)
Arguments
times |
Numeric vector, sorted time points. |
surv |
Numeric vector, values of a survival function evaluated at time
points given by |
start |
Numeric vector, start of the integral. |
stop |
Numeric vector, end of the integral. |
extend |
(logical) If TRUE, integral is extended beyond the last observed time point |
Value
Numeric vector, value of the integral.
Author(s)
Andreas Nordland
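Examples
A minimal sketch using an exponential survival function S(t) = exp(-lambda*t), for which the integral has the closed form (exp(-lambda*start) - exp(-lambda*stop))/lambda:
lambda <- 0.5
times <- seq(0, 10, length.out = 1000)
surv <- exp(-lambda * times)
int_surv(times, surv, start = 0, stop = 2)
# analytic value for comparison
(exp(-lambda * 0) - exp(-lambda * 2)) / lambda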
R6 class for prediction models
Description
Interface for statistical and machine learning models to be used for nuisance model estimation in targeted learning.
The following list provides an overview of constructors for many commonly used models.
Regression and classification: learner_glm, learner_gam, learner_grf,
learner_hal, learner_glmnet_cv, learner_svm, learner_xgboost,
learner_mars
Regression: learner_isoreg
Classification: learner_naivebayes
Ensemble (super learner): learner_sl
Public fields
info: Optional information/name of the model
Active bindings
clear: Remove fitted model from the learner object
fit: Return estimated model object.
formula: Return model formula. Use learner$update() to update the formula.
Methods
Public methods
Method new()
Create a new prediction model object
Usage
learner$new(
  formula = NULL,
  estimate,
  predict = stats::predict,
  predict.args = NULL,
  estimate.args = NULL,
  info = NULL,
  specials = c(),
  formula.keep.specials = FALSE,
  intercept = FALSE
)
Arguments
formula: formula specifying outcome and design matrix
estimate: function for fitting the model. This must be a function with response, 'y', and design matrix, 'x'. Alternatively, a function with a formula and data argument. See the examples section.
predict: prediction function (must be a function of model object, 'object', and new design matrix, 'newdata')
predict.args: optional arguments to prediction function
estimate.args: optional arguments to estimate function
info: optional description of the model
specials: optional specials terms (weights, offset, id, subset, ...) passed on to design
formula.keep.specials: if TRUE then special terms defined by specials will be removed from the formula before it is passed to the estimate function
intercept: (logical) include intercept in design matrix
Method estimate()
Estimation method
Usage
learner$estimate(data, ..., store = TRUE)
Arguments
data: data.frame
...: Additional arguments to estimation method
store: Logical determining if estimated model should be stored inside the class.
Method predict()
Prediction method
Usage
learner$predict(newdata, ..., object = NULL)
Arguments
newdata: data.frame
...: Additional arguments to prediction method
object: Optional model fit object
Method update()
Update formula
Usage
learner$update(formula)
Arguments
formula: formula or character which defines the new response
Method print()
Print method
Usage
learner$print()
Method summary()
Summary method to provide more extensive information than learner$print().
Usage
learner$summary()
Returns
summarized_learner object, which is a list with the following elements:
- info
description of the learner
- formula
formula specifying outcome and design matrix
- estimate
function for fitting the model
- estimate.args
arguments to estimate function
- predict
function for making predictions from fitted model
- predict.args
arguments to predict function
- specials
provided special terms
- intercept
include intercept in design matrix
Examples
lr <- learner_glm(y ~ x, family = "nb")
lr$summary()
lr_sum <- lr$summary() # store returned summary in new object
names(lr_sum)
print(lr_sum)
Method response()
Extract response from data
Usage
learner$response(data, eval = TRUE, ...)
Arguments
data: data.frame
eval: when FALSE return the untransformed outcome (i.e., return 'a' if formula defined as I(a==1) ~ ...)
...: additional arguments to design
Method design()
Generate design object (design matrix and response) from data
Usage
learner$design(data, ...)
Arguments
data: data.frame
...: additional arguments to design
Method opt()
Get options
Usage
learner$opt(arg)
Arguments
arg: name of option to get value of
Method clone()
The objects of this class are cloneable with this method.
Usage
learner$clone(deep = FALSE)
Arguments
deep: Whether to make a deep clone.
Author(s)
Klaus Kähler Holst, Benedikt Sommer
Examples
data(iris)
rf <- function(formula, ...) {
learner$new(formula,
info = "grf::probability_forest",
estimate = function(x, y, ...) {
grf::probability_forest(X = x, Y = y, ...)
},
predict = function(object, newdata) {
predict(object, newdata)$predictions
},
estimate.args = list(...)
)
}
args <- expand.list(
num.trees = c(100, 200), mtry = 1:3,
formula = c(Species ~ ., Species ~ Sepal.Length + Sepal.Width)
)
models <- lapply(args, function(par) do.call(rf, par))
x <- models[[1]]$clone()
x$estimate(iris)
predict(x, newdata = head(iris))
# Reduce Ex. timing
a <- targeted::cv(models, data = iris)
cbind(coef(a), attr(args, "table"))
# defining learner via function with arguments y (response)
# and x (design matrix)
f1 <- learner$new(
estimate = function(y, x) lm.fit(x = x, y = y),
predict = function(object, newdata) newdata %*% object$coefficients
)
# defining the learner via arguments formula and data
f2 <- learner$new(
estimate = function(formula, data, ...) glm(formula, data, ...)
)
# generic learner defined from function (predict method derived per default
# from stats::predict)
f3 <- learner$new(
estimate = function(dt, ...) {
lm(y ~ x, data = dt)
}
)
## ------------------------------------------------
## Method `learner$summary`
## ------------------------------------------------
lr <- learner_glm(y ~ x, family = "nb")
lr$summary()
lr_sum <- lr$summary() # store returned summary in new object
names(lr_sum)
print(lr_sum)
Construct learners from a grid of parameters
Description
Construct learners from a grid of parameters
Usage
learner_expand_grid(fun, args, names = TRUE, params = FALSE)
Arguments
fun |
(function) A function that returns a learner. |
args |
(list) Parameters that generate a grid of parameters with
expand.list, where the set of parameters are then passed on to fun. |
names |
(logical or character) If FALSE, then return a list without names. If TRUE, a named list is returned (see details). |
params |
(logical) If FALSE, then no information about the parameters
defined by args is added to the names of the returned list. |
Value
list
Examples
lrs <- learner_expand_grid(
learner_xgboost,
list(formula = Sepal.Length ~ ., eta = c(0.2, 0.5, 0.3))
)
lrs # use info of constructed learner as names
lrs <- learner_expand_grid(
learner_xgboost,
list(formula = Sepal.Length ~ ., eta = c(0.2, 0.5, 0.3)),
names = "xgboost"
)
names(lrs) # use xgboost instead of info field for names
lrs <- learner_expand_grid(
learner_xgboost,
list(formula = Sepal.Length ~ ., eta = c(0.2, 0.5, 0.3)),
names = "xgboost",
params = TRUE
)
names(lrs) # also add parameters to names
lrs <- learner_expand_grid(
learner_xgboost,
list(formula = Sepal.Length ~ ., eta = c(0.2, 0.5, 0.3)),
names = FALSE
)
names(lrs) # unnamed list since names = FALSE
Construct a learner
Description
Constructs learner class object for fitting generalized additive models with mgcv::gam.
Usage
learner_gam(
formula,
info = "mgcv::gam",
family = gaussian(),
select = FALSE,
gamma = 1,
learner.args = NULL,
...
)
Arguments
formula |
(formula) Formula specifying response and design matrix. |
info |
(character) Optional information to describe the instantiated learner object. |
family |
This is a family object specifying the distribution and link to use in
fitting etc (see glm and family). |
select |
If this is TRUE then gam can add an extra penalty to each term so that it can be penalized to zero. |
gamma |
Increase this beyond 1 to produce smoother models. |
learner.args |
(list) Additional arguments to learner$new(). |
... |
Additional arguments to mgcv::gam. |
Value
learner object.
Examples
n <- 5e2
x1 <- rnorm(n, sd = 2)
x2 <- rnorm(n)
y <- x1 + cos(x1) + rnorm(n, sd = 0.5**.5)
d0 <- data.frame(y, x1, x2)
lr <- learner_gam(y ~ s(x1) + x2)
lr$estimate(d0)
if (interactive()) {
plot(lr$fit)
}
Construct a learner
Description
Constructs a learner class object for fitting generalized
linear models with stats::glm and MASS::glm.nb. Negative binomial
regression is supported with family = "nb" (or alternatively family = "negbin").
Usage
learner_glm(
formula,
info = "glm",
family = gaussian(),
learner.args = NULL,
...
)
Arguments
formula |
(formula) Formula specifying response and design matrix. |
info |
(character) Optional information to describe the instantiated learner object. |
family |
a description of the error distribution and link
function to be used in the model. This can be a character string naming a family function, a family function, or the result of a call to a family function. |
learner.args |
(list) Additional arguments to learner$new(). |
... |
Additional arguments to stats::glm or MASS::glm.nb. |
Value
learner object.
Examples
n <- 5e2
x <- rnorm(n)
w <- 50 + rexp(n, rate = 1 / 5)
y <- rpois(n, exp(2 + 0.5 * x + log(w)) * rgamma(n, 1 / 2, 1 / 2))
d0 <- data.frame(y, x, w)
lr <- learner_glm(y ~ x) # linear Gaussian model
lr$estimate(d0)
coef(lr$fit)
# negative binomial regression model with offset (using MASS::glm.nb)
lr <- learner_glm(y ~ x + offset(log(w)), family = "nb")
lr$estimate(d0)
coef(lr$fit)
lr$predict(data.frame(x = 1, w = c(1, 5))) # response scale
lr$predict(data.frame(x = 1, w = c(1, 5)), type = "link") # link scale
Construct a learner
Description
Constructs a learner class object for fitting entire lasso or
elastic-net regularization paths for various linear and non-linear regression
models with glmnet::cv.glmnet. Predictions are returned for the value of
lambda that gives minimum cvm. That is, glmnet::predict.cv.glmnet is
called with s = "lambda.min".
Usage
learner_glmnet_cv(
formula,
info = "glmnet::cv.glmnet",
family = gaussian(),
lambda = NULL,
alpha = 1,
nfolds = 10,
learner.args = NULL,
...
)
Arguments
formula |
(formula) Formula specifying response and design matrix. |
info |
(character) Optional information to describe the instantiated learner object. |
family |
Either a character string representing
one of the built-in families, or else a glm() family object. |
lambda |
Optional user-supplied lambda sequence; default is
NULL, in which case glmnet chooses its own sequence. |
alpha |
The elasticnet mixing parameter, with
0 <= alpha <= 1; alpha = 1 is the lasso penalty and alpha = 0 the ridge penalty. |
nfolds |
number of folds; default is 10. Although nfolds can be as large as the sample size (leave-one-out CV), this is not recommended for large datasets. The smallest allowable value is nfolds = 3. |
learner.args |
(list) Additional arguments to learner$new(). |
... |
Other arguments that can be passed to glmnet. |
Value
learner object.
Examples
# continuous outcome
n <- 5e2
x1 <- rnorm(n, sd = 2)
x2 <- rnorm(n)
lp <- x1 + x2*x1 + cos(x1)
y <- rnorm(n, lp, sd = 2)
d0 <- data.frame(y, x1, x2)
lr <- learner_glmnet_cv(y ~ x1 + x2)
lr$estimate(d0, nfolds = 3)
lr$predict(data.frame(x1 = c(0, 1), x2 = 1))
# count outcome with different exposure time
w <- 50 + rexp(n, rate = 1 / 5)
y <- rpois(n, exp(0.5 * x1 - 1 * x2 + log(w)) * rgamma(n, 1 / 2, 1 / 2))
d0 <- data.frame(y, x1, x2, w)
lr <- learner_glmnet_cv(y ~ x1 + x2 + offset(log(w)), family = "poisson")
lr$estimate(d0, nfolds = 3)
lr$predict(data.frame(x1 = 1, x2 = 1, w = c(1, 5)))
Construct a learner
Description
Constructs a learner class object for fitting generalized
random forest models with grf::regression_forest or
grf::probability_forest. As shown in the examples, the constructed learner
returns predicted class probabilities of class 2 in case of binary
classification. A n times p matrix, with n being the number of
observations and p the number of classes, is returned for multi-class
classification.
Usage
learner_grf(
formula,
num.trees = 2000,
min.node.size = 5,
alpha = 0.05,
sample.fraction = 0.5,
num.threads = 1,
model = "grf::regression_forest",
info = model,
learner.args = NULL,
...
)
Arguments
formula |
(formula) Formula specifying response and design matrix. |
num.trees |
Number of trees grown in the forest. Note: Getting accurate confidence intervals generally requires more trees than getting accurate predictions. Default is 2000. |
min.node.size |
A target for the minimum number of observations in each tree leaf. Note that nodes with size smaller than min.node.size can occur, as in the original randomForest package. Default is 5. |
alpha |
A tuning parameter that controls the maximum imbalance of a split. Default is 0.05. |
sample.fraction |
Fraction of the data used to build each tree. Note: If honesty = TRUE, these subsamples will further be cut by a factor of honesty.fraction. Default is 0.5. |
num.threads |
Number of threads used in training. By default, the number of threads is set to the maximum hardware concurrency. |
model |
(character) grf model to estimate. Usually regression_forest (grf::regression_forest) or probability_forest (grf::probability_forest). |
info |
(character) Optional information to describe the instantiated learner object. |
learner.args |
(list) Additional arguments to learner$new(). |
... |
Additional arguments to the grf model specified by the model argument. |
Value
learner object.
Examples
n <- 5e2
x1 <- rnorm(n, sd = 2)
x2 <- rnorm(n)
lp <- x2*x1 + cos(x1)
yb <- rbinom(n, 1, lava::expit(lp))
y <- lp + rnorm(n, sd = 0.5**.5)
d <- data.frame(y, yb, x1, x2)
# regression
lr <- learner_grf(y ~ x1 + x2)
lr$estimate(d)
lr$predict(head(d))
# binary classification
lr <- learner_grf(as.factor(yb) ~ x1 + x2, model = "probability_forest")
lr$estimate(d)
lr$predict(head(d)) # predict class probabilities of class 2
# multi-class classification
lr <- learner_grf(Species ~ ., model = "probability_forest")
lr$estimate(iris)
lr$predict(head(iris))
Construct a learner
Description
Constructs a learner class object for fitting a highly adaptive lasso model with hal9001::fit_hal.
Usage
learner_hal(
formula,
info = "hal9001::fit_hal",
smoothness_orders = 0,
reduce_basis = NULL,
family = "gaussian",
learner.args = NULL,
...
)
Arguments
formula |
(formula) Formula specifying response and design matrix. |
info |
(character) Optional information to describe the instantiated learner object. |
smoothness_orders |
An integer specifying the smoothness order of the basis functions (see hal9001::fit_hal). |
reduce_basis |
An optional numeric value; basis functions with a proportion of 1's below this threshold are removed (see hal9001::fit_hal). |
family |
A character string specifying the error family (see hal9001::fit_hal). |
learner.args |
(list) Additional arguments to learner$new(). |
... |
Additional arguments to hal9001::fit_hal. |
Value
learner object.
Examples
## Not run:
n <- 5e2
x1 <- rnorm(n, sd = 2)
x2 <- rnorm(n)
y <- x1 + cos(x1) + rnorm(n, sd = 0.5**.5)
d <- data.frame(y, x1, x2)
lr <- learner_hal(y ~ x1 + x2, smoothness_orders = 0.5, reduce_basis = 1)
lr$estimate(d)
lr$predict(data.frame(x1 = 0, x2 = c(-1, 1)))
## End(Not run)
Construct a learner
Description
Constructs a learner class object for isotonic regression with isoregw.
Usage
learner_isoreg(formula, info = "targeted::isoregw", learner.args = NULL, ...)
Arguments
formula |
(formula) Formula specifying response and design matrix. |
info |
(character) Optional information to describe the instantiated learner object. |
learner.args |
(list) Additional arguments to learner$new(). |
... |
Additional arguments to isoregw. |
Value
learner object.
Examples
x <- runif(5e3, -5, 5)
pr <- lava::expit(-1 + x)
y <- rbinom(length(pr), 1, pr)
d <- data.frame(y, x)
lr <- learner_isoreg(y ~ x)
lr$estimate(d)
pr_iso <- lr$predict(d)
if (interactive()) {
plot(pr ~ x, cex=0.3)
lines(sort(x), pr_iso[order(x)], col="red", type="s")
}
Construct a learner
Description
Constructs a learner class object for fitting multivariate adaptive regression splines with earth::earth.
Usage
learner_mars(
formula,
info = "earth::earth",
degree = 1,
nprune = NULL,
glm = NULL,
learner.args = NULL,
...
)
Arguments
formula |
(formula) Formula specifying response and design matrix. |
info |
(character) Optional information to describe the instantiated learner object. |
degree |
Maximum degree of interaction (Friedman's mi); default is 1, meaning an additive model with no interaction terms. |
nprune |
Maximum number of terms (including intercept) in the pruned model.
Default is NULL, meaning all terms created by the forward pass
(but typically not all terms will remain after pruning).
Use this to enforce an upper bound on the model size. |
glm |
NULL (default) or a list of arguments to pass on to glm. |
learner.args |
(list) Additional arguments to learner$new(). |
... |
Additional arguments to earth::earth. |
Value
learner object.
Examples
# poisson regression
n <- 5e2
x <- rnorm(n)
w <- 50 + rexp(n, rate = 1 / 5)
y <- rpois(n, exp(2 + 0.5 * x + log(w)) * rgamma(n, 1 / 2, 1 / 2))
d0 <- data.frame(y, x, w)
lr <- learner_mars(y ~ x + offset(log(w)), degree = 2,
glm = list(family = poisson())
)
lr$estimate(d0)
lr$predict(data.frame(x = 0, w = c(1, 2)))
Construct a learner
Description
Constructs a learner class object for fitting a naive bayes
classifier with naivebayes. As shown in the examples, the constructed
learner returns predicted class probabilities of class 2 in case of binary
classification. A n times p matrix, with n being the number of
observations and p the number of classes, is returned for multi-class
classification.
Usage
learner_naivebayes(
formula,
info = "Naive Bayes",
laplace.smooth = 0,
kernel = FALSE,
learner.args = NULL,
...
)
Arguments
formula |
(formula) Formula specifying response and design matrix. |
info |
(character) Optional information to describe the instantiated learner object. |
laplace.smooth |
Laplace smoothing |
kernel |
If TRUE a kernel estimator is used for numeric predictors (otherwise a gaussian model is used) |
learner.args |
(list) Additional arguments to learner$new(). |
... |
Additional arguments to naivebayes. |
Value
learner object.
Examples
n <- 5e2
x1 <- rnorm(n, sd = 2)
x2 <- rnorm(n)
y <- rbinom(n, 1, lava::expit(x2*x1 + cos(x1)))
d <- data.frame(y, x1, x2)
# binary classification
lr <- learner_naivebayes(y ~ x1 + x2)
lr$estimate(d)
lr$predict(head(d))
# multi-class classification
lr <- learner_naivebayes(Species ~ .)
lr$estimate(iris)
lr$predict(head(iris))
Construct a learner
Description
Constructs a learner class object for fitting a superlearner.
Usage
learner_sl(
learners,
info = NULL,
nfolds = 5L,
meta.learner = metalearner_nnls,
model.score = mse,
learner.args = NULL,
...
)
Arguments
learners |
(list) List of learner objects (i.e. learner_glm) |
info |
(character) Optional information to describe the instantiated learner object. |
nfolds |
(integer) Number of folds to use in cross-validation to estimate the ensemble weights. |
meta.learner |
(function) Algorithm to learn the ensemble weights
(default non-negative least squares). Must be a function of the response
(nx1 vector) and the predictions of the individual learners (nxp matrix). |
model.score |
(function) Model scoring method (see learner) |
learner.args |
(list) Additional arguments to learner$new(). |
... |
Additional arguments to superlearner |
Value
learner object.
Examples
sim1 <- function(n = 5e2) {
x1 <- rnorm(n, sd = 2)
x2 <- rnorm(n)
y <- x1 + cos(x1) + rnorm(n, sd = 0.5**.5)
data.frame(y, x1, x2)
}
d <- sim1()
m <- list(
"mean" = learner_glm(y ~ 1),
"glm" = learner_glm(y ~ x1 + x2),
"iso" = learner_isoreg(y ~ x1)
)
s <- learner_sl(m, nfolds = 10)
s$estimate(d)
pr <- s$predict(d)
if (interactive()) {
plot(y ~ x1, data = d)
points(d$x1, pr, col = 2, cex = 0.5)
lines(cos(x1) + x1 ~ x1, data = d[order(d$x1), ],
lwd = 4, col = lava::Col("darkblue", 0.3))
}
print(s)
# weights(s$fit)
# score(s$fit)
cvres <- cv(s, data = d, nfolds = 3, rep = 2)
cvres
# coef(cvres)
# score(cvres)
Construct stratified learner
Description
This function creates a stratified learner from an existing
learner wrapper function such as learner_glm or learner_xgboost. The
stratification variable can be specified either using the stratify
argument (which can be given as a string "a" or a formula , for example ~
I(a==0)), or it can be defined as a special term directly in the formula, y
~ ... + stratify(a). The formula will subsequently be passed to the
learner_ wrapper without the stratify special term.
Usage
learner_stratify(
formula,
learner,
stratify = NULL,
info = NULL,
learner.args = list(),
...
)
Arguments
formula |
formula specifying outcome and design matrix |
learner |
(learner) learner object |
stratify |
(character,formula) variables to stratify by |
info |
optional description of the model |
learner.args |
(list) optional arguments to the learner constructor |
... |
additional arguments passed to the learner constructor |
Value
learner object
Examples
simdata <- function(n=1000) {
a <- rbinom(n, 1, 0.5)
x <- rnorm(n)
y <- rbinom(n, 1, plogis(-1 + a + a * x))
data.frame(y, a, x)
}
d <- simdata()
lr <- learner_stratify(
y ~ x + stratify(a),
learner_glm,
family=binomial()
)
lr$estimate(d)
lr$predict(head(d))
Construct a learner
Description
Constructs a learner class object for fitting support vector
machines with e1071::svm. As shown in the examples, the constructed learner
returns predicted class probabilities of class 2 in case of binary
classification. A n times p matrix, with n being the number of
observations and p the number of classes, is returned for multi-class
classification.
Usage
learner_svm(
formula,
info = "e1071::svm",
cost = 1,
epsilon = 0.1,
kernel = "radial",
learner.args = NULL,
...
)
Arguments
formula |
(formula) Formula specifying response and design matrix. |
info |
(character) Optional information to describe the instantiated learner object. |
cost |
cost of constraints violation (default: 1); it is the 'C'-constant of the regularization term in the Lagrange formulation. |
epsilon |
epsilon in the insensitive-loss function (default: 0.1) |
kernel |
the kernel used in training and predicting. You
might consider changing some of the following parameters, depending
on the kernel type. |
learner.args |
(list) Additional arguments to learner$new(). |
... |
Additional arguments to e1071::svm. |
Value
learner object.
Examples
n <- 5e2
x1 <- rnorm(n, sd = 2)
x2 <- rnorm(n)
lp <- x2*x1 + cos(x1)
yb <- rbinom(n, 1, lava::expit(lp))
y <- lp + rnorm(n, sd = 0.5**.5)
d <- data.frame(y, yb, x1, x2)
# regression
lr <- learner_svm(y ~ x1 + x2)
lr$estimate(d)
lr$predict(head(d))
# binary classification
lr <- learner_svm(as.factor(yb) ~ x1 + x2)
# alternative to transforming response variable to factor
# lr <- learner_svm(yb ~ x1 + x2, type = "C-classification")
lr$estimate(d)
lr$predict(head(d)) # predict class probabilities of class 2
lr$predict(head(d), probability = FALSE) # predict labels
# multi-class classification
lr <- learner_svm(Species ~ .)
lr$estimate(iris)
lr$predict(head(iris))
Construct a learner
Description
Constructs a learner class object for xgboost::xgboost.
Usage
learner_xgboost(
formula,
max_depth = 2L,
eta = 1,
nrounds = 2L,
subsample = 1,
lambda = 1,
verbose = 0,
objective = "reg:squarederror",
info = paste("xgboost", objective),
learner.args = NULL,
...
)
Arguments
formula |
(formula) Formula specifying response and design matrix. |
max_depth |
(integer) Maximum depth of a tree. |
eta |
(numeric) Learning rate. |
nrounds |
(integer) Maximum number of boosting iterations. |
subsample |
(numeric) Subsample ratio of the training instance. |
lambda |
(numeric) L2 regularization term on weights. |
verbose |
If 0, xgboost will stay silent. If 1, it will print information about performance.
If 2, some additional information will be printed out. |
objective |
(character) Specify the learning task and the corresponding learning objective. See xgboost::xgboost for all available options. |
info |
(character) Optional information to describe the instantiated learner object. |
learner.args |
(list) Additional arguments to learner$new(). |
... |
Additional arguments to xgboost::xgboost. |
Value
learner object.
Examples
n <- 1e3
x1 <- rnorm(n, sd = 2)
x2 <- rnorm(n)
lp <- x2*x1 + cos(x1)
yb <- rbinom(n, 1, lava::expit(lp))
y <- lp + rnorm(n, sd = 0.5**.5)
d0 <- data.frame(y, yb, x1, x2)
# regression
lr <- learner_xgboost(y ~ x1 + x2, nrounds = 5)
lr$estimate(d0)
lr$predict(head(d0))
# binary classification
lr <- learner_xgboost(yb ~ x1 + x2, nrounds = 5,
objective = "binary:logistic"
)
lr$estimate(d0)
lr$predict(head(d0))
# multi-class classification
d0 <- iris
d0$y <- as.numeric(d0$Species) - 1
lr <- learner_xgboost(y ~ ., objective = "multi:softprob", num_class = 3)
lr$estimate(d0)
lr$predict(head(d0))
R6 class for prediction models
Description
Replaced by learner
Super class
targeted::learner -> ml_model
Methods
Public methods
Inherited methods
Method new()
Create a new prediction model object
Usage
ml_model$new(...)
Arguments
...: deprecated
Method clone()
The objects of this class are cloneable with this method.
Usage
ml_model$clone(deep = FALSE)
Arguments
deep: Whether to make a deep clone.
Naive Bayes classifier
Description
Naive Bayes Classifier
Usage
naivebayes(
formula,
data,
weights = NULL,
kernel = FALSE,
laplace.smooth = 0,
prior = NULL,
...
)
Arguments
formula |
Formula with syntax: response ~ predictors | weights |
data |
data.frame |
weights |
optional frequency weights |
kernel |
If TRUE a kernel estimator is used for numeric predictors (otherwise a gaussian model is used) |
laplace.smooth |
Laplace smoothing |
prior |
optional prior probabilities (default estimated from data) |
... |
additional arguments to lower level functions |
Value
An object of class 'naivebayes' is returned. See
naivebayes-class for more details about this class and
its generic functions.
Author(s)
Klaus K. Holst
Examples
library(data.table)
data(iris)
m <- naivebayes(Species ~ Sepal.Width + Petal.Length, data = iris)
pr <- predict(m, newdata = iris)
# using weights to reduce the size of the dataset
n <- 5e2
x <- rnorm(n, sd = 2) > 0
y <- rbinom(n, 1, lava::expit(x))
# full data set
d1 <- data.frame(y, x = as.factor(x > 0))
m1 <- naivebayes(y ~ x, data = d1)
# reduced data set
d2 <- data.table(d1)[, .(.N), by = .(y, x)]
m2 <- naivebayes(y ~ x, data = d2, weights = d2$N)
all(predict(m1, d1) == predict(m2, d1))
naivebayes class object
Description
The functions naivebayes returns an object of the type
naivebayes.
An object of class 'naivebayes' is a list with at least
the following components:
- prior
Matrix with prior probabilities, i.e. marginal class probabilities Pr(class)
- pcond
list of matrices with conditional probabilities of the features given the classes (one list element per class), Pr(x|class)
- classes
Names (character vector) of the classes
- xvar
Names of predictors
- xmodel
Conditional model for each predictor
- design
Model design object
- call
The function call which instantiated the object
Value
objects of the S3 class 'naivebayes'
S3 generics
The following S3 generic functions are available for an object
of class naivebayes:
predict: Predict class probabilities for new feature data.
print: Basic print method.
Examples
## See example(naivebayes) for examples
Find non-dominated points of a set
Description
Find the non-dominated points of a set (the minima of a point set).
Usage
nondom(x, ...)
Arguments
x |
matrix |
... |
additional arguments to lower level functions |
Details
A point x dominates y if it is never worse and in at least one case strictly better. Formally, let f_i denote the ith coordinate of the objective function; then x dominates y if for all i: f_i(x) <= f_i(y), and there exists j: f_j(x) < f_j(y).
Based on the algorithm of Kung et al. 1975.
Value
matrix
Author(s)
Klaus Kähler Holst
Examples
rbind(
c(1.0, 0.5),
c(0.0, 1.0),
c(1.0, 0.0),
c(0.5, 1.0),
c(1.0, 1.0),
c(0.8, 0.8)) |> nondom()
Pooled Adjacent Violators Algorithm
Description
Pooled Adjacent Violators Algorithm
Usage
pava(y, x = numeric(0), weights = numeric(0))
Arguments
y |
response variable |
x |
(optional) predictor vector (otherwise y is assumed to be a priori sorted according to relevant predictor) |
weights |
(optional) weights |
Value
List with the indices (index) of the jump points and the values (value) at each jump point.
Author(s)
Klaus K. Holst
Examples
x <- runif(5e3, -5, 5)
pr <- lava::expit(-1 + x)
y <- rbinom(length(pr), 1, pr)
pv <- pava(y, x)
plot(pr ~ x, cex=0.3)
with(pv, lines(sort(x)[index], value, col="red", type="s"))
Prediction for kernel density estimates
Description
Kernel density estimator predictions
Usage
## S3 method for class 'density'
predict(object, xnew, ...)
Arguments
object |
density object |
xnew |
New data on which to make predictions |
... |
additional arguments to lower level functions |
Author(s)
Klaus K. Holst
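This entry has no examples in the manual; the following is a minimal sketch using only the documented arguments above:
d <- density(rnorm(1e3))       # kernel density estimate (stats::density)
predict(d, xnew = c(-1, 0, 1)) # estimated density evaluated at new points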
Predictions for Naive Bayes Classifier
Description
Naive Bayes Classifier predictions
Usage
## S3 method for class 'naivebayes'
predict(object, newdata, expectation = NULL, threshold = c(0.001, 0.001), ...)
Arguments
object |
naivebayes object |
newdata |
new data on which to make predictions |
expectation |
Variable for which to calculate the conditional expectation with respect to the probabilities from the naivebayes classifier |
threshold |
Threshold parameters. First element defines the threshold on the probabilities and the second element the value to set those truncated probabilities to. |
... |
Additional arguments to lower level functions |
Author(s)
Klaus K. Holst
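A minimal sketch (the expectation and threshold arguments are left at their defaults):
m <- naivebayes(Species ~ Sepal.Width + Petal.Length, data = iris)
predict(m, newdata = head(iris)) # class probabilities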
Predict Method for superlearner Fits
Description
Obtains predictions for ensemble model or individual learners.
Usage
## S3 method for class 'superlearner'
predict(object, newdata, all.learners = FALSE, ...)
Arguments
object |
(superlearner) Fitted superlearner object. |
newdata |
(data.frame) Data in which to look for variables with which to predict. |
all.learners |
(logical) If FALSE (default), then return the predictions from the ensemble model. Otherwise, return the predictions from all individual learners. |
... |
Not used. |
Value
numeric (all.learners = FALSE) or matrix (all.learners = TRUE)
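A minimal sketch mirroring the superlearner examples later in the manual:
m <- list(
  mean = learner_glm(y ~ 1),
  glm = learner_glm(y ~ x1 + x2)
)
n <- 100
d <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
d$y <- d$x1 + rnorm(n)
sl <- superlearner(m, data = d, nfolds = 2)
predict(sl, newdata = head(d))                      # ensemble predictions
predict(sl, newdata = head(d), all.learners = TRUE) # one column per learner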
Objects exported from other packages
Description
These objects are imported from other packages. Follow the links below to see their documentation.
Risk regression
Description
Risk regression with binary exposure and nuisance model for the odds-product.
Let A be the binary exposure, V the set of covariates, and Y the binary response variable, and define p_a(v) = P(Y=1 \mid A=a, V=v), a \in \{0,1\}.
The target parameter is either the relative risk
\mathrm{RR}(v) = \frac{p_1(v)}{p_0(v)}
or the risk difference
\mathrm{RD}(v) = p_1(v) - p_0(v).
We assume a target parameter model given by either
\log\{\mathrm{RR}(v)\} = \alpha^t v
or
\mathrm{arctanh}\{\mathrm{RD}(v)\} = \alpha^t v,
and similarly a working linear nuisance model for the odds-product
\phi(v) = \log\left(\frac{p_{0}(v)p_{1}(v)}{(1-p_{0}(v))(1-p_{1}(v))}\right) = \beta^t v.
A propensity model for P(A=1 \mid V) is also fitted using a logistic regression working model
\mathrm{logit}\{P(A=1 \mid V=v)\} = \gamma^t v.
If both the odds-product model and the propensity model are correct the estimator is efficient. Further, the estimator is consistent in the union model, i.e., the estimator is double-robust in the sense that only one of the two models needs to be correctly specified to get a consistent estimate.
Usage
riskreg(
formula,
nuisance = ~1,
propensity = ~1,
target = ~1,
data,
weights,
type = "rr",
optimal = TRUE,
std.err = TRUE,
start = NULL,
mle = FALSE,
...
)
Arguments
formula |
formula (see details below) |
nuisance |
nuisance model (formula) |
propensity |
propensity model (formula) |
target |
(optional) target model (formula) |
data |
data.frame |
weights |
optional weights |
type |
type of association measure (rd or rr) |
optimal |
If TRUE optimal weights are calculated |
std.err |
If TRUE standard errors are calculated |
start |
optional starting values |
mle |
Semi-parametric (double-robust) estimate or MLE (TRUE gives MLE) |
... |
additional arguments to unconstrained optimization routine (nlminb) |
Details
The 'formula' argument should be given as response ~ exposure | target-formula | nuisance-formula or response ~ exposure | target | nuisance | propensity.
E.g., riskreg(y ~ a | 1 | x+z | x+z, data=...)
Alternatively, the model can be specified using the target, nuisance, and
propensity arguments: riskreg(y ~ a, target=~1, nuisance=~x+z, ...)
The riskreg_fit function can be used with matrix inputs rather than
formulas.
Value
An object of class 'riskreg.targeted' is returned. See
targeted-class for more details about this class and its
generic functions.
Author(s)
Klaus K. Holst
References
Richardson, T. S., Robins, J. M., & Wang, L. (2017). On modeling and estimation for the relative risk and risk difference. Journal of the American Statistical Association, 112(519), 1121–1130. http://dx.doi.org/10.1080/01621459.2016.1192546
Examples
m <- lava::lvm(a[-2] ~ x,
z ~ 1,
lp.target[1] ~ 1,
lp.nuisance[-1] ~ 2*x) |>
lava::distribution(~a, value=lava::binomial.lvm("logit")) |>
lava::binomial.rr("y","a","lp.target","lp.nuisance")
d <- sim(m,5e2,seed=1)
I <- model.matrix(~1, d)
X <- model.matrix(~1+x, d)
with(d, riskreg_mle(y, a, I, X, type="rr"))
with(d, riskreg_fit(y, a, nuisance=X, propensity=I, type="rr"))
riskreg(y ~ a | 1, nuisance=~x , data=d, type="rr")
## Model with same design matrix for nuisance and propensity model:
with(d, riskreg_fit(y, a, nuisance=X, type="rr"))
## a <- riskreg(y ~ a, target=~z, nuisance=~x,
## propensity=~x, data=d, type="rr")
a <- riskreg(y ~ a | z, nuisance=~x, propensity=~x, data=d, type="rr")
a
predict(a, d[1:5,])
riskreg(y ~ a, nuisance=~x, data=d, type="rr", mle=TRUE)
Binary regression models with right censored outcomes
Description
Binary regression models with right censored outcomes
Usage
riskreg_cens(
response,
censoring,
treatment = NULL,
prediction = NULL,
data,
newdata,
tau,
type = "risk",
M = 1,
call.response = "phreg",
args.response = list(),
call.censoring = "phreg",
args.censoring = list(),
preprocess = NULL,
efficient = TRUE,
control = list(),
...
)
Arguments
response |
Response formula (e.g., Surv(time, event) ~ D + W). |
censoring |
Censoring formula (e.g., Surv(time, event == 0) ~ D + A + W)). |
treatment |
Optional treatment model (learner) |
prediction |
Optional prediction model (learner) |
data |
data.frame. |
newdata |
Optional data.frame. In this case the uncentered influence function evaluated in 'newdata' is returned with nuisance parameters obtained from 'data'. |
tau |
Time-point of interest, see Details. |
type |
"risk", "treatment", "brier" |
M |
Number of folds in cross-fitting (M=1 is no cross-fitting). |
call.response |
Model call for the response model (e.g. "mets::phreg"). |
args.response |
Additional arguments to the response model. |
call.censoring |
Similar to call.response. |
args.censoring |
Similar to args.response. |
preprocess |
(optional) Data pre-processing function. |
efficient |
If FALSE an IPCW estimator is returned |
control |
See details |
... |
Additional arguments to lower level data pre-processing functions. |
Details
The one-step estimator depends on the calculation of an integral
wrt. the martingale process corresponding to the counting process N(t) =
I(C>min(T,tau)). This can be decomposed into an integral wrt the counting
process, dN_c(t) and the compensator d\Lambda_c(t) where the
latter term can be computational intensive to calculate. Rather than
calculating this integral in all observed time points, we can make a
coarser evaluation which can be controlled by setting
control=(sample=N). With N=0 the (computational intensive)
standard evaluation is used.
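A hypothetical illustration of the coarse evaluation; the simulated variables and the value sample = 100 are assumptions for this sketch:
set.seed(1)
n <- 500
a <- rbinom(n, 1, 0.5)
w <- rnorm(n)
failure.time <- rexp(n, rate = exp(-1 + 0.5 * a + 0.2 * w))
cens.time <- rexp(n, rate = 0.2)
time <- pmin(failure.time, cens.time)
status <- as.numeric(failure.time <= cens.time)
d <- data.frame(time, status, a, w)
riskreg_cens(
  response = survival::Surv(time, status) ~ a + w,
  censoring = survival::Surv(time, status == 0) ~ a + w,
  data = d, tau = 2,
  control = list(sample = 100) # coarser evaluation of the compensator integral
)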
Value
estimate object
Author(s)
Klaus K. Holst, Andreas Nordland
Extract average cross-validated score of individual learners
Description
Extract average cross-validated score of individual learners
Usage
## S3 method for class 'superlearner'
score(x, ...)
Arguments
x |
(superlearner) Fitted model. |
... |
Not used. |
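Continuing the superlearner sketch from the predict.superlearner entry above (assuming the fitted object sl from that sketch):
score(sl) # average cross-validated score (default MSE) of each learner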
Predictive model scoring
Description
Predictive model scoring
Usage
scoring(
response,
...,
type = "quantitative",
levels = NULL,
metrics = NULL,
weights = NULL,
names = NULL,
object = NULL,
newdata = NULL,
messages = 1
)
Arguments
response |
Observed response |
... |
model predictions (continuous predictions or class probabilities (matrices)) |
type |
continuous or categorical response (the latter is automatically chosen if response is a factor, otherwise a continuous response is assumed) |
levels |
(optional) unique levels in response variable |
metrics |
which metrics to report |
weights |
optional frequency weights |
names |
(optional) character vector of the model names in the output. If omitted these will be taken from the names of the ellipsis argument (...) |
object |
optional model object |
newdata |
(optional) data.frame on which to evaluate the model performance |
messages |
controls amount of messages/warnings (0: none) |
Value
Numeric matrix of dimension m x p, where m is the number of different models and p is the number of model metrics
Examples
data(iris)
set.seed(1)
dat <- lava::csplit(iris,2)
g1 <- naivebayes(Species ~ Sepal.Width + Petal.Length, data=dat[[1]])
g2 <- naivebayes(Species ~ Sepal.Width, data=dat[[1]])
pr1 <- predict(g1, newdata=dat[[2]], wide=TRUE)
pr2 <- predict(g2, newdata=dat[[2]], wide=TRUE)
table(colnames(pr1)[apply(pr1,1,which.max)], dat[[2]]$Species)
table(colnames(pr2)[apply(pr2,1,which.max)], dat[[2]]$Species)
scoring(dat[[2]]$Species, pr1=pr1, pr2=pr2)
## quantitative response:
scoring(response=1:10, prediction=rnorm(1:10))
Softmax transformation
Description
Softmax transformation
Usage
softmax(x, log = FALSE, ref = TRUE, ...)
Arguments
x |
Input matrix (e.g., linear predictors of multinomial logistic model) |
log |
Return on log-scale (default FALSE) |
ref |
Add reference level (add 0 column to x) |
... |
Additional arguments to lower level functions |
Value
Numeric matrix of dimension n x p, where n = nrow(x) and
p = ncol(x) + (ref == TRUE).
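A minimal sketch illustrating the documented dimensions:
x <- matrix(rnorm(6), nrow = 3) # linear predictors for two non-reference classes
p <- softmax(x)                 # 3 x 3 matrix (reference column added)
rowSums(p)                      # each row sums to one
softmax(x, ref = FALSE)         # 3 x 2 matrix, no reference level added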
Solve ODE
Description
Solve ODE with Runge-Kutta method (RK4)
Usage
solve_ode(ode_ptr, input, init, par = 0)
Arguments
ode_ptr |
pointer (externalptr) to C++ function or an R function |
input |
Input matrix. 1st column specifies the time points |
init |
Initial conditions |
par |
Parameters defining the ODE (parsed to ode_ptr) |
Details
The external pointer should be created with the function
targeted::specify_ode.
Value
Matrix with solution
Author(s)
Klaus Kähler Holst
See Also
specify_ode
Examples
example(specify_ode)
Specify Ordinary Differential Equation (ODE)
Description
Define compiled code for ordinary differential equation.
Usage
specify_ode(code, fname = NULL, pname = c("dy", "x", "y", "p"))
Arguments
code |
string with the body of the function definition (see details) |
fname |
Optional name of the exported C++ function |
pname |
Vector of variable names (results, inputs, states, parameters) |
Details
The model (code) should be specified as the body of a C++ function.
The following variables are defined by default
(see the argument pname):
- dy
Vector with derivatives, i.e. the rhs of the ODE (the result).
- x
Vector with the first element being the time, and the following elements additional exogenous input variables,
- y
Vector with the dependent variable
- p
Parameter vector
The ODE is then given by y'(t) = f_{p}(x(t), y(t)).
All variables are treated as Armadillo (http://arma.sourceforge.net/)
vectors/matrices.
As an example, consider the Lorenz equations
\frac{dx_{t}}{dt} = \sigma(y_{t}-x_{t})
\frac{dy_{t}}{dt} = x_{t}(\rho-z_{t})-y_{t}
\frac{dz_{t}}{dt} = x_{t}y_{t}-\beta z_{t}
We can specify this model as
ode <- 'dy(0) = p(0)*(y(1)-y(0));
dy(1) = y(0)*(p(1)-y(2));
dy(2) = y(0)*y(1)-p(2)*y(2);'
dy <- specify_ode(ode)
As an example of model with exogenous inputs consider the following ODE:
y'(t) = \beta_{0} + \beta_{1}y(t) +
\beta_{2}y(t)x(t) + \beta_{3}x(t)\cdot t
This could be specified as
mod <- 'double t = x(0);
dy = p(0) + p(1)*y + p(2)*x(1)*y + p(3)*x(1)*t;'
dy <- specify_ode(mod)
Value
pointer (externalptr) to C++ function
Author(s)
Klaus Kähler Holst
See Also
solve_ode
Examples
ode <- paste0(
"dy(0) = p(0)*(y(1)-y(0));",
"dy(1) = y(0)*(p(1)-y(2));",
"dy(2) = y(0)*y(1)-p(2)*y(2);", collapse="\n"
)
# Reduce test time
dy <- specify_ode(ode)
tt <- seq(0, 100, length.out=2e4)
yy <- solve_ode(dy, input=tt, init=c(1, 1, 1), par=c(10, 28, 8/3))
Identify Stratification Variables
Description
This is a special function that identifies stratification variables when they appear on the right hand side of a formula.
Usage
stratify(..., na.group = FALSE, shortlabel, sep = ", ")
Arguments
... |
any number of variables. All must be the same length. |
na.group |
a logical variable; if TRUE, then missing values are treated as a distinct level. |
shortlabel |
if TRUE, omit variable names from the resulting factor labels. |
sep |
the character used to separate groups, in the created label |
Details
When used outside of a coxph formula the result of the function
is essentially identical to the interaction function,
though the labels from stratify are often more verbose.
Value
a new factor, whose levels are all possible combinations of the factors supplied as arguments.
See Also
survival::strata, learner_stratify, interaction
Examples
a <- factor(rep(1:3, 4), labels=c("low", "medium", "high"))
b <- factor(rep(1:4, 3))
levels(stratify(b))
levels(stratify(a, b, shortlabel=TRUE))
Superlearner (stacked/ensemble learner)
Description
This function creates a predictor object (class learner) from a list of existing learner objects. When this model is estimated, a stacked prediction is created by weighting together the predictions of each of the initial learners. The weights are learned using cross-validation.
Usage
superlearner(
learners,
data,
nfolds = 10,
meta.learner = metalearner_nnls,
model.score = mse,
mc.cores = NULL,
future.seed = TRUE,
silent = TRUE,
name.prefix = NULL,
...
)
Arguments
learners |
(list) List of learner objects (e.g., learner_glm) |
data |
(data.frame) Data containing the response variable and covariates. |
nfolds |
(integer) Number of folds to use in cross-validation to estimate the ensemble weights. |
meta.learner |
(function) Algorithm to learn the ensemble weights
(default: non-negative least squares). Must be a function of the response
(n x 1 vector) and the learner predictions (n x p matrix). |
model.score |
(function) Model scoring method (see learner) |
mc.cores |
(integer) If not NULL, then parallel::mcmapply is used with this number of cores. |
future.seed |
(logical or integer) Argument passed on to future.apply::future_lapply. If TRUE, then .Random.seed is used if it holds a L'Ecuyer-CMRG RNG seed, otherwise one is created randomly. |
silent |
(logical) Suppress all messages and progress bars. |
name.prefix |
(character) Prefix used to name the learner objects in learners when no names are supplied. |
... |
Additional arguments to parallel::mclapply or future.apply::future_lapply. |
References
Luedtke & van der Laan (2016) Super-Learning of an Optimal Dynamic Treatment Rule, The International Journal of Biostatistics.
See Also
predict.superlearner, weights.superlearner, score.superlearner
Examples
sim1 <- function(n = 5e2) {
x1 <- rnorm(n, sd = 2)
x2 <- rnorm(n)
y <- x1 + cos(x1) + rnorm(n, sd = 0.5**.5)
data.frame(y, x1, x2)
}
m <- list(
"mean" = learner_glm(y ~ 1),
"glm" = learner_glm(y ~ x1 + x2)
)
sl <- superlearner(m, data = sim1(), nfolds = 2)
predict(sl, newdata = sim1(n = 5))
predict(sl, newdata = sim1(n = 5), all.learners = TRUE)
targeted class object
Description
The functions riskreg and ate
return an object of the type targeted.
An object of class 'targeted' is a list with at least the
following components:
- estimate
An estimate object with the target parameter estimates (see estimate.default)
- opt
Object returned from the applied optimization routine
- npar
Number of parameters of the model (target and nuisance)
- type
String describing the model
Value
objects of the S3 class 'targeted'
S3 generics
The following S3 generic functions are available for an object of
class targeted:
coef: Extract target coefficients of the estimated model.
vcov: Extract the variance-covariance matrix of the target parameters.
IC: Extract the estimated influence function.
print: Print estimates of the target parameters.
summary: Extract information on both the target parameters and the estimated nuisance model.
Examples
## See example(riskreg) for examples
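## A minimal sketch of the generics, assuming a fit obtained as in
## example(riskreg):
## fit <- riskreg(y ~ a | 1, nuisance = ~x, data = d, type = "rr")
## coef(fit)      # target parameters
## vcov(fit)      # variance-covariance matrix
## head(IC(fit))  # estimated influence function
## summary(fit)   # target and nuisance estimates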
Extract model component from design object
Description
Extract model component from design object
Usage
## S3 method for class 'design'
terms(x, specials, ...)
Arguments
x |
design object |
specials |
extract variables marked as special (e.g., "offset", "weights", ...) |
... |
Additional arguments to lower level functions |
Signed Wald intersection test
Description
Calculates test statistics and p-values for the signed Wald intersection test given by
SW = \inf_{\theta \in \cap_{i=1}^n H_i} \{(\widehat{\theta}-\theta)^\top W\widehat{\Sigma}W (\widehat{\theta}-\theta)\}
with individual hypotheses for each coordinate of \theta given by H_j: \theta_j < \delta_j for some non-inferiority margin \delta_j, j=1,\ldots,n.
Usage
test_intersection_sw(
par,
vcov,
noninf = NULL,
weights = NULL,
nsim.null = 10000,
index = NULL
)
Arguments
par |
(numeric) parameter estimates or an estimate object |
vcov |
(matrix) asymptotic variance estimate |
noninf |
(numeric) non-inferiority margins |
weights |
(numeric) optional weights |
nsim.null |
number of samples used in the Monte Carlo simulation of the null distribution |
index |
subset of parameters to test |
Value
List with the Wald test statistic and corresponding p-value.
Author(s)
Klaus Kähler Holst, Christian Bressen Pipper
Examples
S <- matrix(c(1, 0.5, 0.5, 2), 2, 2)
thetahat <- c(0.5, -0.2)
test_intersection_sw(thetahat, S, nsim.null = 1e5)
test_intersection_sw(thetahat, S, weights = NULL)
## Not run: # only on 'lava' >= 1.8.2
e <- estimate(coef = thetahat, vcov = S, labels = c("p1", "p2"))
lava::closed_testing(e, test_intersection_sw, noninf = c(-0.1, -0.1)) |>
summary()
## End(Not run)
Scores truncated by death
Description
Simulated data inspired by the FLOW study (Perkovic 2024). The following variables are considered in this simulated data set:
- time: time of first event in years (first major irreversible kidney event or non-related death)
- status: event type at the first event: major irreversible kidney event (=1), non-related death (=2), or right censoring (=0)
- y: clinical outcome measurement (eGFR) at the landmark time (:=2)
- r: missing indicator for y (1 if observed, 0 if either t<2 or the outcome was not measured for other reasons)
- a: binary treatment (1 := active, 0 := placebo)
- x1: covariate, clinical outcome at baseline (eGFR)
- x2: covariate, binary treatment usage indicator (1: SGLT2 treatment, 0: none)
The actual failure times and censoring times are also included
(failure.time, cens.time), as well as the full-data outcome (y0) given t>2.
Source
Simulated data
References
Perkovic, V., Tuttle, K. R., Rossing, P., Mahaffey, K. W., Mann, J. F., Bakris, G., Baeres, F. M., Idorn, T., Bosch-Traberg, H., Lausvig, N. L., and Pratley, R. (2024). Effects of semaglutide on chronic kidney disease in patients with type 2 diabetes. New England Journal of Medicine, 391(2):109–121.
Examples
data(truncatedscore)
Extract ensemble weights
Description
Extract ensemble weights
Usage
## S3 method for class 'superlearner'
weights(object, ...)
Arguments
object |
(superlearner) Fitted model. |
... |
Not used. |
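Continuing the superlearner sketch from the predict.superlearner entry (assuming the fitted object sl from that sketch):
weights(sl) # estimated ensemble weights of the individual learners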