Type: | Package |
Title: | Generalized Linear Models with Truncated Lasso Penalty |
Version: | 2.0.2 |
Date: | 2024-10-01 |
URL: | https://yuyangyy.com/glmtlp/ |
Depends: | R (≥ 3.5.0) |
Imports: | foreach, doParallel, ggplot2 |
Suggests: | rmarkdown, knitr, testthat (≥ 3.0.0) |
Description: | Extremely efficient procedures for fitting the entire regularization path with l0, l1, and truncated lasso penalties for linear and logistic regression models. This version is a complete rewrite of our previous version, which was mainly written in R; the new core algorithms are written in C++ and highly optimized. |
Encoding: | UTF-8 |
License: | GPL-3 |
LazyData: | true |
Author: | Chunlin Li |
Maintainer: | Yu Yang <yuyang.stat@gmail.com> |
RoxygenNote: | 7.3.2 |
VignetteBuilder: | knitr |
Config/testthat/edition: | 3 |
NeedsCompilation: | yes |
Packaged: | 2024-10-02 14:56:35 UTC; yuyang |
Repository: | CRAN |
Date/Publication: | 2024-10-02 20:20:14 UTC |
glmtlp: A package for fitting a GLM with l0, l1, and tlp regularization.
Description
The package provides three penalties (l0, l1, and tlp) and two distribution families (gaussian and binomial).
Fit generalized linear models via penalized maximum likelihood. The regularization path is computed for the l0, lasso, or truncated lasso penalty at a grid of values for the regularization parameter lambda or kappa. Fits linear and logistic regression models.
Usage
glmtlp(
X,
y,
family = c("gaussian", "binomial"),
penalty = c("l0", "l1", "tlp"),
nlambda = ifelse(penalty == "l0", 50, 100),
lambda.min.ratio = ifelse(nobs < nvars, 0.05, 0.001),
lambda = NULL,
kappa = NULL,
tau = 0.3 * sqrt(log(nvars)/nobs),
delta = 2,
tol = 1e-04,
weights = NULL,
penalty.factor = rep(1, nvars),
standardize = FALSE,
dc.maxit = 20,
cd.maxit = 10000,
nr.maxit = 20,
...
)
Arguments
X |
Input matrix, of dimension |
y |
Response variable, of length |
family |
A character string representing one of the built-in families. See Details section below. |
penalty |
A character string representing one of the built-in penalties. See Details section below. |
nlambda |
The number of |
lambda.min.ratio |
The smallest value for |
lambda |
A user-supplied |
kappa |
A user-supplied |
tau |
A tuning parameter used in the TLP-penalized regression models.
Default is |
delta |
A tuning parameter used in the coordinate majorization descent algorithm. See Yang, Y., & Zou, H. (2014) in the References for more details. |
tol |
Tolerance level for all iterative optimization algorithms. |
weights |
Observation weights. Default is 1 for each observation. |
penalty.factor |
Separate penalty factors applied to each coefficient, which allows for differential shrinkage. Default is 1 for all variables. |
standardize |
Logical. Whether or not to standardize the input matrix. |
dc.maxit |
Maximum number of iterations for the DC (Difference of Convex Functions) programming; default is 20. |
cd.maxit |
Maximum number of iterations for the coordinate descent algorithm; default is 10^4. |
nr.maxit |
Maximum number of iterations for the Newton-Raphson method; default is 20. |
... |
Additional arguments. |
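The dc.maxit argument refers to DC (difference of convex functions) programming for the TLP. As a hedged sketch, assuming the truncated lasso form J_tau(b) = min(|b|/tau, 1) of Shen, Pan & Zhu (2012) (glmtlp may scale the penalty differently), the TLP splits into a difference of two convex functions. Linearizing the subtracted part as a function of |b| at the current estimate leaves an l1 problem in which only coefficients of magnitude at most tau are penalized. The helpers tlp, tlp_dc and dc_penalty_weights are hypothetical, not part of glmtlp.

```r
# Assumed TLP form (Shen, Pan & Zhu, 2012); glmtlp's internal scaling may differ.
tlp <- function(b, tau) pmin(abs(b) / tau, 1)

# DC decomposition: convex |b| / tau minus convex max(|b| / tau - 1, 0).
tlp_dc <- function(b, tau) abs(b) / tau - pmax(abs(b) / tau - 1, 0)

# Hypothetical helper: after linearizing the subtracted part in |b| at the
# current estimate, coefficients with |b| > tau drop out of the penalty (weight 0)
# and the rest keep the l1 penalty (weight 1).
dc_penalty_weights <- function(beta_current, tau) as.numeric(abs(beta_current) <= tau)

b <- c(-2, -0.1, 0, 0.05, 0.3, 1)
tau <- 0.2
stopifnot(isTRUE(all.equal(tlp(b, tau), tlp_dc(b, tau))))  # decomposition is exact
dc_penalty_weights(b, tau)  # 0 1 1 1 0 0
```

In a DC loop, weights like these would act as per-coefficient penalty factors for a weighted-lasso subproblem; a natural stopping rule (assumed here) is to stop after dc.maxit iterations or once the weights no longer change.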
Details
The sequence of models indexed by lambda (when penalty is "l1" or "tlp") or kappa (when penalty = "l0") is fit by the coordinate descent algorithm.
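For intuition about a single coordinate descent step, here is a minimal base-R sketch of plain cyclic coordinate descent for the l1-penalized gaussian objective 1/2 RSS/nobs + lambda * sum(|beta_j|), using soft-thresholding. It is not glmtlp's C++ implementation (which, judging by the delta argument, uses coordinate majorization descent); lasso_cd and soft_threshold are hypothetical names, and tol and cd.maxit only mirror the arguments above.

```r
# Hypothetical helpers: plain cyclic coordinate descent for
#   1/2 RSS/nobs + lambda * sum(abs(beta)),
# not glmtlp's C++ coordinate majorization descent.
soft_threshold <- function(z, g) sign(z) * pmax(abs(z) - g, 0)

lasso_cd <- function(X, y, lambda, tol = 1e-4, cd.maxit = 10000) {
  n <- nrow(X)
  p <- ncol(X)
  xc <- scale(X, center = TRUE, scale = FALSE)  # centering absorbs the intercept
  v <- colSums(xc^2) / n                          # curvature of each coordinate
  beta <- numeric(p)
  r <- y - mean(y)                                # residual of the intercept-only fit
  for (it in seq_len(cd.maxit)) {
    max_change <- 0
    for (j in seq_len(p)) {
      zj <- sum(xc[, j] * r) / n + v[j] * beta[j] # partial-residual inner product
      bj <- soft_threshold(zj, lambda) / v[j]
      if (bj != beta[j]) {
        r <- r - xc[, j] * (bj - beta[j])         # keep the residual in sync
        max_change <- max(max_change, abs(bj - beta[j]))
        beta[j] <- bj
      }
    }
    if (max_change < tol) break                   # converged
  }
  list(intercept = mean(y) - sum(colMeans(X) * beta), beta = beta)
}

# Orthogonal, centered design: each OLS coefficient is 1, soft-thresholded by lambda.
X <- cbind(c(1, -1, 1, -1), c(1, 1, -1, -1))
y <- c(3, 1, 1, -1)
lasso_cd(X, y, lambda = 0.5)
```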
The objective function for the "gaussian" family is:
1/2 RSS/nobs + lambda * penalty,
and for the other models it is:
-loglik/nobs + lambda * penalty.
Note also that, for the "gaussian" family, glmtlp standardizes y to have unit variance (using the 1/(n-1) formula).
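The two objectives can be written out directly in base R. This is a sketch under stated assumptions: the helper names are hypothetical, the TLP is taken as sum(min(|beta_j|/tau, 1)) following Shen, Pan & Zhu (2012), and glmtlp's own scaling may differ. default_tau reproduces the default tau = 0.3 * sqrt(log(nvars)/nobs) from the Usage section, and dividing y by sd(y) gives unit variance under the 1/(n-1) formula because sd() uses that denominator (whether glmtlp also centers y is not stated here).

```r
# Hypothetical helpers; the TLP form is an assumption, not glmtlp's internal code.
penalty_value <- function(beta, penalty = c("l1", "tlp"), tau = NULL) {
  penalty <- match.arg(penalty)
  if (penalty == "l1") sum(abs(beta)) else sum(pmin(abs(beta) / tau, 1))
}

# 1/2 RSS/nobs + lambda * penalty
gaussian_objective <- function(X, y, intercept, beta, lambda, ...) {
  rss <- sum((y - intercept - drop(X %*% beta))^2)
  0.5 * rss / nrow(X) + lambda * penalty_value(beta, ...)
}

# -loglik/nobs + lambda * penalty, with y coded 0/1
binomial_objective <- function(X, y, intercept, beta, lambda, ...) {
  eta <- intercept + drop(X %*% beta)
  log1pexp <- pmax(eta, 0) + log1p(exp(-abs(eta)))  # stable log(1 + exp(eta))
  loglik <- sum(y * eta - log1pexp)
  -loglik / nrow(X) + lambda * penalty_value(beta, ...)
}

# Default tau from the Usage section
default_tau <- function(nobs, nvars) 0.3 * sqrt(log(nvars) / nobs)

set.seed(2021)
X <- matrix(rnorm(100 * 20), 100, 20)
y <- rnorm(100)
y_std <- y / sd(y)                         # sd() divides by n - 1
tau <- default_tau(nobs = 100, nvars = 20) # about 0.0519
beta <- c(0.5, -0.02, rep(0, 18))
gaussian_objective(X, y_std, intercept = 0, beta = beta, lambda = 0.1,
                   penalty = "tlp", tau = tau)
```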
## Details on family option
glmtlp currently only supports built-in families, which are specified by a character string. For all families, the returned object is a regularization path for fitting the generalized linear regression models, by maximizing the corresponding penalized log-likelihood. glmtlp(..., family="binomial") fits a traditional logistic regression model for the log-odds.
## Details on penalty option
The built-in penalties are specified by a character string. For the l0 penalty, a kappa sequence is used for generating the regularization path, while for the l1 and tlp penalties, a lambda sequence is used.
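One way to see how the l0 and tlp options relate: under the TLP form min(|beta_j|/tau, 1) (an assumption carried over from Shen, Pan & Zhu, 2012), every coefficient with |beta_j| >= tau contributes exactly 1, so as tau shrinks the penalty approaches the l0 count of nonzero coefficients. The sketch below checks this numerically; l0_count and tlp_sum are hypothetical helpers.

```r
# Hypothetical helpers; the TLP form is assumed, not taken from glmtlp's source.
l0_count <- function(beta) sum(beta != 0)
tlp_sum <- function(beta, tau) sum(pmin(abs(beta) / tau, 1))

beta <- c(0, 0.5, -2, 0, 0.03)
# TLP value for decreasing tau; it approaches l0_count(beta) = 3
sapply(c(1, 0.1, 0.01), function(tau) tlp_sum(beta, tau))
l0_count(beta)
```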
Value
An object with S3 class "glmtlp".
beta |
a |
call |
the call that produces this object. |
family |
the distribution family used in the model fitting. |
intercept |
the intercept vector, of |
lambda |
the actual sequence of |
penalty |
the penalty type in the model fitting. |
penalty.factor |
the penalty factor for each coefficient used in the model fitting. |
tau |
the tuning parameter used in the model fitting, available when
|
glmtlp functions
'glmtlp()', 'cv.glmtlp()'
Author(s)
Maintainer: Yu Yang yuyang.stat@gmail.com (ORCID) [copyright holder]
Authors:
Chunlin Li chunlin@iastate.edu (ORCID) [copyright holder]
Chong Wu (ORCID) [copyright holder]
Other contributors:
Xiaotong Shen [thesis advisor, copyright holder]
Wei Pan [thesis advisor, copyright holder]
Chunlin Li, Yu Yang, Chong Wu
Maintainer: Yu Yang yang6367@umn.edu
References
Shen, X., Pan, W., & Zhu, Y. (2012).
Likelihood-based selection and sharp parameter estimation.
Journal of the American Statistical Association, 107(497), 223-232.
Shen, X., Pan, W., Zhu, Y., & Zhou, H. (2013).
On constrained and regularized high-dimensional regression.
Annals of the Institute of Statistical Mathematics, 65(5), 807-832.
Li, C., Shen, X., & Pan, W. (2021).
Inference for a Large Directed Graphical Model with Interventions.
arXiv preprint arXiv:2110.03805.
Yang, Y., & Zou, H. (2014).
A coordinate majorization descent algorithm for l1 penalized learning.
Journal of Statistical Computation and Simulation, 84(1), 84-95.
Two R packages on GitHub: ncvreg and glmnet.
See Also
print, predict, coef and plot methods, and the cv.glmtlp function.
Examples
# Gaussian
X <- matrix(rnorm(100 * 20), 100, 20)
y <- rnorm(100)
fit1 <- glmtlp(X, y, family = "gaussian", penalty = "l0")
fit2 <- glmtlp(X, y, family = "gaussian", penalty = "l1")
fit3 <- glmtlp(X, y, family = "gaussian", penalty = "tlp")
# Binomial
X <- matrix(rnorm(100 * 20), 100, 20)
y <- sample(c(0, 1), 100, replace = TRUE)
fit <- glmtlp(X, y, family = "binomial", penalty = "l1")
A simulated binomial data set.
Description
A data set simulated for illustrating logistic regression models. Generated by gen.binomial.data(n = 200, p = 20, seed = 2021).
Usage
data(bin_data)
Format
A list with three elements: design matrix X, response y, and the true coefficient vector beta.
- X
design matrix
- y
response
- beta
the true coefficient vector
Examples
data("bin_data")
cv.fit <- cv.glmtlp(bin_data$X, bin_data$y, family = "binomial", penalty = "l1")
plot(cv.fit)
Cross-validation for glmtlp
Description
Performs k-fold cross-validation for l0, l1, or TLP-penalized regression models over a grid of values for the regularization parameter lambda (if penalty is "l1" or "tlp") or kappa (if penalty = "l0").
Usage
cv.glmtlp(X, y, ..., seed = NULL, nfolds = 10, obs.fold = NULL, ncores = 1)
Arguments
X |
input matrix, of dimension |
y |
response, of length nobs, as in |
... |
Other arguments that can be passed to |
seed |
the random seed, for reproducibility. |
nfolds |
number of folds; default is 10. The smallest value allowable
is |
obs.fold |
an optional vector of values between 1 and |
ncores |
number of cores utilized; default is 1. If greater than 1,
then |
Details
The function calls glmtlp nfolds + 1 times: the first call computes the lambda or kappa sequence, and the remaining calls compute the fit with each of the folds omitted. The cross-validation error is based on deviance. The error is accumulated over the folds, and the average error and its standard deviation are computed.
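A minimal sketch of the error summary described above, assuming the per-fold error is the held-out deviance and that cv.se is the standard deviation across folds divided by sqrt(nfolds); glmtlp may instead weight folds by their sizes. The helper names are hypothetical.

```r
# Hypothetical helpers, not glmtlp's internal code.
gaussian_deviance <- function(y, mu) sum((y - mu)^2)

binomial_deviance <- function(y, prob) {
  prob <- pmin(pmax(prob, 1e-10), 1 - 1e-10)  # guard against log(0)
  -2 * sum(y * log(prob) + (1 - y) * log(1 - prob))
}

# errs: nfolds x nlambda matrix; row k holds fold k's held-out error per lambda
summarize_cv <- function(errs) {
  list(cv.mean = colMeans(errs),
       cv.se = apply(errs, 2, sd) / sqrt(nrow(errs)))
}

errs <- rbind(c(1.0, 2.0),
              c(3.0, 4.0))
summarize_cv(errs)  # cv.mean 2 3, cv.se 1 1
```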
When family = "binomial", the fold assignment (if not provided by the user) is generated in a stratified manner, so that the ratio of 0/1 outcomes is (approximately) the same in each fold.
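Stratified fold assignment can be sketched in a few lines of base R: shuffle the indices of each outcome class separately and deal them out across folds, so every fold receives roughly the same share of 0s and 1s. stratified_folds is a hypothetical helper, not glmtlp's code; its seed argument only mirrors the one in cv.glmtlp.

```r
# Hypothetical helper: stratified fold ids for a 0/1 (or any discrete) response.
stratified_folds <- function(y, nfolds = 10, seed = NULL) {
  if (!is.null(seed)) set.seed(seed)
  fold <- integer(length(y))
  for (cls in unique(y)) {
    idx <- which(y == cls)
    labels <- rep_len(seq_len(nfolds), length(idx))  # deal fold ids round-robin
    fold[idx] <- labels[sample.int(length(labels))]  # shuffle within the class
  }
  fold
}

y <- rep(c(0, 1), times = c(60, 40))
folds <- stratified_folds(y, nfolds = 10, seed = 2021)
table(fold = folds, y = y)  # each fold: six 0s and four 1s
```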
Value
An object of class "cv.glmtlp" is returned, which is a list with the ingredients of the cross-validation fit.
call |
the function call |
cv.mean |
The mean cross-validated error, a vector of length |
cv.se |
estimate of standard error of |
fit |
a fitted glmtlp object for the full data. |
idx.min |
the index of the |
kappa |
the values of |
kappa.min |
the value of |
lambda |
the values of |
lambda.min |
value of |
null.dev |
null deviance of the model. |
obs.fold |
the fold id for each observation used in the CV. |
Author(s)
Chunlin Li, Yu Yang, Chong Wu
Maintainer: Yu Yang yang6367@umn.edu
References
Shen, X., Pan, W., & Zhu, Y. (2012).
Likelihood-based selection and sharp parameter estimation.
Journal of the American Statistical Association, 107(497), 223-232.
Shen, X., Pan, W., Zhu, Y., & Zhou, H. (2013).
On constrained and regularized high-dimensional regression.
Annals of the Institute of Statistical Mathematics, 65(5), 807-832.
Li, C., Shen, X., & Pan, W. (2021).
Inference for a Large Directed Graphical Model with Interventions.
arXiv preprint arXiv:2110.03805.
Yang, Y., & Zou, H. (2014).
A coordinate majorization descent algorithm for l1 penalized learning.
Journal of Statistical Computation and Simulation, 84(1), 84-95.
Two R packages on GitHub: ncvreg and glmnet.
See Also
glmtlp and plot, predict, and coef methods for "cv.glmtlp" objects.
Examples
# Gaussian
X <- matrix(rnorm(100 * 20), 100, 20)
y <- rnorm(100)
cv.fit <- cv.glmtlp(X, y, family = "gaussian", penalty = "l1", seed=2021)
# Binomial
X <- matrix(rnorm(100 * 20), 100, 20)
y <- sample(c(0,1), 100, replace = TRUE)
cv.fit <- cv.glmtlp(X, y, family = "binomial", penalty = "l1", seed=2021)
A simulated gaussian data set.
Description
A data set simulated for illustrating linear regression models. Generated by gen.gaussian.data(n = 200, p = 20, seed = 2021).
Usage
data(gau_data)
Format
A list with five elements: design matrix X, response y, correlation structure of the covariates Sigma, the true coefficient vector beta, and the noise level sigma.
- X
design matrix
- y
response
- beta
true beta values
- sigma
the noise level
Examples
data("gau_data")
cv.fit <- cv.glmtlp(gau_data$X, gau_data$y, family = "gaussian", penalty = "tlp")
plot(cv.fit)
Simulate a binomial data set
Description
Simulate a data set with binary response following the logistic regression model.
Usage
gen.binomial.data(n, p, rho = 0, kappa = 5, beta.type = 1, seed = 2021)
Arguments
n |
Sample size. |
p |
Number of covariates. |
rho |
The parameter defining the AR(1) correlation matrix. |
kappa |
The number of nonzero coefficients. |
beta.type |
Numeric indicator for choosing the beta type. For
|
seed |
The seed for reproducibility. Default is 2021. |
Value
A list containing the simulated data.
X |
the covariate matrix, of dimension |
y |
the response, of length |
beta |
the true coefficients, of length |
Examples
bin_data <- gen.binomial.data(n = 200, p = 20, seed = 2021)
head(bin_data$X)
head(bin_data$y)
head(bin_data$beta)
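To make the roles of rho and kappa concrete, here is a hedged base-R sketch of this kind of simulation: covariates with AR(1) correlation Sigma[i, j] = rho^|i - j|, a coefficient vector with kappa nonzero entries, and Bernoulli responses with success probability plogis(X %*% beta). The helpers are hypothetical, and the all-ones pattern for the nonzero coefficients is an assumption standing in for whatever beta.type selects.

```r
# Hypothetical helpers, not the package's gen.binomial.data implementation.
ar1_cor <- function(p, rho) rho^abs(outer(seq_len(p), seq_len(p), "-"))

simulate_binomial_sketch <- function(n, p, rho = 0, kappa = 5, seed = 2021) {
  set.seed(seed)
  Sigma <- ar1_cor(p, rho)
  # rows of Z %*% chol(Sigma) have covariance Sigma
  X <- matrix(rnorm(n * p), n, p) %*% chol(Sigma)
  beta <- c(rep(1, kappa), rep(0, p - kappa))  # assumed stand-in for beta.type
  prob <- plogis(drop(X %*% beta))
  y <- rbinom(n, size = 1, prob = prob)
  list(X = X, y = y, beta = beta)
}

sim <- simulate_binomial_sketch(n = 200, p = 20, rho = 0.5)
table(sim$y)
```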
Simulate a gaussian data set
Description
Simulate a data set with gaussian response following the linear regression model.
Usage
gen.gaussian.data(
n,
p,
rho = 0,
kappa = 5,
beta.type = 1,
snr = 1,
seed = 2021
)
Arguments
n |
Sample size. |
p |
Number of covariates. |
rho |
The parameter defining the AR(1) correlation matrix. |
kappa |
The number of nonzero coefficients. |
beta.type |
Numeric indicator for choosing the beta type. For
|
snr |
Signal-to-noise ratio. Default is 1. |
seed |
The seed for reproducibility. Default is 2021. |
Value
A list containing the simulated data.
X |
the covariate matrix, of dimension |
y |
the response, of length |
beta |
the true coefficients, of length |
sigma |
the standard deviation of the noise. |
Examples
gau_data <- gen.gaussian.data(n = 200, p = 20, seed = 2021)
head(gau_data$X)
head(gau_data$y)
head(gau_data$beta)
gau_data$sigma
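A hedged sketch of how snr can set the noise level, assuming the common definition snr = Var(x^T beta) / sigma^2, so that sigma = sqrt(t(beta) %*% Sigma %*% beta / snr). gen.gaussian.data may define it differently; the helper names and the all-ones nonzero coefficients are hypothetical.

```r
# Hypothetical helpers, not the package's gen.gaussian.data implementation.
ar1_cor <- function(p, rho) rho^abs(outer(seq_len(p), seq_len(p), "-"))

# Noise sd implied by an assumed snr = Var(x'beta) / sigma^2
noise_sd_from_snr <- function(beta, Sigma, snr) {
  sqrt(drop(crossprod(beta, Sigma %*% beta)) / snr)
}

simulate_gaussian_sketch <- function(n, p, rho = 0, kappa = 5, snr = 1, seed = 2021) {
  set.seed(seed)
  Sigma <- ar1_cor(p, rho)
  X <- matrix(rnorm(n * p), n, p) %*% chol(Sigma)
  beta <- c(rep(1, kappa), rep(0, p - kappa))  # assumed stand-in for beta.type
  sigma <- noise_sd_from_snr(beta, Sigma, snr)
  y <- drop(X %*% beta) + sigma * rnorm(n)
  list(X = X, y = y, beta = beta, sigma = sigma)
}

sim <- simulate_gaussian_sketch(n = 200, p = 20)
sim$sigma  # sqrt(5) when rho = 0, kappa = 5, snr = 1
```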
Plot Method for a "cv.glmtlp" Object
Description
Plots the cross-validation curve, and the upper and lower standard deviation curves, as a function of the lambda or kappa values.
Usage
## S3 method for class 'cv.glmtlp'
plot(x, vertical.line = TRUE, ...)
Arguments
x |
Fitted |
vertical.line |
Logical. Whether or not to include a vertical line indicating the position of the index that gives the smallest CV error. |
... |
Additional arguments. |
Details
The generated plot is a ggplot object, so users can customize it with the usual ggplot2 syntax.
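The quantities behind this plot can be sketched as a data frame, assuming the band is cv.mean plus or minus cv.se and that the vertical line sits at the index of the smallest cv.mean (the idx.min component). cv_plot_data is a hypothetical helper; a data frame like this could be passed to ggplot2 by hand.

```r
# Hypothetical helper: the curves drawn by a CV plot, under the assumptions above.
cv_plot_data <- function(cv.mean, cv.se) {
  data.frame(index = seq_along(cv.mean),
             cv.mean = cv.mean,
             lower = cv.mean - cv.se,  # lower curve
             upper = cv.mean + cv.se)  # upper curve
}

d <- cv_plot_data(cv.mean = c(3.0, 1.2, 1.5), cv.se = c(0.3, 0.2, 0.25))
idx.min <- which.min(d$cv.mean)  # position of the optional vertical line
d
```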
Author(s)
Chunlin Li, Yu Yang, Chong Wu
Maintainer: Yu Yang yang6367@umn.edu
References
Shen, X., Pan, W., & Zhu, Y. (2012).
Likelihood-based selection and sharp parameter estimation.
Journal of the American Statistical Association, 107(497), 223-232.
Shen, X., Pan, W., Zhu, Y., & Zhou, H. (2013).
On constrained and regularized high-dimensional regression.
Annals of the Institute of Statistical Mathematics, 65(5), 807-832.
Li, C., Shen, X., & Pan, W. (2021).
Inference for a Large Directed Graphical Model with Interventions.
arXiv preprint arXiv:2110.03805.
Yang, Y., & Zou, H. (2014).
A coordinate majorization descent algorithm for l1 penalized learning.
Journal of Statistical Computation and Simulation, 84(1), 84-95.
Two R packages on GitHub: ncvreg and glmnet.
Examples
X <- matrix(rnorm(100 * 20), 100, 20)
y <- rnorm(100)
cv.fit <- cv.glmtlp(X, y, family = "gaussian", penalty = "tlp")
plot(cv.fit)
plot(cv.fit, vertical.line = FALSE)
cv.fit2 <- cv.glmtlp(X, y, family = "gaussian", penalty = "l0")
plot(cv.fit2)
plot(cv.fit2, vertical.line = FALSE)
data("gau_data")
cv.fit <- cv.glmtlp(gau_data$X, gau_data$y, family = "gaussian", penalty = "tlp")
plot(cv.fit)
data("bin_data")
cv.fit <- cv.glmtlp(bin_data$X, bin_data$y, family = "binomial", penalty = "l1")
plot(cv.fit)
Plot Method for a "glmtlp" Object
Description
Generates a solution path plot for a fitted "glmtlp" object.
Usage
## S3 method for class 'glmtlp'
plot(
x,
xvar = c("lambda", "kappa", "deviance", "l1_norm", "log_lambda"),
xlab = iname,
ylab = "Coefficients",
title = "Solution Path",
label = FALSE,
label.size = 3,
...
)
Arguments
x |
Fitted |
xvar |
The x-axis variable to plot against, including |
xlab |
The x-axis label of the plot, default is |
ylab |
The y-axis label of the plot, default is "Coefficients". |
title |
The main title of the plot, default is "Solution Path". |
label |
Logical. Whether or not to attach labels to the nonzero coefficients; default is |
label.size |
The text size of the labels, default is 3. |
... |
Additional arguments. |
Details
The generated plot is a ggplot object, so users can customize it with the usual ggplot2 syntax.
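The x-axis choices map onto simple transformations of the fitted path. A sketch, assuming the coefficients are stored as an nvars x nlambda matrix with one column per lambda value; path_xvalues and path_long are hypothetical helpers, and the deviance and kappa options are omitted.

```r
# Hypothetical helpers: x-axis values and long-format data for a solution path plot.
path_xvalues <- function(beta, lambda, xvar = c("lambda", "log_lambda", "l1_norm")) {
  xvar <- match.arg(xvar)
  switch(xvar,
         lambda = lambda,
         log_lambda = log(lambda),
         l1_norm = colSums(abs(beta)))  # l1 norm of each column (one per lambda)
}

# One row per (lambda, variable) pair, ready for a line plot of coefficient vs x
path_long <- function(beta, xvalues) {
  data.frame(x = rep(xvalues, each = nrow(beta)),
             variable = rep(seq_len(nrow(beta)), times = ncol(beta)),
             coefficient = as.vector(beta))
}

beta <- cbind(c(0, 0, 0), c(0.5, 0, 0), c(1, -0.4, 0))  # 3 variables, 3 lambdas
lambda <- c(1, 0.5, 0.1)
path_xvalues(beta, lambda, "l1_norm")  # 0.0 0.5 1.4
```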
Value
A ggplot object.
Author(s)
Chunlin Li, Yu Yang, Chong Wu
Maintainer: Yu Yang yang6367@umn.edu
References
Shen, X., Pan, W., & Zhu, Y. (2012).
Likelihood-based selection and sharp parameter estimation.
Journal of the American Statistical Association, 107(497), 223-232.
Shen, X., Pan, W., Zhu, Y., & Zhou, H. (2013).
On constrained and regularized high-dimensional regression.
Annals of the Institute of Statistical Mathematics, 65(5), 807-832.
Li, C., Shen, X., & Pan, W. (2021).
Inference for a Large Directed Graphical Model with Interventions.
arXiv preprint arXiv:2110.03805.
Yang, Y., & Zou, H. (2014).
A coordinate majorization descent algorithm for l1 penalized learning.
Journal of Statistical Computation and Simulation, 84(1), 84-95.
Two R packages on GitHub: ncvreg and glmnet.
See Also
print, predict, coef and plot methods, and the cv.glmtlp function.
Examples
X <- matrix(rnorm(100 * 20), 100, 20)
y <- rnorm(100)
fit <- glmtlp(X, y, family = "gaussian", penalty = "l1")
plot(fit, xvar = "lambda")
plot(fit, xvar = "log_lambda")
plot(fit, xvar = "l1_norm")
plot(fit, xvar = "log_lambda", label = TRUE)
fit2 <- glmtlp(X, y, family = "gaussian", penalty = "l0")
plot(fit2, xvar = "kappa", label = TRUE)
Predict Method for a "cv.glmtlp" Object.
Description
Makes predictions for a cross-validated glmtlp model, using the stored "glmtlp" object and the optimal value chosen for lambda (or kappa, when penalty = "l0").
Usage
## S3 method for class 'cv.glmtlp'
predict(
object,
X,
type = c("link", "response", "class", "coefficients", "numnzs", "varnzs"),
lambda = NULL,
kappa = NULL,
which = object$idx.min,
...
)
## S3 method for class 'cv.glmtlp'
coef(object, lambda = NULL, kappa = NULL, which = object$idx.min, ...)
Arguments
object |
Fitted |
X |
Matrix of new values for |
type |
Type of prediction to be made. For |
lambda |
Value of the penalty parameter |
kappa |
Value of the penalty parameter |
which |
Index of the penalty parameter |
... |
Additional arguments. |
Value
The object returned depends on type.
Author(s)
Chunlin Li, Yu Yang, Chong Wu
Maintainer: Yu Yang yang6367@umn.edu
References
Shen, X., Pan, W., & Zhu, Y. (2012).
Likelihood-based selection and sharp parameter estimation.
Journal of the American Statistical Association, 107(497), 223-232.
Shen, X., Pan, W., Zhu, Y., & Zhou, H. (2013).
On constrained and regularized high-dimensional regression.
Annals of the Institute of Statistical Mathematics, 65(5), 807-832.
Li, C., Shen, X., & Pan, W. (2021).
Inference for a Large Directed Graphical Model with Interventions.
arXiv preprint arXiv:2110.03805.
Yang, Y., & Zou, H. (2014).
A coordinate majorization descent algorithm for l1 penalized learning.
Journal of Statistical Computation and Simulation, 84(1), 84-95.
Two R packages on GitHub: ncvreg and glmnet.
See Also
print, predict, coef and plot methods, and the cv.glmtlp function.
Examples
X <- matrix(rnorm(100 * 20), 100, 20)
y <- rnorm(100)
cv.fit <- cv.glmtlp(X, y, family = "gaussian", penalty = "l1")
predict(cv.fit, X = X[1:5, ])
coef(cv.fit)
predict(cv.fit, X = X[1:5, ], lambda = 0.1)
Predict Method for a "glmtlp" Object
Description
Predicts fitted values, logits, coefficients and more from a fitted glmtlp object.
Usage
## S3 method for class 'glmtlp'
predict(
object,
X,
type = c("link", "response", "class", "coefficients", "numnz", "varnz"),
lambda = NULL,
kappa = NULL,
which = 1:(ifelse(object$penalty == "l0", length(object$kappa), length(object$lambda))),
...
)
## S3 method for class 'glmtlp'
coef(
object,
lambda = NULL,
kappa = NULL,
which = 1:(ifelse(object$penalty == "l0", length(object$kappa), length(object$lambda))),
drop = TRUE,
...
)
Arguments
object |
Fitted |
X |
Matrix of new values for |
type |
Type of prediction to be made. For |
lambda |
Value of the penalty parameter |
kappa |
Value of the penalty parameter |
which |
Index of the penalty parameter |
... |
Additional arguments. |
drop |
Whether or not keep the dimension that is of length 1. |
Details
coef(...) is equivalent to predict(type="coefficients", ...).
Value
The object returned depends on type.
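For the binomial family, the main prediction types can be sketched from an intercept vector and an nvars x nlambda coefficient matrix: link is the linear predictor, response applies the logistic function, and class thresholds the link at 0 (equivalently probability 0.5, an assumption here). numnz and varnz count and list the nonzero coefficients for each lambda. predict_sketch is hypothetical and is not glmtlp's predict method.

```r
# Hypothetical helper mirroring the binomial prediction types described above.
predict_sketch <- function(X, intercept, beta,
                           type = c("link", "response", "class", "numnz", "varnz")) {
  type <- match.arg(type)
  beta <- as.matrix(beta)                       # nvars x nlambda
  link <- sweep(X %*% beta, 2, intercept, "+")  # one intercept per column
  switch(type,
         link = link,
         response = plogis(link),               # fitted probabilities
         class = (link > 0) * 1L,               # 1 when probability > 0.5
         numnz = colSums(beta != 0),
         varnz = lapply(seq_len(ncol(beta)), function(k) which(beta[, k] != 0)))
}

X <- diag(2)
b <- c(2, -1)
predict_sketch(X, intercept = 0, beta = b, type = "class")  # 1, 0
```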
Author(s)
Chunlin Li, Yu Yang, Chong Wu
Maintainer: Yu Yang yang6367@umn.edu
References
Shen, X., Pan, W., & Zhu, Y. (2012).
Likelihood-based selection and sharp parameter estimation.
Journal of the American Statistical Association, 107(497), 223-232.
Shen, X., Pan, W., Zhu, Y., & Zhou, H. (2013).
On constrained and regularized high-dimensional regression.
Annals of the Institute of Statistical Mathematics, 65(5), 807-832.
Li, C., Shen, X., & Pan, W. (2021).
Inference for a Large Directed Graphical Model with Interventions.
arXiv preprint arXiv:2110.03805.
Yang, Y., & Zou, H. (2014).
A coordinate majorization descent algorithm for l1 penalized learning.
Journal of Statistical Computation and Simulation, 84(1), 84-95.
Two R packages on GitHub: ncvreg and glmnet.
See Also
print, predict, coef and plot methods, and the cv.glmtlp function.
Examples
# Gaussian
X <- matrix(rnorm(100 * 20), 100, 20)
y <- rnorm(100)
fit <- glmtlp(X, y, family = "gaussian", penalty = "l1")
predict(fit, X = X[1:5, ])
coef(fit)
predict(fit, X = X[1:5, ], lambda = 0.1)
# Binomial
X <- matrix(rnorm(100 * 20), 100, 20)
y <- sample(c(0,1), 100, replace = TRUE)
fit <- glmtlp(X, y, family = "binomial", penalty = "l1")
coef(fit)
predict(fit, X = X[1:5, ], type = "response")
predict(fit, X = X[1:5, ], type = "response", lambda = 0.01)
predict(fit, X = X[1:5, ], type = "class", lambda = 0.01)
predict(fit, X = X[1:5, ], type = "numnz", lambda = 0.01)
Generate lambda sequence.
Description
Generate lambda sequence.
Usage
setup_lambda(X, y, weights, lambda.min.ratio, nlambda)
Arguments
X |
Input matrix, of dimension |
y |
Response variable, of length |
weights |
Observation weights. |
lambda.min.ratio |
The smallest value for |
nlambda |
The number of |
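A hedged sketch of how such a sequence is commonly built (the glmnet-style recipe, assumed rather than confirmed for setup_lambda): with weights normalized to sum to 1, lambda.max = max_j |sum_i w_i x_ij (y_i - weighted mean of y)| is the smallest lambda at which every l1-penalized gaussian coefficient is zero, and the sequence then decreases log-linearly to lambda.min.ratio * lambda.max over nlambda values. setup_lambda_sketch is a hypothetical name.

```r
# Hypothetical glmnet-style lambda sequence for the gaussian l1 case.
setup_lambda_sketch <- function(X, y, weights = rep(1, nrow(X)),
                                lambda.min.ratio = 0.001, nlambda = 100) {
  w <- weights / sum(weights)        # normalize weights to sum to 1
  r <- y - sum(w * y)                # residual of the intercept-only fit
  grad <- drop(crossprod(X, w * r))  # weighted inner products x_j' W r
  lambda.max <- max(abs(grad))       # smallest lambda giving an all-zero fit
  exp(seq(log(lambda.max), log(lambda.max * lambda.min.ratio),
          length.out = nlambda))
}

X <- cbind(c(1, -1, 0, 0), c(0, 0, 1, -1))
y <- c(2, -2, 0, 0)
setup_lambda_sketch(X, y, lambda.min.ratio = 0.001, nlambda = 3)  # 1 ... 0.001
```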