| Title: | Compositional Data Analysis for Microbiome Studies | 
| Version: | 0.2.4 | 
| Description: | Functions for microbiome data analysis that take into account its compositional nature. Performs variable selection through penalized regression for both, cross-sectional and longitudinal studies, and for binary and continuous outcomes. | 
| License: | MIT + file LICENSE | 
| URL: | https://malucalle.github.io/coda4microbiome/ | 
| BugReports: | https://github.com/malucalle/coda4microbiome/issues | 
| Encoding: | UTF-8 | 
| RoxygenNote: | 7.2.3 | 
| LazyData: | false | 
| Imports: | corrplot, glmnet, plyr, pROC, ggpubr, ggplot2, ComplexHeatmap, circlize, survival, survminer | 
| Suggests: | rmarkdown | 
| NeedsCompilation: | no | 
| Packaged: | 2024-07-17 15:25:42 UTC; Toni | 
| Author: | Malu Calle | 
| Maintainer: | Toni Susin <toni.susin@upc.edu> | 
| Depends: | R (≥ 3.5.0) | 
| Repository: | CRAN | 
| Date/Publication: | 2024-07-17 15:50:02 UTC | 
Crohn
Description
Microbiome composition at genus level from a Crohn's disease study: 48 taxa and 975 individuals (662 patients with Crohn's disease and 313 controls)
Format
The dataset contains two objects:
- x_Crohn
- microbiome abundance matrix for 975 individuals (rows) and 48 genera (columns) 
- y_Crohn
- a - factor, indicating if the sample corresponds to a case (CD) or a control (no).
References
doi:10.1016/j.chom.2014.02.005
data_survival
Description
Survival Data simulated from the Crohn's disease original study: 48 taxa and 150 individuals
Format
The dataset contains three objects:
- x
- microbiome abundance matrix for 150 individuals (rows) and 48 genera (columns) 
- Event
- a - numeric, event occurrence. Vector (type: numeric or logical) specifying 0 or FALSE for no event occurrence, and 1 or TRUE for event occurrence.
- Event_time
- a - numeric, time to event or follow up time for right censored data. Must be a vector (type:numeric) specifying time to event for each sample for right censored data.
References
doi:10.1016/j.chom.2014.02.005
data_survival
Description
Survival Data simulated from the Crohn's disease original study: 48 taxa and 150 individuals
Format
The dataset contains three objects:
- x
- microbiome abundance matrix for 150 individuals (rows) and 48 genera (columns) 
- Event
- a - numeric, event occurrence. Vector (type: numeric or logical) specifying 0 or FALSE for no event occurrence, and 1 or TRUE for event occurrence.
- Event_time
- a - numeric, time to event or follow up time for right censored data. Must be a vector (type:numeric) specifying time to event for each sample for right censored data.
References
doi:10.1016/j.chom.2014.02.005
HIV
Description
Microbiome abundances (60 taxa and 155 individuals) from an HIV study (Noguera-Julian et al. 2016).
Format
The dataset contains three objects:
- x_HIV
- microbiome abundance matrix for 155 individuals (rows) and 60 genera (columns) 
- y_HIV
- a factor, specifying if the individual is HIV positive or ( - Pos) or negative (- Neg).
- MSM_HIV
- a factor, indicating sexual preferences: - MSM(Men who have Sex with Men) or not (- nonMSM).
References
doi:10.1016/j.ebiom.2016.01.032
HIV
Description
Microbiome abundances (60 taxa and 155 individuals) from an HIV study (Noguera-Julian et al. 2016).
Format
The dataset contains three objects:
- x_HIV
- microbiome abundance matrix for 155 individuals (rows) and 60 genera (columns) 
- y_HIV
- a factor, specifying if the individual is HIV positive or ( - Pos) or negative (- Neg).
- MSM_HIV
- a factor, indicating sexual preferences: - MSM(Men who have Sex with Men) or not (- nonMSM).
References
doi:10.1016/j.ebiom.2016.01.032
coda4microbiome: Compositional Data Analysis for Microbiome Studies
Description
This package provides a set of functions to explore and study microbiome data within the CoDA framework, with a special focus on identification of microbial signatures (variable selection).
coda_coxnet
Description
Microbial signatures in survival studies The algorithm performs variable selection through an elastic-net penalized Cox regression conveniently adapted to CoDA. The result is expressed as the (weighted) balance between two groups of taxa. It allows the use of non-compositional covariates.
Usage
coda_coxnet(
  x,
  time,
  status,
  covar = NULL,
  lambda = "lambda.1se",
  nvar = NULL,
  alpha = 0.9,
  nfolds = 10,
  showPlots = TRUE,
  coef_threshold = 0
)
Arguments
| x | abundance matrix or data frame (rows are samples, columns are variables (taxa)) | 
| time | time to event or follow up time for right censored data. Must be a numericvector. | 
| status | event occurrence. Vector (type: numeric or logical) specifying 0, or FALSE, for no event occurrence, and 1, or TRUE, for event occurrence. | 
| covar | data frame with covariates (default = NULL) | 
| lambda | penalization parameter (default = "lambda.1se") | 
| nvar | number of variables to use in the glmnet.fit function (default = NULL) | 
| alpha | elastic net parameter (default = 0.9) | 
| nfolds | number of folds | 
| showPlots | if TRUE, shows the plots (default = TRUE) | 
| coef_threshold | coefficient threshold, minimum absolute value of the coefficient for a variable to be included in the model (default =0) | 
Value
list with "taxa.num","taxa.name","log-contrast coefficients","risk.score","apparent Cindex","mean cv-Cindex","sd cv-Cindex","risk score plot","signature plot".
Author(s)
M. Calle, M. Pujolassos, T. Susin
Examples
data(data_survival, package = "coda4microbiome")
time <- Event_time
status <- Event
set.seed(12345)
coda_coxnet(x = x,
           time = Event_time,
           status = Event,
           covar = NULL,
           lambda = "lambda.1se", nvar = NULL,
           alpha = 0.9, nfolds = 10, showPlots = TRUE, coef_threshold = 0)
coda_glmnet
Description
Microbial signatures in cross-sectional studies. The algorithm performs variable selection through penalized regression on the set of all pairwise log-ratios. The result is expressed as the (weighted) balance between two groups of taxa. It allows the use of non-compositional covariates.
Usage
coda_glmnet(
  x,
  y,
  covar = NULL,
  lambda = "lambda.1se",
  nvar = NULL,
  alpha = 0.9,
  nfolds = 10,
  showPlots = TRUE,
  coef_threshold = 0
)
Arguments
| x | abundance matrix or data frame (rows are samples, columns are variables (taxa)) | 
| y | outcome (binary or continuous); data type: numeric, character or factor vector | 
| covar | data frame with covariates (default = NULL) | 
| lambda | penalization parameter (default = "lambda.1se") | 
| nvar | number of variables to use in the glmnet.fit function (default = NULL) | 
| alpha | elastic net parameter (default = 0.9) | 
| nfolds | number of folds | 
| showPlots | if TRUE, shows the plots (default = TRUE) | 
| coef_threshold | coefficient threshold, minimum absolute value of the coefficient for a variable to be included in the model (default =0) | 
Value
if y is binary: list with "taxa.num","taxa.name","log-contrast coefficients","predictions","apparent AUC","mean cv-AUC","sd cv-AUC","predictions plot","signature plot" if not:list with "taxa.num","taxa.name","log-contrast coefficients","predictions","apparent Rsq","mean cv-MSE","sd cv-MSE","predictions plot","signature plot"
Author(s)
M. Calle - T. Susin
Examples
data(Crohn, package = "coda4microbiome")
set.seed(123)
coda_glmnet(x_Crohn[,(1:10)],y_Crohn,showPlots= FALSE)
coda_glmnet0
Description
Internal function for the permutational test
Usage
coda_glmnet0(
  x,
  lrX,
  idlrX,
  nameslrX,
  y,
  covar = NULL,
  lambda = "lambda.1se",
  alpha = 0.9
)
Arguments
| x | . | 
| lrX | . | 
| idlrX | . | 
| nameslrX | . | 
| y | . | 
| covar | . | 
| lambda | . | 
| alpha | . | 
Value
.
Author(s)
M. Calle - T. Susin
coda_glmnet_longitudinal
Description
Microbial signatures in longitudinal studies. Identification of a set of microbial taxa whose joint dynamics is associated with the phenotype of interest. The algorithm performs variable selection through penalized regression over the summary of the log-ratio trajectories (AUC). The result is expressed as the (weighted) balance between two groups of taxa.
Usage
coda_glmnet_longitudinal(
  x,
  y,
  x_time,
  subject_id,
  ini_time,
  end_time,
  covar = NULL,
  lambda = "lambda.1se",
  nvar = NULL,
  alpha = 0.9,
  nfolds = 10,
  showPlots = TRUE,
  coef_threshold = 0
)
Arguments
| x | abundance matrix or data frame in long format (several rows per individual) | 
| y | outcome (binary); data type: numeric, character or factor vector | 
| x_time | observation times | 
| subject_id | subject id | 
| ini_time | initial time to be analyzed | 
| end_time | end time to be analyzed | 
| covar | data frame with covariates (default = NULL) | 
| lambda | penalization parameter (default = "lambda.1se") | 
| nvar | number of variables to use in the glmnet.fit function (default = NULL) | 
| alpha | elastic net parameter (default = 0.9) | 
| nfolds | number of folds (default = 10) | 
| showPlots | if TRUE, shows the plots (default = FALSE) | 
| coef_threshold | coefficient threshold, minimum absolute value of the coefficient for a variable to be included in the model (default =0) | 
Value
in case of binary outcome: list with "taxa.num","taxa.name","log-contrast coefficients","predictions","apparent AUC","mean cv-AUC","sd cv-AUC","predictions plot","signature plot","trajectories plot"
Author(s)
M. Calle - T. Susin
Examples
data(ecam_filtered, package = "coda4microbiome")   # load the data
ecam_results<-coda_glmnet_longitudinal (x=x_ecam[,(1:4)],y= metadata$diet,
x_time= metadata$day_of_life, subject_id = metadata$studyid, ini_time=0,
end_time=60,lambda="lambda.min",nfolds=4, showPlots=FALSE)
ecam_results$taxa.num
coda_glmnet_longitudinal0
Description
internal function
Usage
coda_glmnet_longitudinal0(
  x,
  lrX,
  idlrX,
  nameslrX,
  y,
  x_time,
  subject_id,
  ini_time,
  end_time,
  covar = NULL,
  ktop = NULL,
  lambda = "lambda.1se",
  alpha = 0.9,
  nfolds = 10
)
Arguments
| x | abundance matrix or data frame in long format (several rows per individual) | 
| lrX | log-ratio matrix | 
| idlrX | indices table in the log-ratio matrix | 
| nameslrX | colnames of the log-ratio matrix | 
| y | outcome (binary); data type: numeric, character or factor vector | 
| x_time | observation times | 
| subject_id | subject id | 
| ini_time | initial time to be analyzed | 
| end_time | end time to be analyzed | 
| covar | data frame with covariates (default = NULL) | 
| ktop | given number of selected taxa or compute the best number in case it is NULL (default = NULL) | 
| lambda | penalization parameter (default = "lambda.1se") | 
| alpha | elastic net parameter (default = 0.9) | 
| nfolds | number of folds | 
Value
.
Author(s)
M. Calle - T. Susin
coda_glmnet_longitudinal_null
Description
Performs a permutational test for the coda_glmnet_longitudinal() algorithm: It provides the distribution of results under the null hypothesis by implementing the coda_glmnet_longitudinal() on different rearrangements of the response variable.
Usage
coda_glmnet_longitudinal_null(
  x,
  y,
  x_time,
  subject_id,
  ini_time,
  end_time,
  niter = 100,
  covar = NULL,
  alpha = 0.9,
  lambda = "lambda.1se",
  nfolds = 10,
  sig = 0.05
)
Arguments
| x | abundance matrix or data frame in long format (several rows per individual) | 
| y | outcome (binary); data type: numeric, character or factor vector | 
| x_time | observation times | 
| subject_id | subject id | 
| ini_time | initial time to be analyzed | 
| end_time | end time to be analyzed | 
| niter | number of sample iterations | 
| covar | data frame with covariates (default = NULL) | 
| alpha | elastic net parameter (default = 0.9) | 
| lambda | penalization parameter (default = "lambda.1se") | 
| nfolds | number of folds | 
| sig | significance value (default = 0.05) | 
Value
list with "accuracy" and "confidence interval"
Author(s)
M. Calle - T. Susin
Examples
set.seed(123) # to reproduce the results
data(ecam_filtered, package = "coda4microbiome")   # load the data
x=x_ecam # microbiome abundance
x_time = metadata$day_of_life    # observation times
subject_id = metadata$studyid   # subject id
y= metadata$diet           # diet ("bd"= breast diet, "fd"=formula diet)
ini_time = 0
end_time = 90
 coda_glmnet_longitudinal_null (x,y, x_time, subject_id, ini_time, end_time,
                                      lambda="lambda.min",nfolds=4, niter=3)
coda_glmnet_null
Description
Performs a permutational test for the coda_glmnet() algorithm: It provides the distribution of results under the null hypothesis by implementing the coda_glmnet() on different rearrangements of the response variable.
Usage
coda_glmnet_null(
  x,
  y,
  niter = 100,
  covar = NULL,
  lambda = "lambda.1se",
  alpha = 0.9,
  sig = 0.05
)
Arguments
| x | abundance matrix or data frame (rows are samples, columns are variables (taxa)) | 
| y | outcome (binary or continuous); data type: numeric, character or factor vector | 
| niter | number of iterations (default = 100) | 
| covar | data frame with covariates (default = NULL) | 
| lambda | penalization parameter (default = "lambda.1se") | 
| alpha | elastic net parameter (default = 0.9) | 
| sig | significance level for the confidence interval (default = 0.05) | 
Value
a list with "accuracy" and "confidence interval"
Author(s)
M. Calle - T. Susin
Examples
data(Crohn, package = "coda4microbiome")
coda_glmnet_null(x=x_Crohn[,(1:10)], y=y_Crohn, niter=2,covar=NULL,lambda="lambda.1se",
                                                alpha=0.9,sig=0.05)
ecam_filtered
Description
Microbiome composition at genus level from Early childhood and the microbiome (ECAM) study (Bokulich et al. 2016). Metadata and microbiome data were downloaded from https://github.com/caporaso-lab/longitudinal-notebooks. Filtered data contains information on 42 children and 37 taxa.
Format
The dataset contains three objects:
- x_ecam
- microbiome abundance matrix in long format (873 rows) and 37 genera (columns) 
- taxanames
- vector containing the taxonomy of the 37 taxa 
- metadata
- matrix with information on the individuals at the observation time 
References
Bokulich et al. (2016). Antibiotics, birth mode, and diet shape microbiome maturation during early life. Sci Transl Med 8:343
explore_logratios
Description
Explores the association of each log-ratio with the outcome. Summarizes the importance of each variable (taxa) as the aggregation of the association measures of those log-ratios involving the variable. The output includes a plot of the association of the log-ratio with the outcome where the variables (taxa) are ranked by importance
Usage
explore_logratios(
  x,
  y,
  decreasing = TRUE,
  measure = "AUC",
  covar = NULL,
  shownames = FALSE,
  maxrow = 15,
  maxcol = 15,
  showtitle = TRUE,
  mar = c(0, 0, 1, 0)
)
Arguments
| x | abundance matrix or data frame (rows are samples, columns are variables (taxa)) | 
| y | outcome (binary or continuous); data type: numeric, character or factor vector | 
| decreasing | order of importance (default = TRUE) | 
| measure | association measures "AUC", "Pearson","Spearman", "glm" (default = "AUC") | 
| covar | data frame with covariates (default = NULL) | 
| shownames | logical, if TRUE, shows the names of the variables in the rows of the plot (default = FALSE) | 
| maxrow | maximum number of rows to display in the plot (default = 15) | 
| maxcol | maximum number of columns to display in the plot (default = 15) | 
| showtitle | logical, if TRUE, shows the title of the plot (default = TRUE) | 
| mar | mar numerical vector of the form c(bottom, left, top, right) which gives the number of lines of margin to be specified on the four sides of the plot (default mar=c(0,0,1,0)) | 
Value
list with "max log-ratio","names max log-ratio", "order of importance", "name of most important variables", "association log-ratio with y" and "top log-ratios plot"
Author(s)
M. Calle - T. Susin
Examples
data(HIV, package = "coda4microbiome")
explore_logratios(x_HIV,y_HIV)
explore_lr_longitudinal
Description
Explores the association of summary (integral) of each log-ratio trajectory with the outcome. Summarizes the importance of each variable (taxa) as the aggregation of the association measures of those log-ratios involving the variable. The output includes a plot of the association of the log-ratio with the outcome where the variables (taxa) are ranked by importance
Usage
explore_lr_longitudinal(
  x,
  y,
  x_time,
  subject_id,
  ini_time,
  end_time,
  showPlots = FALSE,
  decreasing = TRUE,
  covar = NULL,
  shownames = FALSE,
  maxrow = 15,
  maxcol = 15,
  showtitle = TRUE,
  mar = c(0, 0, 1, 0)
)
Arguments
| x | abundance matrix or data frame in long format (several rows per individual) | 
| y | outcome (binary); data type: numeric, character or factor vector | 
| x_time | observation times | 
| subject_id | subject id | 
| ini_time | initial time to be analyzed | 
| end_time | end time to be analyzed | 
| showPlots | if TRUE, shows the plot (default = FALSE) | 
| decreasing | order of importance (default = TRUE) | 
| covar | data frame with covariates (default = NULL) | 
| shownames | if TRUE, shows the names of the variables in the rows of the plot (default = FALSE) | 
| maxrow | maximum number of rows to display in the plot (default = 15) | 
| maxcol | maximum number of columns to display in the plot (default = 15) | 
| showtitle | logical, if TRUE, shows the title of the plot (default = TRUE) | 
| mar | mar numerical vector of the form c(bottom, left, top, right) which gives the number of lines of margin to be specified on the four sides of the plot (default mar=c(0,0,1,0)) | 
Value
list with "max log-ratio","names max log-ratio","order of importance","name of most important variables","association log-ratio with y","top log-ratios plot"
Author(s)
M. Calle - T. Susin
Examples
set.seed(123) # to reproduce the results
data(ecam_filtered, package = "coda4microbiome")   # load the data
x=x_ecam # microbiome abundance
x_time = metadata$day_of_life    # observation times
subject_id = metadata$studyid   # subject id
y= metadata$diet           # diet ("bd"= breast diet, "fd"=formula diet)
ini_time = 0
end_time = 90
ecam_logratios<-explore_lr_longitudinal(x,y,x_time,subject_id,ini_time,end_time)
explore_zeros
Description
Provides the proportion of zeros for a pair of variables (taxa) in table x and the proportion of samples with zero in both variables. A bar plot with this information is also provided. Results can be stratified by a categorical variable.
Usage
explore_zeros(x, id1, id2, strata = NULL)
Arguments
| x | abundance matrix or data frame (rows are samples, columns are variables (taxa)) | 
| id1 | column number in x for the first taxa | 
| id2 | column number in x for the second taxa | 
| strata | stratification variable (default = NULL) | 
Value
a list with the frequency table and the associated bar plot
Author(s)
M. Calle - T. Susin
Examples
data(HIV, package = "coda4microbiome")
explore_zeros(x_HIV,5,6)
explore_zeros(x_HIV,5,6, strata=y_HIV)
filter_longitudinal
Description
Filters those individuals and taxa with enough longitudinal information
Usage
filter_longitudinal(
  x,
  taxanames = NULL,
  x_time,
  subject_id,
  metadata,
  ini_time = min(x_time),
  end_time = max(x_time),
  percent_indv = 0.7,
  min_obs = 3
)
Arguments
| x | abundance matrix or data frame in long format (several rows per individual) | 
| taxanames | names of different taxa | 
| x_time | observation times | 
| subject_id | subject id | 
| metadata | matrix sample data | 
| ini_time | initial time to be analyzed | 
| end_time | end time to be analyzed | 
| percent_indv | percentage of individuals with more than min_obs observations | 
| min_obs | required minimum number of observations per individual | 
Value
list with filtered abundance table, taxanames and metadata
Author(s)
M. Calle - T. Susin
Examples
data(ecam_filtered, package = "coda4microbiome")   # load the data
x=x_ecam # microbiome abundance
x_time = metadata$day_of_life    # observation times
subject_id = metadata$studyid   # subject id
ini_time = 0
end_time = 360
data_filtered<-filter_longitudinal(x,taxanames,x_time, subject_id, metadata,
                                              ini_time, end_time, min_obs=4)
impute_zeros
Description
Simple imputation: When the abundance table contains zeros, a positive value is added to all the values in the table. It adds 1 when the minimum of table is larger than 1 (i.e. tables of counts) or it adds half of the minimum value of the table, otherwise.
Usage
impute_zeros(x)
Arguments
| x | abundance matrix or data frame (rows are samples, columns are variables (taxa)) | 
Value
x abundance matrix or data frame with zeros substituted by imputed positive values
Author(s)
M. Calle - T. Susin
Examples
data(HIV, package = "coda4microbiome")
x<-impute_zeros(x_HIV)
integralFun
Description
Integral of the curve trajectory of subject_id in the interval a,b
Usage
integralFun(x, y, id, a, b)
Arguments
| x | abundance matrix or data frame in long format (several rows per individual) | 
| y | outcome (binary); data type: numeric, character or factor vector | 
| id | subjects-ids | 
| a | interval initial time | 
| b | interval final time | 
Value
matrix with integrals for each individual (rows) and each taxa (columns)
Author(s)
M. Calle - T. Susin
logratios_matrix
Description
Computes a large matrix with all the log-ratios between pairs of taxa (columns) in the abundance table
Usage
logratios_matrix(x)
Arguments
| x | abundance matrix or data frame (rows are samples, columns are variables (taxa)) | 
Value
list with matrix of log-ratios, matrix indicating the pairs of variables involved in each log-ratio, and a matrix indicating the names of the variables involved in each log-ratio.
Author(s)
M. Calle - T. Susin
Examples
data(HIV, package = "coda4microbiome")
lrHIV<-logratios_matrix(x_HIV[,(1:4)])
# matrix of log-ratios (first 6 rows and 6 columns):
lrHIV[[1]][1:6,1:6]
# variables involved in each log-ratio
head(lrHIV[[2]])
# names of the variables involved in each log-ratio
head(lrHIV[[3]])
ecam_filtered
Description
Microbiome composition at genus level from Early childhood and the microbiome (ECAM) study (Bokulich et al. 2016). Metadata and microbiome data were downloaded from https://github.com/caporaso-lab/longitudinal-notebooks. Filtered data contains information on 42 children and 37 taxa.
Format
The dataset contains three objects:
- x_ecam
- microbiome abundance matrix in long format (873 rows) and 37 genera (columns) 
- taxanames
- vector containing the taxonomy of the 37 taxa 
- metadata
- matrix with information on the individuals at the observation time 
References
Bokulich et al. (2016). Antibiotics, birth mode, and diet shape microbiome maturation during early life. Sci Transl Med 8:343
plotMedianCurve
Description
Internal plot function
Usage
plotMedianCurve(iNum, iDen, X, Y, x_time, subject_id, ini_time, end_time)
Arguments
| iNum | . | 
| iDen | . | 
| X | . | 
| Y | . | 
| x_time | . | 
| subject_id | . | 
| ini_time | . | 
| end_time | . | 
Value
.
Author(s)
M. Calle - T. Susin
plot_prediction
Description
Plot of the predictions of a fitted model: Multiple box-plot and density plots for binary outcomes and Regression plot for continuous outcome
Usage
plot_prediction(prediction, y, strata = NULL, showPlots = TRUE)
Arguments
| prediction | the fitted values of predictions for the model | 
| y | outcome (binary or continuous); data type: numeric, character or factor vector | 
| strata | stratification variable (default = NULL) | 
| showPlots | if TRUE, shows the plots (default = TRUE) | 
Value
prediction plot
Author(s)
M. Calle - T. Susin
Examples
# prediction plot for the log-ratio between columns 3 and 32 on HIV status
data(HIV, package = "coda4microbiome")
x<-impute_zeros(x_HIV)
lr<-log(x[,3])-log(x[,32])
plot_prediction(lr, y_HIV)
plot_riskscore
Description
Plots samples ordered by microbial risk score values along time to event.
Usage
plot_riskscore(risk.score, x, time, status, showPlots = TRUE)
Arguments
| risk.score | microbial risk score values resulting from the coda_coxnet model | 
| x | original survival data | 
| time | time to event or follow up time for right censored data. Must be a vector (type:numeric) specifying time to event for each sample for right censored data. | 
| status | event occurrence. Vector (numeric or logical) specifying 0 (or FALSE) for no event occurrence, and 1 (or TRUE) for event occurrence. | 
| showPlots | (default: TRUE) | 
Value
returns an object of class HeatmapList.
Author(s)
M. Calle, M. Pujolassos, T. Susin
Examples
set.seed(12345)
data(data_survival, package = "coda4microbiome")
time <- Event_time
status <- Event
crohn_cox <- coda_coxnet(x = x,
                         time = Event_time,
                         status = Event,
                         covar = NULL,
                         lambda = "lambda.1se", nvar = NULL,
                         alpha = 0.9, nfolds = 10, showPlots = TRUE, coef_threshold = 0)
plot_riskscore(risk.score = crohn_cox$risk.score,
                    x = x,
                    time = Event_time,
                    status =  Event,
                    showPlots = TRUE)
#-------------------------------------------------------------------------------
plot_signature
Description
Graphical representation of the variables selected and their coefficients
Usage
plot_signature(vars, coeff, showPlots = TRUE, varnames = NULL)
Arguments
| vars | variables selected | 
| coeff | associated coefficients | 
| showPlots | if TRUE, shows the plots (default = TRUE) | 
| varnames | if TRUE, shows the names of the variables | 
Value
bar plot
Author(s)
M. Calle - T. Susin
Examples
plot_signature(c(2,10, 3, 15, 4), c(0.8, -0.1, 0.2, -0.6, -0.3))
plot_signature_curves
Description
Graphical representation of the signature trajectories
Usage
plot_signature_curves(
  varNum,
  coeff,
  x,
  y,
  x_time,
  subject_id,
  ini_time,
  end_time,
  color = c("chocolate1", "slateblue2"),
  showLabel = TRUE,
  location = "bottomright",
  inset = c(0.01, 0.02),
  cex = 0.8,
  y.intersp = 0.7,
  main_title = NULL
)
Arguments
| varNum | column number of the variables (taxa) | 
| coeff | coefficients (coefficients must sum-up zero) | 
| x | microbiome abundance matrix in long format | 
| y | binary outcome; data type: numeric, character or factor vector | 
| x_time | observation times | 
| subject_id | subject id | 
| ini_time | initial time to be analyzed | 
| end_time | end time to be analyzed | 
| color | color graphical parameter | 
| showLabel | graphical parameter (see help(label)) | 
| location | graphical parameter (see help(label)) | 
| inset | graphical parameter (see help(label)) | 
| cex | graphical parameter (see help(label)) | 
| y.intersp | graphical parameter (see help(label)) | 
| main_title | title plot | 
Value
trajectories plot
Author(s)
M. Calle - T. Susin
Examples
x=matrix(c(2, 3, 4, 1, 2, 5, 10, 20, 15, 30, 40, 12), ncol=2)
x_time = c(0,10,20,1,15, 25)
subject_id = c(1,1,1,2,2,2)
y=c(0,0,0,1,1,1)
plot_signature_curves(varNum=c(1,2), coeff=c(1,-1), x, y,x_time, subject_id,
                       ini_time=0, end_time=25)
plot_survcurves
Description
Plots survival curves stratifying samples based on their signature value. Signature value for stratification can be set by the user.
Usage
plot_survcurves(risk.score, time, status, strata.quantile = 0.5)
Arguments
| risk.score | microbial risk score values resulting from the coda_coxnet model | 
| time | time to event or follow up time for right censored data. Must be a vector (type:numeric) specifying time to event for each sample for right censored data (what about interval data?). | 
| status | event occurrence. Vector (type: numeric or logical) specifying 0 or FALSE for no event occurrence, and 1 or TRUE for event occurrence. | 
| strata.quantile | cut-off quantile (values 0, 1) for sample stratification based on signature value. Default is set to 0.5 quantile of the signature. | 
Value
return an object of class ggsurvplot which is list containing the following: plot: the survival plot (ggplot object). table: the number of subjects at risk table per time (ggplot object). data.survplot: data used to plot the survival curves (data.frame). data.survtable: data used to plot the tables under the main survival curves (data.frame).
Author(s)
M. Calle, M. Pujolassos, T. Susin
Examples
set.seed(12345)
data(data_survival, package = "coda4microbiome")
time <- Event_time
status <- Event
crohn_cox <- coda_coxnet(x = x,
                         time = Event_time,
                         status = Event,
                         covar = NULL,
                         lambda = "lambda.1se", nvar = NULL,
                         alpha = 0.9, nfolds = 10, showPlots = TRUE, coef_threshold = 0)
plot_survcurves(risk.score = crohn_cox$risk.score,
                 time,
                 status,
                 strata.quantile = 0.5)
#-------------------------------------------------------------------------------
sCD14
Description
Microbiome composition (60 taxa and 151 individuals) and inflammatory parameter sCD14 from an HIV study (Noguera-Julian et al. 2016). A dataset containing the number of counts of 60 different genera in a group of 151 samples (including HIV - infected and non - infected patients).
Format
The dataset contains two objects:
- x_sCD14
- microbiome abundance matrix for 151 individuals (rows) and 60 genera (columns) 
- y_sCD14
- a - numericvariable with the value of the inflammation parameter sCD14 for each sample
References
Rivera-Pinto et al. (2018) Balances: a new perspective for microbiome analysis. mSystems 3 (4)
shannon
Description
Shannon information
Usage
shannon(x)
Arguments
| x | abundance composition (vector) | 
Value
shannon information
Author(s)
M. Calle - T. Susin
Examples
data(HIV, package = "coda4microbiome")
shannon(x_HIV[1,])
shannon_effnum
Description
Shannon effective number of variables in a composition
Usage
shannon_effnum(x)
Arguments
| x | abundance composition (vector) | 
Value
shannon information
Author(s)
M. Calle - T. Susin
Examples
data(HIV, package = "coda4microbiome")
shannon_effnum(x_HIV[1,])
shannon_sim
Description
Shannon similarity between two compositions
Usage
shannon_sim(x, y)
Arguments
| x | abundance composition (vector) | 
| y | abundance composition (vector) | 
Value
shannon similarity (value between 0 and 1)
Author(s)
M. Calle - T. Susin
Examples
data(HIV, package = "coda4microbiome")
shannon_sim(x_HIV[1,],x_HIV[2,])
ecam_filtered
Description
Microbiome composition at genus level from Early childhood and the microbiome (ECAM) study (Bokulich et al. 2016). Metadata and microbiome data were downloaded from https://github.com/caporaso-lab/longitudinal-notebooks. Filtered data contains information on 42 children and 37 taxa.
Format
The dataset contains three objects:
- x_ecam
- microbiome abundance matrix in long format (873 rows) and 37 genera (columns) 
- taxanames
- vector containing the taxonomy of the 37 taxa 
- metadata
- matrix with information on the individuals at the observation time 
References
Bokulich et al. (2016). Antibiotics, birth mode, and diet shape microbiome maturation during early life. Sci Transl Med 8:343
data_survival
Description
Survival Data simulated from the Crohn's disease original study: 48 taxa and 150 individuals
Format
The dataset contains three objects:
- x
- microbiome abundance matrix for 150 individuals (rows) and 48 genera (columns) 
- Event
- a - numeric, event occurrence. Vector (type: numeric or logical) specifying 0 or FALSE for no event occurrence, and 1 or TRUE for event occurrence.
- Event_time
- a - numeric, time to event or follow up time for right censored data. Must be a vector (type:numeric) specifying time to event for each sample for right censored data.
References
doi:10.1016/j.chom.2014.02.005
Crohn
Description
Microbiome composition at genus level from a Crohn's disease study: 48 taxa and 975 individuals (662 patients with Crohn's disease and 313 controls)
Format
The dataset contains two objects:
- x_Crohn
- microbiome abundance matrix for 975 individuals (rows) and 48 genera (columns) 
- y_Crohn
- a - factor, indicating if the sample corresponds to a case (CD) or a control (no).
References
doi:10.1016/j.chom.2014.02.005
HIV
Description
Microbiome abundances (60 taxa and 155 individuals) from an HIV study (Noguera-Julian et al. 2016).
Format
The dataset contains three objects:
- x_HIV
- microbiome abundance matrix for 155 individuals (rows) and 60 genera (columns) 
- y_HIV
- a factor, specifying if the individual is HIV positive or ( - Pos) or negative (- Neg).
- MSM_HIV
- a factor, indicating sexual preferences: - MSM(Men who have Sex with Men) or not (- nonMSM).
References
doi:10.1016/j.ebiom.2016.01.032
ecam_filtered
Description
Microbiome composition at genus level from Early childhood and the microbiome (ECAM) study (Bokulich et al. 2016). Metadata and microbiome data were downloaded from https://github.com/caporaso-lab/longitudinal-notebooks. Filtered data contains information on 42 children and 37 taxa.
Format
The dataset contains three objects:
- x_ecam
- microbiome abundance matrix in long format (873 rows) and 37 genera (columns) 
- taxanames
- vector containing the taxonomy of the 37 taxa 
- metadata
- matrix with information on the individuals at the observation time 
References
Bokulich et al. (2016). Antibiotics, birth mode, and diet shape microbiome maturation during early life. Sci Transl Med 8:343
sCD14
Description
Microbiome composition (60 taxa and 151 individuals) and inflammatory parameter sCD14 from an HIV study (Noguera-Julian et al. 2016). A dataset containing the number of counts of 60 different genera in a group of 151 samples (including HIV - infected and non - infected patients).
Format
The dataset contains two objects:
- x_sCD14
- microbiome abundance matrix for 151 individuals (rows) and 60 genera (columns) 
- y_sCD14
- a - numericvariable with the value of the inflammation parameter sCD14 for each sample
References
Rivera-Pinto et al. (2018) Balances: a new perspective for microbiome analysis. mSystems 3 (4)
Crohn
Description
Microbiome composition at genus level from a Crohn's disease study: 48 taxa and 975 individuals (662 patients with Crohn's disease and 313 controls)
Format
The dataset contains two objects:
- x_Crohn
- microbiome abundance matrix for 975 individuals (rows) and 48 genera (columns) 
- y_Crohn
- a - factor, indicating if the sample corresponds to a case (CD) or a control (no).
References
doi:10.1016/j.chom.2014.02.005
HIV
Description
Microbiome abundances (60 taxa and 155 individuals) from an HIV study (Noguera-Julian et al. 2016).
Format
The dataset contains three objects:
- x_HIV
- microbiome abundance matrix for 155 individuals (rows) and 60 genera (columns) 
- y_HIV
- a factor, specifying if the individual is HIV positive or ( - Pos) or negative (- Neg).
- MSM_HIV
- a factor, indicating sexual preferences: - MSM(Men who have Sex with Men) or not (- nonMSM).
References
doi:10.1016/j.ebiom.2016.01.032
sCD14
Description
Microbiome composition (60 taxa and 151 individuals) and inflammatory parameter sCD14 from an HIV study (Noguera-Julian et al. 2016). A dataset containing the number of counts of 60 different genera in a group of 151 samples (including HIV - infected and non - infected patients).
Format
The dataset contains two objects:
- x_sCD14
- microbiome abundance matrix for 151 individuals (rows) and 60 genera (columns) 
- y_sCD14
- a - numericvariable with the value of the inflammation parameter sCD14 for each sample
References
Rivera-Pinto et al. (2018) Balances: a new perspective for microbiome analysis. mSystems 3 (4)