| Title: | Site Occupancy Modeling for Environmental DNA Metabarcoding | 
| Version: | 1.2.1 | 
| Description: | Fits community site occupancy models to environmental DNA metabarcoding data collected using spatially-replicated survey design. Model fitting results can be used to evaluate and compare the effectiveness of species detection to find an efficient survey design. Reference: Fukaya et al. (2022) <doi:10.1111/2041-210X.13732>, Fukaya and Hasebe (2025) <doi:10.1002/1438-390X.12219>. | 
| License: | GPL (≥ 3) | 
| Encoding: | UTF-8 | 
| LazyData: | true | 
| RoxygenNote: | 7.3.2 | 
| Suggests: | covr, rmarkdown, spelling, testthat (≥ 3.0.0), tibble, vdiffr | 
| Config/testthat/edition: | 3 | 
| Imports: | stats, methods, graphics, checkmate, crayon, knitr | 
| Collate: | 'classes.R' 'fish.R' 'fish_raw.R' 'get_posterior.R' 'gof.R' 'occumbData.R' 'occumb.R' 'plot.R' 'predict.R' 'show.R' 'summary.R' 'utility.R' 'utils.R' | 
| URL: | https://fukayak.github.io/occumb/, https://github.com/fukayak/occumb | 
| Depends: | jagsUI, R (≥ 2.10) | 
| VignetteBuilder: | knitr | 
| SystemRequirements: | JAGS (http://mcmc-jags.sourceforge.net) | 
| BugReports: | https://github.com/fukayak/occumb/issues | 
| Language: | en-US | 
| NeedsCompilation: | no | 
| Packaged: | 2025-07-16 06:50:05 UTC; kfukaya | 
| Author: | Keiichi Fukaya [aut, cre], Ken Kellner [cph] (summary method for occumbFit class), Mika Takahashi [aut] | 
| Maintainer: | Keiichi Fukaya <fukayak99@gmail.com> | 
| Repository: | CRAN | 
| Date/Publication: | 2025-07-16 09:30:09 UTC | 
Expected utility for local species diversity assessments.
Description
eval_util_L() evaluates the expected utility of a local
species diversity assessment by using Monte Carlo integration.
Usage
eval_util_L(
  settings,
  fit = NULL,
  z = NULL,
  theta = NULL,
  phi = NULL,
  N_rep = 1,
  cores = 1L
)
Arguments
| settings | A data frame that specifies a set of conditions under which
utility is evaluated. It must include columns named  | 
| fit | An  | 
| z | Sample values of site occupancy status of species stored in an array
with sample  | 
| theta | Sample values of sequence capture probabilities of species
stored in a matrix with sample  | 
| phi | Sample values of sequence relative dominance of species stored in
a matrix with sample  | 
| N_rep | Controls the sample size for the Monte Carlo integration.
The integral is evaluated using  | 
| cores | The number of cores to use for parallelization. | 
Details
The utility of local species diversity assessment for a given set of sites
can be defined as the expected number of detected species per site
(Fukaya et al. 2022). eval_util_L() evaluates this utility for arbitrary
sets of sites that can potentially have different values for site occupancy
status of species, z, sequence capture probabilities of species,
\theta, and sequence relative dominance of species,
\phi, for the combination of K and N values specified in the
conditions argument.
Such evaluations can be used to balance K and N to maximize the utility
under a constant budget (possible combinations of K and N under a
specified budget and cost values are easily obtained using list_cond_L();
see the example below).
It is also possible to examine how the utility varies with different K
and N values without setting a budget level, which may be useful for determining
a satisfactory level of K and N from a purely technical point of view.
The expected utility is defined as the expected value of the conditional
utility in the form:
U(K, N \mid \boldsymbol{r}, \boldsymbol{u}) = \frac{1}{J}\sum_{j = 1}^{J}\sum_{i = 1}^{I}\left\{1 - \prod_{k = 1}^{K}\left(1 - \frac{u_{ijk}r_{ijk}}{\sum_{m = 1}^{I}u_{mjk}r_{mjk}} \right)^N \right\}
where u_{ijk} is a latent indicator variable representing
the inclusion of the sequence of species i in replicate k
at site j, and r_{ijk} is a latent variable that
is proportional to the relative frequency of the sequence of species
i, conditional on its presence in replicate k at site
j (Fukaya et al. 2022).
Expectations are taken with respect to the posterior (or possibly prior)
predictive distributions of \boldsymbol{r} = \{r_{ijk}\} and
\boldsymbol{u} = \{u_{ijk}\}, which are evaluated numerically using
Monte Carlo integration. The predictive distributions of
\boldsymbol{r} and \boldsymbol{u} depend on the model
parameters z, \theta, and \phi values.
Their posterior (or prior) distribution is specified by supplying an
occumbFit object containing their posterior samples via the fit argument,
or by supplying a matrix or array of posterior (or prior) samples of
parameter values via the z, theta, and phi arguments. Higher
approximation accuracy can be obtained by increasing the value of N_rep.
The eval_util_L() function can be executed by supplying the fit argument
without specifying the z, theta, and phi arguments, by supplying the
three z, theta, and phi arguments without the fit argument, or by
supplying the fit argument and any or all of the z, theta, and phi
arguments. If z, theta, or phi arguments are specified in addition
to the fit, the parameter values given in these arguments are used
preferentially to evaluate the expected utility. If the sample sizes differ among
parameters, parameters with smaller sample sizes are resampled with
replacements to align the sample sizes across parameters.
The expected utility is evaluated assuming homogeneity of replicates, in the
sense that \theta and \phi, the model parameters
associated with the species detection process, are constant across
replicates within a site. For this reason, eval_util_L() does not accept
replicate-specific \theta and \phi. If the
occumbFit object supplied in the fit argument has a replicate-specific
parameter, the parameter samples to be used in the utility evaluation must be
provided explicitly via the theta or phi arguments.
The Monte Carlo integration is executed in parallel on multiple CPU cores, where
the cores argument controls the degree of parallelization.
Value
A data frame with a column named Utility in which the estimates of the
expected utility are stored. This is obtained by adding the Utility column
to the data frame provided in the settings argument.
References
K. Fukaya, N. I. Kondo, S. S. Matsuzaki and T. Kadoya (2022) Multispecies site occupancy modelling and study design for spatially replicated environmental DNA metabarcoding. Methods in Ecology and Evolution 13:183–193. doi:10.1111/2041-210X.13732
Examples
set.seed(1)
# Generate a random dataset (20 species * 2 sites * 2 reps)
I <- 20 # Number of species
J <- 2  # Number of sites
K <- 2  # Number of replicates
data <- occumbData(
    y = array(sample.int(I * J * K), dim = c(I, J, K)))
# Fitting a null model
fit <- occumb(data = data)
## Estimate expected utility
# Arbitrary K and N values
(util1 <- eval_util_L(expand.grid(K = 1:3, N = c(1E3, 1E4, 1E5)),
                      fit))
# K and N values under specified budget and cost
(util2 <- eval_util_L(list_cond_L(budget = 1E5,
                                  lambda1 = 0.01,
                                  lambda2 = 5000,
                                  fit),
                      fit))
# K values restricted
(util3 <- eval_util_L(list_cond_L(budget = 1E5,
                                  lambda1 = 0.01,
                                  lambda2 = 5000,
                                  fit,
                                  K = 1:5),
                      fit))
# theta and phi values supplied
(util4 <- eval_util_L(list_cond_L(budget = 1E5,
                                  lambda1 = 0.01,
                                  lambda2 = 5000,
                                  fit,
                                  K = 1:5),
                      fit,
                      theta = array(0.5, dim = c(4000, I, J)),
                      phi = array(1, dim = c(4000, I, J))))
# z, theta, and phi values, but no fit object supplied
(util5 <- eval_util_L(list_cond_L(budget = 1E5,
                                  lambda1 = 0.01,
                                  lambda2 = 5000,
                                  fit,
                                  K = 1:5),
                      fit = NULL,
                      z = array(1, dim = c(4000, I, J)),
                      theta = array(0.5, dim = c(4000, I, J)),
                      phi = array(1, dim = c(4000, I, J))))
Expected utility for regional species diversity assessments.
Description
eval_util_R() evaluates the expected utility of a regional
species diversity assessment using Monte Carlo integration.
Usage
eval_util_R(
  settings,
  fit = NULL,
  psi = NULL,
  theta = NULL,
  phi = NULL,
  N_rep = 1,
  cores = 1L
)
Arguments
| settings | A data frame that specifies a set of conditions under which
utility is evaluated. It must include columns named  | 
| fit | An  | 
| psi | Sample values of the site occupancy probabilities of species
stored in a matrix with sample  | 
| theta | Sample values of sequence capture probabilities of species
stored in a matrix with sample  | 
| phi | Sample values of sequence relative dominance of species stored in
a matrix with sample  | 
| N_rep | Controls the sample size for the Monte Carlo integration.
The integral is evaluated using a total of  | 
| cores | The number of cores to use for parallelization. | 
Details
The utility of a regional species diversity assessment can be defined as
the number of species expected to be detected in the region of interest
(Fukaya et al. 2022). eval_util_R() evaluates this utility for the region
modeled in the occumbFit object for the combination of J, K, and N
values specified in the conditions argument.
Such evaluations can be used to balance J, K, and N to maximize the
utility under a constant budget (possible combinations of J, K, and N
under a specified budget and cost values are easily obtained using
list_cond_R(); see the example below).
It is also possible to examine how the utility varies with different J,
K, and N values without setting a budget level, which may be useful in determining
the satisfactory levels of J, K, and N from a purely technical point of
view.
The expected utility is defined as the expected value of the conditional utility in the form:
U(J, K, N \mid \boldsymbol{r}, \boldsymbol{u}) = \sum_{i = 1}^{I}\left\{1 - \prod_{j = 1}^{J}\prod_{k = 1}^{K}\left(1 - \frac{u_{ijk}r_{ijk}}{\sum_{m = 1}^{I}u_{mjk}r_{mjk}} \right)^N \right\}
where u_{ijk} is a latent indicator variable representing
the inclusion of the sequence of species i in replicate k
at site j, and r_{ijk} is a latent variable that
is proportional to the relative frequency of the sequence of species
i, conditional on its presence in replicate k at site
j (Fukaya et al. 2022).
Expectations are taken with respect to the posterior (or possibly prior)
predictive distributions of \boldsymbol{r} = \{r_{ijk}\} and
\boldsymbol{u} = \{u_{ijk}\}, which are evaluated numerically using
Monte Carlo integration. The predictive distributions of
\boldsymbol{r} and \boldsymbol{u} depend on the model
parameters \psi, \theta, and \phi values.
Their posterior (or prior) distribution is specified by supplying an
occumbFit object containing their posterior samples via the fit argument,
or by supplying a matrix or array of posterior (or prior) samples of
parameter values via the psi, theta, and phi arguments. Higher
approximation accuracy can be obtained by increasing the value of N_rep.
The eval_util_R() function can be executed by supplying the fit argument
without specifying the psi, theta, and phi arguments, by supplying the
three psi, theta, and phi arguments without the fit argument, or by
supplying the fit argument and any or all of the psi, theta, and phi
arguments. If the psi, theta, or phi arguments are specified in addition
to the fit, the parameter values given in these arguments are preferentially
used to evaluate the expected utility. If the sample sizes differed among
parameters, parameters with smaller sample sizes are resampled with
replacements to align the sample sizes across parameters.
The expected utility is evaluated assuming homogeneity of replicates, in the
sense that \theta and \phi, the model parameters
associated with the species detection process, are constant across
replicates within a site. For this reason, eval_util_R() does not accept
replicate-specific \theta and \phi. If the
occumbFit object supplied in the fit argument has a replicate-specific
parameter, the parameter samples to be used in the utility evaluation must be
provided explicitly via the theta or phi arguments.
If the parameters are modeled as a function of site covariates in the fit
object, or if the psi, theta, and/or phi arguments have site dimensions,
the expected utility is evaluated to account for the site heterogeneity of
the parameters. To incorporate site heterogeneity, the
parameter values for each J site are determined by selecting site-specific
parameter values in the fit, or those supplied in psi, theta, and phi
via random sampling with replacement. Thus, expected utility is
evaluated by assuming a set of supplied parameter values as a statistical
population of site-specific parameters.
The Monte Carlo integration is executed in parallel on multiple CPU cores, where
the cores argument controls the degree of parallelization.
Value
A data frame with a column named Utility in which the estimates of
the expected utility are stored. This is obtained by adding the Utility column
to the data frame provided in the settings argument.
References
K. Fukaya, N. I. Kondo, S. S. Matsuzaki and T. Kadoya (2022) Multispecies site occupancy modelling and study design for spatially replicated environmental DNA metabarcoding. Methods in Ecology and Evolution 13:183–193. doi:10.1111/2041-210X.13732
Examples
set.seed(1)
# Generate a random dataset (20 species * 2 sites * 2 reps)
I <- 20 # Number of species
J <- 2  # Number of sites
K <- 2  # Number of replicates
data <- occumbData(
    y = array(sample.int(I * J * K), dim = c(I, J, K)))
# Fitting a null model
fit <- occumb(data = data)
## Estimate expected utility
# Arbitrary J, K, and N values
(util1 <- eval_util_R(expand.grid(J = 1:3, K = 1:3, N = c(1E3, 1E4, 1E5)),
                      fit))
# J, K, and N values under specified budget and cost
(util2 <- eval_util_R(list_cond_R(budget = 50000,
                                  lambda1 = 0.01,
                                  lambda2 = 5000,
                                  lambda3 = 5000),
                      fit))
# K values restricted
(util3 <- eval_util_R(list_cond_R(budget = 50000,
                                  lambda1 = 0.01,
                                  lambda2 = 5000,
                                  lambda3 = 5000,
                                  K = 1:5),
                      fit))
# J and K values restricted
(util4 <- eval_util_R(list_cond_R(budget = 50000,
                                  lambda1 = 0.01,
                                  lambda2 = 5000,
                                  lambda3 = 5000,
                                  J = 1:3, K = 1:5),
                      fit))
# theta and phi values supplied
(util5 <- eval_util_R(list_cond_R(budget = 50000,
                                  lambda1 = 0.01,
                                  lambda2 = 5000,
                                  lambda3 = 5000,
                                  J = 1:3, K = 1:5),
                      fit,
                      theta = array(0.5, dim = c(4000, I, J)),
                      phi = array(1, dim = c(4000, I, J))))
# psi, theta, and phi values, but no fit object supplied
(util6 <- eval_util_R(list_cond_R(budget = 50000,
                                  lambda1 = 0.01,
                                  lambda2 = 5000,
                                  lambda3 = 5000,
                                  J = 1:3, K = 1:5),
                      fit = NULL,
                      psi = array(0.9, dim = c(4000, I, J)),
                      theta = array(0.9, dim = c(4000, I, J)),
                      phi = array(1, dim = c(4000, I, J))))
Fish eDNA metabarcoding dataset
Description
A dataset of fish eDNA metabarcoding collected in the Kasumigaura watershed, Japan.
Usage
fish
Format
An occumbData class object containing the sequence read count y,
a species covariate mismatch, and a site covariate riverbank.
mismatch represents the total number of mismatched bases in the
priming region of the forward and reverse primers for each species.
riverbank indicates whether the riverbank at each site lacks aquatic
and riparian vegetation.
Sequence reads were obtained from three replicates (collected from the
center of the river and near the left and right riverbanks)
from 50 sites across the watershed, of which read counts from six samples were
missing.
The resulting sequence counts of 50 freshwater fish taxa were recorded.
Source
K. Fukaya, N. I. Kondo, S. S. Matsuzaki, T. Kadoya (2021) Data from: Multispecies site occupancy modeling and study design for spatially replicated environmental DNA metabarcoding. Dryad Digital Repository. doi:10.5061/dryad.3bk3j9kkm
Fish eDNA metabarcoding dataset
Description
A dataset of fish eDNA metabarcoding collected in the Kasumigaura watershed, Japan.
Usage
fish_raw
Format
A list containing an array of sequence read count, y,
a vector of the total number of mismatched bases in the priming region of
the forward and reverse primers for each species, mismatch,
and a factor indicating whether the riverbank at each site lacked aquatic
and riparian vegetation, riverbank.
Sequence reads were obtained from three replicates (collected from the
center of the river and near the left and right riverbanks)
from 50 sites across the watershed, of which read counts from six samples were
missing.
The resulting sequence counts of 50 freshwater fish taxa detected were
recorded.
Source
K. Fukaya, N. I. Kondo, S. S. Matsuzaki, T. Kadoya (2021) Data from: Multispecies site occupancy modeling and study design for spatially replicated environmental DNA metabarcoding. Dryad Digital Repository. doi:10.5061/dryad.3bk3j9kkm
Extract posterior samples or summary of parameters from a model-fit object.
Description
get_post_samples() extracts posterior samples of the specified
parameters from a model-fit object.
get_post_summary() extracts posterior summary of the specified
parameters from a model-fit object.
Usage
get_post_samples(
  fit,
  parameter = c("z", "pi", "phi", "theta", "psi", "alpha", "beta", "gamma",
    "alpha_shared", "beta_shared", "gamma_shared", "Mu", "sigma", "rho"),
  output_dataframe = FALSE
)
get_post_summary(
  fit,
  parameter = c("z", "pi", "phi", "theta", "psi", "alpha", "beta", "gamma",
    "alpha_shared", "beta_shared", "gamma_shared", "Mu", "sigma", "rho"),
  output_dataframe = FALSE
)
Arguments
| fit | An  | 
| parameter | A string of parameter name. See Details for possible choices and corresponding parameters. | 
| output_dataframe | If  | 
Details
The functions return posterior samples or a summary of one of the
following parameters in the model, stored in the model-fit object
fit:
- z
- Site occupancy status of species. 
- pi
- Multinomial probabilities of species sequence read counts. 
- phi
- Sequence relative dominance of species. 
- theta
- Sequence capture probabilities of species. 
- psi
- Site occupancy probabilities of species. 
- alpha
- Species-specific effects on sequence relative dominance ( - phi).
- beta
- Species-specific effects on sequence capture probabilities ( - theta).
- gamma
- Species-specific effects on site occupancy probabilities ( - psi).
- alpha_shared
- Effects on sequence relative dominance ( - phi) common across species.
- beta_shared
- Effects on sequence capture probabilities ( - theta) that are common across species.
- gamma_shared
- Effects on site occupancy probabilities ( - psi) that are common across species.
- Mu
- Community-level averages of species-specific effects ( - alpha,- beta,- gamma).
- sigma
- Standard deviations of species-specific effects ( - alpha,- beta,- gamma).
- rho
- Correlation coefficients of the species-specific effects ( - alpha,- beta,- gamma).
See the package vignette for details of these parameters.
The parameter may have dimensions corresponding to species, sites,
replicates, and effects (covariates), and when
output_dataframe = FALSE, the dimension and label
attributes are added to the output object to inform these dimensions.
If the sequence read count data y have species, site, or replicate
names appended as the dimnames attribute (see Details in
occumbData()), they are copied into the label
attribute of the returned object.
Value
By default, get_post_samples() returns a vector, matrix, or array of
posterior samples for a selected parameter.
get_post_summary() returns, by default, a table (matrix) of the
posterior summary of the selected parameters. The elements of the posterior
summary are the same as those obtained with the jags()
function in the jagsUI package: they include the mean, standard
deviation, percentiles of posterior samples; the Rhat statistic;
the effective sample size, n.eff; overlap0, which checks if 0
falls in the parameter's 95% credible interval; and the proportion of the
posterior with the same sign as the mean, f.
The dimension and label attributes of the output object
provide information regarding the dimensions of the parameter.
When output_dataframe = TRUE, the results are returned in data
frame format where the attributes obtained when
output_dataframe = FALSE are incorporated into the table.
Examples
# Generate the smallest random dataset (2 species * 2 sites * 2 reps)
I <- 2 # Number of species
J <- 2 # Number of sites
K <- 2 # Number of replicates
y_named <- array(sample.int(I * J * K), dim = c(I, J, K))
dimnames(y_named) <- list(c("species 1", "species 2"),
                          c("site 1", "site 2"), NULL)
data_named <- occumbData(y = y_named)
# Fitting a null model
fit <- occumb(data = data_named, n.iter = 10100)
# Extract posterior samples
(post_sample_z <- get_post_samples(fit, "z"))
# Look dimensions of the parameter
attributes(post_sample_z)
# Extract posterior summary
(post_summary_z <- get_post_summary(fit, "z"))
# Look dimensions of the parameter
attributes(post_summary_z)
Goodness-of-fit assessment of the fitted model.
Description
gof() calculates omnibus discrepancy measures and
their Bayesian p-values for the fitted model using the posterior
predictive check approach.
Usage
gof(
  fit,
  stats = c("Freeman_Tukey", "deviance", "chi_squared"),
  cores = 1L,
  plot = TRUE,
  ...
)
Arguments
| fit | An  | 
| stats | The discrepancy statistics to be applied. | 
| cores | The number of cores to use for parallelization. | 
| plot | Logical, determine if draw scatter plots of the fit statistics. | 
| ... | Additional arguments passed to the default  | 
Details
A discrepancy statistic for the fitted model is obtained using a posterior predictive checking procedure. The following statistics are currently available:
- Freeman-Tukey statistics (default)
- T_{\textrm{FT}} = \sum_{i,j,k}\left(\sqrt{y_{ijk}} - \sqrt{E(y_{ijk} \mid \pi_{ijk})}\right)^2
- Deviance statistics
- T_{\textrm{deviance}} = -2 \sum_{j,k} \log \textrm{Multinomial}(\boldsymbol{y}_{jk} \mid \boldsymbol{\pi}_{jk})
- Chi-squared statistics
- T_{\chi^2} = \sum_{i,j,k}\frac{\left(y_{ijk} - E(y_{ijk} \mid \pi_{ijk})\right)^2}{E(y_{ijk} \mid \pi_{ijk})}
where i, j, and k are the subscripts of species, site, and
replicate, respectively,
y_{ijk} is sequence read count data,
\pi_{ijk} is multinomial cell probabilities of sequence read
counts,
E(y_{ijk} \mid \pi_{ijk}) is expected
value of the sequence read counts conditional on their cell probabilities,
and \log \textrm{Multinomial}(\boldsymbol{y}_{jk} \mid \boldsymbol{\pi}_{jk})
is the multinomial log-likelihood of the sequence read counts in replicate
k of site j conditional on their cell probabilities.
The Bayesian p-value is estimated as the probability that the value of
the discrepancy statistics of the replicated dataset is more extreme than that
of the observed dataset.
An extreme Bayesian p-value may indicate inadequate model fit.
See Gelman et al. (2014), Kéry and Royle (2016), and
Conn et al. (2018) for further details on the procedures used for posterior
predictive checking.
Computations can be run in parallel on multiple CPU cores where the cores
argument controls the degree of parallelization.
Value
A list with the following named elements:
- stats
- The discrepancy statistics applied. 
- p_value
- Bayesian - p-value.
- stats_obs
- Discrepancy statistics for observed data. 
- stats_rep
- Discrepancy statistics for repeated data. 
References
P. B. Conn, D. S. Johnson, P. J. Williams, S. R. Melin and M. B. Hooten. (2018) A guide to Bayesian model checking for ecologists. Ecological Monographs 88:526–542. doi:10.1002/ecm.1314
A. Gelman, J. B. Carlin, H. S. Stern D. B. Dunson, A. Vehtari and D. B. Rubin (2013) Bayesian Data Analysis. 3rd edition. Chapman and Hall/CRC. https://www.stat.columbia.edu/~gelman/book/
M. Kéry and J. A. Royle (2016) Applied Hierarchical Modeling in Ecology — Analysis of Distribution, Abundance and Species Richness in R and BUGS. Volume 1: Prelude and Static Models. Academic Press. https://www.mbr-pwrc.usgs.gov/pubanalysis/keryroylebook/
Examples
# Generate the smallest random dataset (2 species * 2 sites * 2 reps)
I <- 2 # Number of species
J <- 2 # Number of sites
K <- 2 # Number of replicates
data <- occumbData(
    y = array(sample.int(I * J * K), dim = c(I, J, K)),
    spec_cov = list(cov1 = rnorm(I)),
    site_cov = list(cov2 = rnorm(J),
                    cov3 = factor(1:J)),
    repl_cov = list(cov4 = matrix(rnorm(J * K), J, K)))
# Fitting a null model
fit <- occumb(data = data)
# Goodness-of-fit assessment
gof_result <- gof(fit)
gof_result
Conditions for local assessment under certain budget and cost values.
Description
list_cond_L() constructs a list of possible local species
diversity assessment conditions under the specified budget and cost values.
Usage
list_cond_L(budget, lambda1, lambda2, fit, K = NULL)
Arguments
| budget | A numeric specifying budget amount. The currency unit
is arbitrary but must be consistent with that of  | 
| lambda1 | A numeric specifying the cost per sequence read for
high-throughput sequencing. The currency unit is arbitrary but must be
consistent with that of  | 
| lambda2 | A numeric specifying the cost per replicate for library
preparation. The currency unit is arbitrary but must be consistent with that
of  | 
| fit | An  | 
| K | An optional vector for manually specifying the number of replicates. | 
Details
This function can generate a data frame object to be given to the
settings argument of eval_util_L(); see Examples of eval_util_L().
By default, it outputs a list of all feasible combinations of values for the
number of replicates per site K and the sequencing depth per replicate
N based on the given budget, cost values, and number of sites
(identified by reference to the fit object). The resulting N can be a
non-integer because it is calculated simply by assuming that the maximum
value can be obtained. To obtain a list for only a subset of the
possible K values under a given budget and cost value, the K
argument is used to provide a vector of the desired K values.
Value
A data frame containing columns named budget, lambda1, lambda2,
K, and N.
Conditions for regional assessment under certain budget and cost values.
Description
list_cond_R() constructs a list of possible regional species
diversity assessment conditions under the specified budget and cost values.
Usage
list_cond_R(budget, lambda1, lambda2, lambda3, J = NULL, K = NULL)
Arguments
| budget | A numeric specifying budget amount. The currency unit is
arbitrary but must be consistent with that of  | 
| lambda1 | A numeric specifying the cost per sequence read for
high-throughput sequencing. The currency unit is arbitrary but must be
consistent with that of  | 
| lambda2 | A numeric specifying the cost per replicate for library
preparation. The currency unit is arbitrary but must be consistent with that
of  | 
| lambda3 | A numeric specifying the visiting cost per site. The currency
unit is arbitrary but must be consistent with that of  | 
| J | An optional vector for manually specifying the number of sites | 
| K | An optional vector used to specify the number of replicates manually.
For computational convenience, the  | 
Details
This function can generate a data frame object to be given to the
settings argument of eval_util_R(); see Examples of eval_util_R().
By default, it outputs a list of all feasible combinations of values for the
number of sites J, number of replicates per site K, and sequencing
depth per replicate N based on the given budget and cost values.
The resulting N can be a non-integer because it is calculated simply by
assuming that the maximum value can be obtained.
If one wants to obtain a list for only a subset of the possible values of J
and K under a given budget and cost value, use the J and/or K
arguments (in fact, it is recommended that a relatively small number of K
values be specified using the K argument because the list of all
conditions achievable under moderate budget and cost values can be large, and
it is rarely practical to have a vast number of replicates per site). If a
given combination of J and K values is not feasible under the specified
budget and cost values, the combination will be ignored and excluded from
the output.
Value
A data frame containing columns named budget, lambda1, lambda2,
lambda3, J, K, and N.
Model-fitting function.
Description
occumb() fits the community site-occupancy model for eDNA
metabarcoding (Fukaya et al. 2022) and returns a model-fit object containing
posterior samples.
Usage
occumb(
  formula_phi = ~1,
  formula_theta = ~1,
  formula_psi = ~1,
  formula_phi_shared = ~1,
  formula_theta_shared = ~1,
  formula_psi_shared = ~1,
  prior_prec = 1e-04,
  prior_ulim = 10000,
  data,
  n.chains = 4,
  n.adapt = NULL,
  n.burnin = 10000,
  n.thin = 10,
  n.iter = 20000,
  parallel = FALSE,
  ...
)
Arguments
| formula_phi | A right-hand side formula describing species-specific
effects of sequence relative dominance ( | 
| formula_theta | A right-hand side formula describing species-specific
effects of sequence capture probability ( | 
| formula_psi | A right-hand side formula describing species-specific
effects of occupancy probability ( | 
| formula_phi_shared | A right-hand side formula describing effects of
sequence relative dominance ( | 
| formula_theta_shared | A right-hand side formula describing effects of
sequence capture probability ( | 
| formula_psi_shared | A right-hand side formula describing effects of
occupancy probability ( | 
| prior_prec | Precision of normal prior distribution for the community-level average of species-specific parameters and effects common across species. | 
| prior_ulim | Upper limit of uniform prior distribution for the standard deviation of species-specific parameters. | 
| data | A dataset supplied as an  | 
| n.chains | Number of Markov chains to run. | 
| n.adapt | Number of iterations to run in the JAGS adaptive phase. | 
| n.burnin | Number of iterations at the beginning of the chain to discard. | 
| n.thin | Thinning rate. Must be a positive integer. | 
| n.iter | Total number of iterations per chain (including burn-in). | 
| parallel | If TRUE, run MCMC chains in parallel on multiple CPU cores. | 
| ... | Additional arguments passed to  | 
Details
occumb() allows the fitting of a range of community site
occupancy models, including covariates at different levels of the data
generation process.
The most general form of the model can be written as follows (the notation
follows that of the original article; see Fukaya et al. (2022) or
the package vignette).
Sequence read counts:
(y_{1jk}, ..., y_{Ijk}) \sim \textrm{Multinomial}((\pi_{1jk}, ...,  \pi_{Ijk}), N_{jk}),
\pi_{ijk} = \frac{u_{ijk}r_{ijk}}{\sum_m u_{mjk}r_{mjk}},
Relative frequency of species sequences:
r_{ijk} \sim \textrm{Gamma}(\phi_{ijk}, 1),
Capture of species sequences:
u_{ijk} \sim \textrm{Bernoulli}(z_{ij}\theta_{ijk}),
Site occupancy of species:
z_{ij} \sim \textrm{Bernoulli}(\psi_{ij}),
where the variations of \phi, \theta, and
\psi are modeled by specifying model formulas in
formula_phi, formula_theta, formula_psi,
formula_phi_shared, formula_theta_shared, and
formula_psi_shared.
Each parameter may have species-specific effects and effects that are common
across species, where the former is specified by formula_phi,
formula_theta, and formula_psi, whereas
formula_phi_shared, formula_theta_shared, and
formula_psi_shared specify the latter.
As species-specific intercepts are specified by default, the intercept
terms in formula_phi_shared, formula_theta_shared, and
formula_psi_shared are always ignored.
Covariate terms must be found in the names of the list elements stored
in the spec_cov, site_cov, or repl_cov slots in the dataset
object provided with the data argument.
Covariates are modeled using the log link function for \phi
and logit link function for \theta and \psi.
The two arguments, prior_prec and prior_ulim, control the
prior distribution of parameters. For the community-level average of
species-specific effects and effects common across species, a normal
prior distribution with a mean of 0 and precision (i.e., the inverse of the
variance) prior_prec is specified. For the standard deviation of
species-specific effects, a uniform prior distribution with a lower limit of
zero and an upper limit of prior_ulim is specified. For the correlation
coefficient of species-specific effects, a uniform prior distribution in the
range of -1 to 1 is specified by default.
See the package vignette
for details on the model specifications in occumb().
The data argument requires a dataset object to be generated using
ocumbData(); see the document of occumbData().
The model is fit using the jags() function of the
jagsUI package, where
Markov chain Monte Carlo (MCMC) methods are used to
obtain posterior samples of the parameters and latent variables.
Arguments n.chains, n.adapt, n.burnin, n.thin,
n.iter, and parallel are passed on to arguments of the
same name in the jags() function.
See the document of jagsUI's
jags() function for details.
A set of random initial values is used to perform an MCMC run.
Value
An S4 object of the occumbFit class containing the results of
the model fitting and the supplied dataset.
References
K. Fukaya, N. I. Kondo, S. S. Matsuzaki and T. Kadoya (2022) Multispecies site occupancy modelling and study design for spatially replicated environmental DNA metabarcoding. Methods in Ecology and Evolution 13:183–193. doi:10.1111/2041-210X.13732
Examples
# Generate the smallest random dataset (2 species * 2 sites * 2 reps)
I <- 2 # Number of species
J <- 2 # Number of sites
K <- 2 # Number of replicates
data <- occumbData(
    y = array(sample.int(I * J * K), dim = c(I, J, K)),
    spec_cov = list(cov1 = rnorm(I)),
    site_cov = list(cov2 = rnorm(J),
                    cov3 = factor(1:J)),
    repl_cov = list(cov4 = matrix(rnorm(J * K), J, K)))
# Fitting a null model (includes only species-specific intercepts)
res0 <- occumb(data = data)
# Add species-specific effects of site covariates in occupancy probabilities
res1 <- occumb(formula_psi = ~ cov2, data = data)        # Continuous covariate
res2 <- occumb(formula_psi = ~ cov3, data = data)        # Categorical covariate
res3 <- occumb(formula_psi = ~ cov2 * cov3, data = data) # Interaction
# Add species covariate in the three parameters
# Note that species covariates are modeled as common effects
res4 <- occumb(formula_phi_shared = ~ cov1, data = data)   # phi
res5 <- occumb(formula_theta_shared = ~ cov1, data = data) # theta
res6 <- occumb(formula_psi_shared = ~ cov1, data = data)   # psi
# Add replicate covariates
# Note that replicate covariates can only be specified for theta and phi
res7 <- occumb(formula_phi = ~ cov4, data = data)   # phi
res8 <- occumb(formula_theta = ~ cov4, data = data) # theta
# Specify the prior distribution and MCMC settings explicitly
res9 <- occumb(data = data, prior_prec = 1E-2, prior_ulim = 1E2,
               n.chains = 1, n.burnin = 1000, n.thin = 1, n.iter = 2000)
res10 <- occumb(data = data, parallel = TRUE, n.cores = 2) # Run MCMC in parallel
Constructor for occumbData data class.
Description
occumbData() creates a data list compatible with the model fitting
function occumb().
The element (i.e., covariate) names for spec_cov, site_cov, and
repl_cov must all be unique.
If y has a dimnames attribute, it is retained in the resulting
occumbData object, and can be referenced in subsequent analyses.
Usage
occumbData(y, spec_cov = NULL, site_cov = NULL, repl_cov = NULL)
Arguments
| y | A 3-D array or a dataframe of sequence read counts
( | 
| spec_cov | A named list of species covariates.
Each covariate can be a vector of continuous ( | 
| site_cov | A named list of site covariates.
Each covariate can be a vector of continuous ( | 
| repl_cov | A named list of replicate covariates.
Each covariate can be a matrix of continuous ( | 
Value
An S4 object of the occumbData class.
Examples
# Generate the smallest random dataset (2 species * 2 sites * 2 reps)
I <- 2 # Number of species
J <- 2 # Number of sites
K <- 2 # Number of replicates
data <- occumbData(
    y = array(sample.int(I * J * K), dim = c(I, J, K)),
    spec_cov = list(cov1 = rnorm(I)),
    site_cov = list(cov2 = rnorm(J), cov3 = factor(1:J)),
    repl_cov = list(cov4 = matrix(rnorm(J * K), J, K))
)
# A case for named y (with species and site names)
y_named <- array(sample.int(I * J * K), dim = c(I, J, K))
dimnames(y_named) <- list(c("common species", "uncommon species"),
                          c("good site", "bad site"), NULL)
data_named <- occumbData(
    y = y_named,
    spec_cov = list(cov1 = rnorm(I)),
    site_cov = list(cov2 = rnorm(J), cov3 = factor(1:J)),
    repl_cov = list(cov4 = matrix(rnorm(J * K), J, K))
)
# A real data example
data(fish_raw)
fish <- occumbData(
    y = fish_raw$y,
    spec_cov = list(mismatch = fish_raw$mismatch),
    site_cov = list(riverbank = fish_raw$riverbank)
)
# Get an overview of the datasets
summary(data)
summary(data_named)
summary(fish)
Plot method for occumbFit class.
Description
Applies jagsUI's
plot method to an occumbFit object to draw trace plots
and density plots of MCMC samples of model parameters.
Usage
## S4 method for signature 'occumbFit'
plot(x, y = NULL, ...)
Arguments
| x | An  | 
| y | 
 | 
| ... | Additional arguments passed to the plot method for jagsUI object. | 
Value
Returns NULL invisibly.
Plot method for occumbGof class.
Description
Draws a scatter plot of fit statistics.
Usage
## S4 method for signature 'occumbGof'
plot(x, y = NULL, ...)
Arguments
| x | An  | 
| y | 
 | 
| ... | Additional arguments passed to the default  | 
Value
Returns NULL invisibly.
Predict method for occumbFit class.
Description
Obtain predictions of parameters related to species occupancy
and detection from an occumbFit model object.
Usage
## S4 method for signature 'occumbFit'
predict(
  object,
  newdata = NULL,
  parameter = c("phi", "theta", "psi"),
  scale = c("response", "link"),
  type = c("quantiles", "mean", "samples"),
  output_dataframe = FALSE
)
Arguments
| object | An  | 
| newdata | An optional  | 
| parameter | The parameter to be predicted. | 
| scale | The scale on which the prediction is made.
 | 
| type | The type of prediction.
 | 
| output_dataframe | If  | 
Details
Applying predict() to an occumbFit object generates predictions
for the specified parameter (phi, theta, or psi) based
on the estimated effects and the given covariates. It is important to
recognize that the predictions are specific to the individual species being
modeled since they depend on the estimated species-specific effects (i.e.,
alpha, beta, and gamma; see
the package vignette for details).
When providing newdata, it must thus be assumed that the set of
species contained in newdata is the same as that of the data being
fitted.
Value
Predictions are obtained as a matrix or array that can have dimensions
corresponding to statistics (or samples), species, sites, and replicates.
The dimension and label attributes are added to the output
object to inform these dimensions.
If the sequence read count data y has species, site, or replicate
names appended as the dimnames attribute (see Details in
occumbData()), they will be copied into the label
attribute of the returned object.
When output_dataframe = TRUE, the results are returned in data
frame format where the attributes obtained when
output_dataframe = FALSE are incorporated into the table.
Summary method for occumbData class.
Description
Summarizes dataset stored in an occumbData object.
Usage
## S4 method for signature 'occumbData'
summary(object)
Arguments
| object | An  | 
Value
Returns NULL invisibly.
Summary method for occumbFit class.
Description
Summarizes model fitting result stored in an occumbFit object.
Usage
## S4 method for signature 'occumbFit'
summary(object)
Arguments
| object | An  | 
Value
Returns NULL invisibly.