Type: Package
Title: Biclustering via Latent Block Model Adapted to Overdispersed Count Data
Version: 0.1.2
Description: Implementation of a probabilistic method for biclustering adapted to overdispersed count data. It is a Gamma-Poisson Latent Block Model. It also implements two selection criteria in order to select the number of biclusters.
License: GPL-3
URL: https://github.com/julieaubert/cobiclust
BugReports: https://github.com/julieaubert/cobiclust/issues
Depends: R (≥ 3.5.0)
Imports: assertthat, cluster, stats, testthat
Suggests: spelling
Encoding: UTF-8
Language: en-US
RoxygenNote: 7.2.3
NeedsCompilation: no
Packaged: 2024-02-16 09:49:46 UTC; jaubert
Author: Julie Aubert [aut, cre], INRAE [cph]
Maintainer: Julie Aubert <julie.aubert@inrae.fr>
Repository: CRAN
Date/Publication: 2024-02-16 12:10:02 UTC

Calculate the matrix of interaction terms between groups of species and groups of samples

Description

Calculate the matrix of interaction terms between groups of species and groups of samples.

Usage

alpha_calculation(
  s_ik = s_ik,
  t_jg = t_jg,
  nu_j = nu_j,
  mu_i = mu_i,
  K = K,
  G = G,
  x = x,
  exp_utilde = exp_utilde
)

Arguments

s_ik

s_ik.

t_jg

t_jg.

nu_j

nu_j.

mu_i

mu_i.

K

K.

G

G.

x

a matrix of observations. Columns correspond to biological samples and rows to microorganisms.

exp_utilde

exp_utilde.

Value

a matrix of dimension (K, G) containing the interaction terms.
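
The following is a minimal sketch of how a (K, G) matrix of this kind could be formed, assuming each interaction term is the ratio of observed to expected counts within a block, weighted by the membership matrices. The function name `alpha_sketch` and the formula itself are assumptions made for illustration; they are not taken from the package source.

```r
# Hypothetical illustration; not the cobiclust implementation.
alpha_sketch <- function(s_ik, t_jg, nu_j, mu_i, x, exp_utilde) {
  # Expected count per cell before applying any interaction term (n x m).
  expected <- outer(mu_i, nu_j) * exp_utilde
  # Membership-weighted block sums of observed and expected counts (K x G).
  observed_kg <- t(s_ik) %*% x %*% t_jg
  expected_kg <- t(s_ik) %*% expected %*% t_jg
  observed_kg / expected_kg
}

# Toy data: 4 rows in 2 groups, 4 columns in 2 groups, hard memberships.
s_ik <- diag(2)[c(1, 1, 2, 2), ]
t_jg <- diag(2)[c(1, 1, 2, 2), ]
x <- matrix(c(2, 2, 8, 8,
              2, 2, 8, 8,
              4, 4, 6, 6,
              4, 4, 6, 6), nrow = 4, byrow = TRUE)
alpha <- alpha_sketch(s_ik, t_jg, nu_j = rep(1, 4), mu_i = rep(1, 4),
                      x = x, exp_utilde = matrix(1, 4, 4))
alpha
```

With unit row and column effects and hard memberships, each entry reduces to the mean count of its block.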


Perform a biclustering adapted to overdispersed count data.

Description

Perform a biclustering adapted to overdispersed count data.

Usage

cobiclust(
  x,
  K = 2,
  G = 3,
  nu_j = NULL,
  a = NULL,
  akg = FALSE,
  cvg_lim = 1e-05,
  nbiter = 5000,
  tol = 1e-04
)

Arguments

x

the input matrix of observed data.

K

an integer specifying the number of groups in rows.

G

an integer specifying the number of groups in columns.

nu_j

a numeric vector corresponding to a column (sampling effort) effect.

a

a numeric dispersion parameter (parameter of the gamma distribution).

akg

a logical variable indicating whether to use a common dispersion parameter (akg = FALSE) or a dispersion parameter per cocluster (akg = TRUE).

cvg_lim

a number specifying the threshold used for the convergence criterion.

nbiter

the maximum number of iterations for the global loop of the variational EM algorithm (nbiter = 5000 by default).

tol

the level of relative iteration convergence tolerance (tol = 1e-04 by default).

Value

An object of class cobiclustering.

See Also

cobiclustering for the cobiclustering class.

Examples

npc <- c(50, 40) # nodes per class
KG <- c(2, 3) # classes
nm <- npc * KG # nodes
Z <- diag(KG[1]) %x% matrix(1, npc[1], 1)
W <- diag(KG[2]) %x% matrix(1, npc[2], 1)
L <- 70*matrix(runif(KG[1] * KG[2]), KG[1], KG[2])
M_in_expectation <- Z %*% L %*% t(W)
size <- 50
M <- matrix(
  rnbinom(
    n = length(as.vector(M_in_expectation)),
    mu = as.vector(M_in_expectation), size = size
  ),
  nm[1], nm[2]
)
rownames(M) <- paste('OTU', 1:nrow(M), sep = '_')
colnames(M) <- paste('S', 1:ncol(M), sep = '_')
res <- cobiclust(M, K = 2, G = 3, nu_j = rep(1, 120), a = 1 / size, cvg_lim = 1e-5)
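
The example simulates counts with `rnbinom(mu = , size = )`, whose variance is `mu + mu^2 / size`, so the data are overdispersed relative to a Poisson. As a small check of that fact, the sketch below compares direct negative binomial draws with a Poisson whose mean is multiplied by a Gamma variable of mean 1. The sample size and the values of `mu` and `size` are arbitrary choices for illustration.

```r
set.seed(42)
n <- 1e5
mu <- 20
size <- 5

# Direct negative binomial draws, parameterised as in the example above.
x_nb <- rnbinom(n, mu = mu, size = size)

# Same distribution written as a mixture: a Poisson whose mean is scaled
# by a Gamma variable with mean 1 and variance 1 / size.
u <- rgamma(n, shape = size, rate = size)
x_mix <- rpois(n, lambda = mu * u)

# Both empirical variances should be close to the theoretical value,
# which exceeds the Poisson variance (equal to mu).
theoretical_var <- mu + mu^2 / size
c(nb = var(x_nb), mixture = var(x_mix), theory = theoretical_var)
```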

Creation of the cobiclustering class.

Description

Creation of the cobiclustering class.

Usage

cobiclustering(
  data = matrix(nrow = 3, ncol = 3, NA),
  K = 2,
  G = 2,
  classification = list(length = 2),
  strategy = list(),
  parameters = list(),
  info = list()
)
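
To illustrate how a constructor, a class test and a summary method of this shape fit together in S3, here is a minimal sketch on a toy class. The class name `toy_bicluster` and all function bodies are hypothetical; only the field names are borrowed from the usage above, and the package may build its objects differently.

```r
# Hypothetical toy class; field names mirror the usage above.
toy_bicluster <- function(data, K = 2, G = 2, classification = list(),
                          strategy = list(), parameters = list(),
                          info = list()) {
  structure(
    list(data = data, K = K, G = G, classification = classification,
         strategy = strategy, parameters = parameters, info = info),
    class = "toy_bicluster"
  )
}

# Class test based on the S3 class attribute.
is.toy_bicluster <- function(object) inherits(object, "toy_bicluster")

# S3 summary method, dispatched by summary() on the class attribute.
summary.toy_bicluster <- function(object, ...) {
  cat("Data:", nrow(object$data), "rows x", ncol(object$data), "columns\n")
  cat("Groups:", object$K, "in rows,", object$G, "in columns\n")
  invisible(object)
}

obj <- toy_bicluster(matrix(0, nrow = 3, ncol = 3))
is.toy_bicluster(obj)
summary(obj)
```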

Useful function to estimate the parameter a

Description

Useful function to estimate the parameter a

Usage

foo_a(x, nb, left_bound, right_bound)

Arguments

x

x.

nb

nb.

left_bound

left_bound.

right_bound

right_bound.

Value

a numeric.


Initialisation of the co-clusters by partitioning around medoids method.

Description

Initialisation of the co-clusters by partitioning around medoids method.

Usage

init_pam(x, nu_j = NULL, a = NULL, K = K, G = G, akg = FALSE)

Arguments

x

the input matrix of observed data.

nu_j

a numeric vector corresponding to a column effect, which may be interpreted as a sampling effort. Its length is equal to the number of columns.

a

a numeric.

K

an integer specifying the number of groups in rows.

G

an integer specifying the number of groups in columns.

akg

a logical variable indicating whether to use a common dispersion parameter (akg = FALSE) or a dispersion parameter per cocluster (akg = TRUE).

Value

A list with the following elements:

nu_j

nu_j.

mu_i

mu_i.

t_jg

t_jg.

s_ik

s_ik.

pi_c

pi.

rho_c

rho.

a

a.

exp_utilde

exp_utilde.

exp_logutilde

exp_logutilde.

alpha_c

alpha.
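
As a rough sketch of what an initialisation by partitioning around medoids can look like, the code below clusters rows and columns separately with `cluster::pam()` and turns the hard labels into membership matrices. The `log1p` transform, the toy data and the reading of `pi_c` and `rho_c` as group proportions are assumptions for illustration, not a description of what `init_pam` does internally.

```r
library(cluster)  # provides pam()

set.seed(1)
# Toy count matrix: 20 rows x 12 columns, two row blocks and two column blocks.
block_means <- matrix(c(5, 50, 50, 5), nrow = 2)
lambda <- block_means[rep(1:2, each = 10), rep(1:2, each = 6)]
x <- matrix(rpois(length(lambda), lambda = lambda), nrow = nrow(lambda))

K <- 2
G <- 2
# Cluster rows and columns separately, on a log scale to damp large counts.
row_fit <- pam(log1p(x), k = K)
col_fit <- pam(t(log1p(x)), k = G)

# One-hot membership matrices built from the hard PAM labels.
s_ik <- diag(K)[row_fit$clustering, , drop = FALSE]
t_jg <- diag(G)[col_fit$clustering, , drop = FALSE]

# Group proportions implied by the initial partition (assumed meaning).
pi_c <- colMeans(s_ik)
rho_c <- colMeans(t_jg)
```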


Test whether an object is of class cobiclustering

Description

Test whether an object is of class cobiclustering.

Usage

is.cobiclustering(object)

Arguments

object

an object of class cobiclustering.


Calculate the lower bound

Description

Calculate the lower bound

Usage

lb_calculation(
  x = x,
  qu_param = qu_param,
  s_ik = s_ik,
  pi_c = pi_c,
  t_jg = t_jg,
  rho_c = rho_c,
  mu_i = mu_i,
  nu_j = nu_j,
  alpha_c = alpha_c,
  a = a,
  akg = TRUE
)

Arguments

x

a matrix of observations. Columns correspond to biological samples and rows to microorganisms.

qu_param

qu_param.

s_ik

s_ik.

pi_c

pi_c.

t_jg

t_jg.

rho_c

rho_c.

mu_i

mu_i.

nu_j

nu_j.

alpha_c

a matrix of the interaction terms.

a

a.

akg

a logical variable indicating whether to use a common dispersion parameter (akg = FALSE) or a dispersion parameter per cocluster (akg = TRUE).

Value

a list of 2 elements.

lb

value of the lower bound.

ent

value of the entropy term.
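
For intuition about an entropy term computed from membership matrices such as `s_ik` and `t_jg`, the sketch below evaluates the entropy of row-wise membership probabilities. The helper name `membership_entropy` is hypothetical, and whether `ent` contains only this quantity or further terms is not stated here.

```r
# Hypothetical helper: entropy of a matrix of membership probabilities,
# one row per item and one column per group. Zero entries contribute 0.
membership_entropy <- function(p) {
  positive <- p > 0
  -sum(p[positive] * log(p[positive]))
}

# Hard assignments carry no uncertainty, so their entropy is zero.
hard <- diag(2)[c(1, 2, 1), ]
ent_hard <- membership_entropy(hard)

# Uniform memberships over 2 groups give log(2) per row.
soft <- matrix(0.5, nrow = 3, ncol = 2)
ent_soft <- membership_entropy(soft)

c(hard = ent_hard, soft = ent_soft)
```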


Calculate the BIC penalty

Description

Calculate the BIC penalty

Usage

penalty(x)

Arguments

x

an object of class cobiclustering.

Value

the value of the BIC penalty.


Calculate the approximate conditional moments of the third hidden layer U

Description

Calculate the approximate conditional moments of the third hidden layer U.

Usage

qu_calculation(
  s_ik = s_ik,
  t_jg = t_jg,
  x = x,
  mu_i = mu_i,
  nu_j = nu_j,
  alpha_c = alpha_c,
  a = a
)

Arguments

s_ik

s_ik.

t_jg

t_jg.

x

a matrix of observations. Columns correspond to biological samples and rows to microorganisms.

mu_i

mu_i.

nu_j

a numeric vector corresponding to a column (sampling effort) effect.

alpha_c

alpha_c.

a

a numeric dispersion parameter (parameter of the gamma distribution).

Value

A list of 4 elements.

a_tilde

a_tilde.

b_tilde

b_tilde.

exp_utilde

exp_utilde.

exp_logutilde

exp_logutilde.
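
The four returned elements are consistent with a Gamma distribution for U, whose mean and expected log have closed forms. The sketch below assumes the standard Gamma-Poisson conjugate update, with a Gamma(a, a) prior on U and a Poisson mean equal to `mu_i * nu_j` times the membership-weighted interaction term. The function name `qu_sketch` and this exact parameterisation are assumptions, not the package source.

```r
# Hypothetical illustration of a conjugate Gamma update per cell.
qu_sketch <- function(s_ik, t_jg, x, mu_i, nu_j, alpha_c, a) {
  # Poisson mean per cell when U = 1 (n x m).
  lambda <- outer(mu_i, nu_j) * (s_ik %*% alpha_c %*% t(t_jg))
  a_tilde <- a + x        # Gamma shape after observing x
  b_tilde <- a + lambda   # Gamma rate after observing x
  list(
    a_tilde = a_tilde,
    b_tilde = b_tilde,
    exp_utilde = a_tilde / b_tilde,                   # E[U]
    exp_logutilde = digamma(a_tilde) - log(b_tilde)   # E[log U]
  )
}

# Toy case: 2 rows, 3 columns, hard memberships, counts equal to their mean.
s_ik <- diag(2)[c(1, 2), ]
t_jg <- diag(2)[c(1, 2, 2), ]
alpha_c <- matrix(c(1, 2, 3, 4), nrow = 2)
x <- s_ik %*% alpha_c %*% t(t_jg)
q <- qu_sketch(s_ik, t_jg, x, mu_i = rep(1, 2), nu_j = rep(1, 3),
               alpha_c = alpha_c, a = 2)
q$exp_utilde
```

When every count equals its Poisson mean, the expected value of U is 1 in each cell, and the expected log is below 0 by Jensen's inequality.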


Calculate the approximate conditional moments of the third hidden variable U and its log

Description

Calculate the approximate conditional moments of the third hidden variable U and its log

Usage

qukg_calculation(
  s_ik = s_ik,
  t_jg = t_jg,
  x = x,
  mu_i = mu_i,
  nu_j = nu_j,
  alpha_c = alpha_c,
  a = a
)

Arguments

s_ik

s_ik.

t_jg

t_jg.

x

a matrix of observations. Columns correspond to biological samples and rows to microorganisms.

mu_i

mu_i.

nu_j

nu_j.

alpha_c

alpha_c.

a

a0.

Value

A list of 4 elements.

a_tilde

a_tilde.

b_tilde

b_tilde.

exp_utilde

exp_utilde.

exp_logutilde

exp_logutilde.


Calculate selection criteria.

Description

Calculate selection criteria.

Usage

selection_criteria(x, K = NULL, G = NULL)

Arguments

x

The output of the cobiclust function.

K

The number of groups in rows.

G

The number of groups in columns.

Value

A data frame with 7 columns.

vICL

the vICL selection criterion.

BIC

the BIC selection criterion.

penKG

the value of the BIC penalty.

lb

the value of the lower bound of the log-likelihood.

entZW

the value of the entropy of the latent variables Z and W.

K

the number of groups in rows.

G

the number of groups in columns.
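
One way to use such a data frame for choosing the number of biclusters is to stack the rows obtained for several candidate (K, G) pairs and keep the pair with the best criterion. The sketch below uses made-up numbers and assumes that larger vICL values are preferred; both the values and that convention are assumptions for illustration.

```r
# Made-up criterion values for three hypothetical fits; in practice each row
# would come from selection_criteria() applied to one fitted model.
crit <- data.frame(
  vICL = c(-5200, -5050, -5110),
  BIC  = c(-5150, -5020, -5060),
  K    = c(2, 2, 3),
  G    = c(2, 3, 3)
)

# Keep the (K, G) pair with the largest vICL (assumed: larger is better).
best <- crit[which.max(crit$vICL), c("K", "G")]
best
```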


Summary of an object of class cobiclustering

Description

Summary of an object of class cobiclustering.

Usage

## S3 method for class 'cobiclustering'
summary(object, ...)

Arguments

object

an object of class cobiclustering.

...

ignored