Type: | Package |
Title: | Identification and Analysis of Co-Occurrence Networks |
Version: | 0.3.1 |
Maintainer: | Federico Marotta <federico.marotta@embl.de> |
Description: | Implementation of the NetCutter algorithm described in Müller and Mancuso (2008) <doi:10.1371/journal.pone.0003178>. The package identifies co-occurring terms in a list of containers. For example, it may be used to detect genes that co-occur across genomes. |
URL: | https://doi.org/10.1371/journal.pone.0003178 |
BugReports: | https://github.com/fmarotta/netcutter/issues |
License: | MIT + file LICENSE |
Encoding: | UTF-8 |
Imports: | PoissonBinomial, rlecuyer |
Suggests: | knitr, rmarkdown, qpdf, testthat (≥ 3.0.0) |
Config/testthat/edition: | 3 |
RoxygenNote: | 7.3.2 |
VignetteBuilder: | knitr |
NeedsCompilation: | no |
Packaged: | 2025-05-19 19:51:51 UTC; fmarotta |
Author: | Heiko Müller [aut], Francesco Mancuso [aut], Federico Marotta [cre] |
Repository: | CRAN |
Date/Publication: | 2025-05-21 15:50:06 UTC |
netcutter: Identification and Analysis of Co-Occurrence Networks
Description
Implementation of the NetCutter algorithm described in Müller and Mancuso (2008) doi:10.1371/journal.pone.0003178. The package identifies co-occurring terms in a list of containers. For example, it may be used to detect genes that co-occur across genomes.
Author(s)
Maintainer: Federico Marotta federico.marotta@embl.de
Authors:
Heiko Müller
Francesco Mancuso
See Also
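Useful links:
https://doi.org/10.1371/journal.pone.0003178
Report bugs at https://github.com/fmarotta/netcutter/issues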
Define co-occurrence modules
Description
Helper function to generate the list of co-occurrence terms grouped into modules of a specified size.
Usage
nc_define_modules(occ_matrix, terms_of_interest, module_size, min_occurrences)
Arguments
occ_matrix |
The original occurrence matrix. |
terms_of_interest |
Vector of column names or indices representing the terms that should be included in the analysis. |
module_size |
The number of terms that should be tested for co-occurrence. |
min_occurrences |
Minimum number of occurrences of each term. |
Value
A list of the valid modules.
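The enumeration described above can be sketched in base R. This is a minimal illustration and not the package's implementation: the name define_modules_sketch is hypothetical, terms of interest are accepted by name only, and the rule that a module is kept when it contains at least one term of interest is an assumption borrowed from the Details of nc_eval().

```r
# Hypothetical helper, NOT the implementation of nc_define_modules().
define_modules_sketch <- function(occ_matrix, terms_of_interest = NULL,
                                  module_size = 2, min_occurrences = 0) {
  # Keep only the terms (columns) that occur in enough containers (rows).
  keep <- colnames(occ_matrix)[colSums(occ_matrix) >= min_occurrences]
  if (length(keep) < module_size) return(list())
  # Enumerate every combination of `module_size` terms.
  modules <- combn(keep, module_size, simplify = FALSE)
  # Assumption: a module is valid if it contains at least one term of interest.
  if (!is.null(terms_of_interest)) {
    modules <- Filter(function(mod) any(mod %in% terms_of_interest), modules)
  }
  modules
}

# Same toy occurrence matrix as in the package examples.
m <- matrix(FALSE, 3, 9, dimnames = list(paste0("ID", 1:3), paste0("gene", 1:9)))
m[1, 1:3] <- m[2, c(1:2, 4:5)] <- m[3, c(1, 6:9)] <- TRUE

# All pairs of terms, without any filtering.
all_pairs <- define_modules_sketch(m, module_size = 2)
# Pairs involving gene1 whose terms each occur at least twice.
mods <- define_modules_sketch(m, terms_of_interest = "gene1",
                              module_size = 2, min_occurrences = 2)
```

In the toy matrix only gene1 and gene2 occur at least twice, so the second call keeps a single pair.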
Compute co-occurrence probabilities
Description
The main NetCutter function. It generates p-values for all the co-occurring modules.
Usage
nc_eval(
occ_matrix,
occ_probs,
terms_of_interest = NULL,
module_size = 2,
min_occurrences = 0,
min_support = 0,
mc.cores = 1
)
Arguments
occ_matrix |
The original occurrence matrix. |
occ_probs |
The matrix of occurrence probabilities, as computed by nc_occ_probs(). |
terms_of_interest |
Vector of column names or indices representing the terms that should be included in the analysis. |
module_size |
The number of terms that should be tested for co-occurrence. |
min_occurrences |
Minimum number of occurrences of each term. |
min_support |
Minimum number of occurrences of each module. |
mc.cores |
Number of parallel computations with mclapply() (set to 1 for serial execution) |
Details
If terms_of_interest is NULL, all the terms in occ_matrix are used. If it is not NULL, only modules containing at least one of these terms will be considered. min_occurrences and min_support are still used to further restrict the list of terms that are considered.
Value
A data.frame with one row for each valid module, and the corresponding number of co-occurrences and p-value.
Examples
# Generate an occurrence matrix.
m <- matrix(FALSE, 3, 9, dimnames = list(paste0("ID", 1:3), paste0("gene", 1:9)))
m[1, 1:3] <- m[2, c(1:2, 4:5)] <- m[3, c(1, 6:9)] <- TRUE
# Set the seed using the "L'Ecuyer-CMRG" random number generator.
set.seed(1, "L'Ecuyer-CMRG")
# Compute the occurrence probabilities.
occ_probs <- nc_occ_probs(m, R = 20, S = 50)
# Evaluate the co-occurrences of pairs of terms and their statistical significance.
nc_eval(m, occ_probs, module_size = 2)
# Now evaluate triples; no need to recompute the occurrence probabilities.
nc_eval(m, occ_probs, module_size = 3)
# Now consider only modules involving gene1 or gene2.
nc_eval(m, occ_probs, module_size = 2, terms_of_interest = c("gene1", "gene2"))
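To give an intuition for how a p-value might be obtained from the occurrence probabilities, the following base-R sketch computes an upper-tail Poisson-binomial probability. It rests on assumptions the manual does not state: that the terms of a module occur independently within a container under the null hypothesis, and that the p-value is the probability of observing at least as many co-occurrences as were seen. The package lists PoissonBinomial among its imports, which is consistent with this reading but does not confirm it. The function name and the probability values are hypothetical.

```r
# Upper tail P(K >= k) of a Poisson-binomial distribution, where K counts the
# successes in independent trials with success probabilities `p`.
pois_binom_upper_tail <- function(k, p) {
  # pmf[j + 1] holds P(K = j) for the trials processed so far.
  pmf <- 1
  for (p_i in p) {
    pmf <- c(pmf * (1 - p_i), 0) + c(0, pmf * p_i)
  }
  if (k <= 0) return(1)
  if (k > length(p)) return(0)
  sum(pmf[(k + 1):length(pmf)])
}

# Hypothetical occurrence probabilities: 3 containers (rows) x 2 terms (columns).
probs <- matrix(c(0.9, 0.5, 0.2,
                  0.8, 0.5, 0.1), nrow = 3,
                dimnames = list(paste0("ID", 1:3), c("gene1", "gene2")))

# Assumption: the co-occurrence probability in each container is the product
# of the occurrence probabilities of the terms in the module.
cooc_probs <- apply(probs[, c("gene1", "gene2")], 1, prod)

# Probability of observing at least 2 co-occurrences of the pair.
p_value <- pois_binom_upper_tail(2, cooc_probs)
```

Because the probabilities differ between containers, a plain binomial distribution would not apply here; the loop builds the exact distribution one container at a time.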
Compute the occurrence probabilities
Description
Use the EdgeSwapping method to find the probability of occurrence of each term in each container under the null hypothesis.
Usage
nc_occ_probs(
occ_matrix,
R = 500,
S = sum(occ_matrix) * 10,
mc.cores = getOption("mc.cores", 1L),
n_batches = ceiling(R/30),
verbose = FALSE
)
Arguments
occ_matrix |
The original occurrence matrix. |
R |
The number of randomisations to perform |
S |
The number of successful edge swaps for each randomisation |
mc.cores |
Number of parallel computations with mclapply() (set to 1 for serial execution) |
n_batches |
Split the computation into n_batches batches; fewer batches can be faster but require more RAM. |
verbose |
Print a status message when starting every new batch. |
Value
The occurrence probability matrix.
Examples
# Generate an occurrence matrix.
m <- matrix(FALSE, 3, 9, dimnames = list(paste0("ID", 1:3), paste0("gene", 1:9)))
m[1, 1:3] <- m[2, c(1:2, 4:5)] <- m[3, c(1, 6:9)] <- TRUE
# Set the seed using the `rlecuyer` package
rlecuyer::.lec.SetPackageSeed(1:6)
# Compute the occurrence probabilities.
occ_probs <- nc_occ_probs(m, R = 20, S = 50)
# Using `n_batches=1` can speed up the computations at the cost of more RAM.
occ_probs <- nc_occ_probs(m, R = 20, n_batches = 1, mc.cores = 1)
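The idea of estimating occurrence probabilities by repeated randomisation can be sketched in base R as follows. This is an illustrative assumption about the general approach, not the package's implementation: the function names are hypothetical, every randomisation restarts from the original matrix, base R random numbers are used instead of rlecuyer, and batching and parallelism are omitted.

```r
# Perform S successful checkerboard swaps on a logical matrix (illustrative only).
swap_edges_sketch <- function(occ_matrix, S, max_attempts = 1000 * S) {
  done <- 0
  attempts <- 0
  while (done < S) {
    attempts <- attempts + 1
    # Guard against matrices in which no swap is possible.
    if (attempts > max_attempts) stop("too many failed swap attempts")
    rows <- sample.int(nrow(occ_matrix), 2)
    cols <- sample.int(ncol(occ_matrix), 2)
    sub <- occ_matrix[rows, cols]
    # A swap succeeds only when the 2x2 submatrix is a checkerboard.
    if (sub[1, 1] == sub[2, 2] && sub[1, 2] == sub[2, 1] && sub[1, 1] != sub[1, 2]) {
      occ_matrix[rows, cols] <- !sub
      done <- done + 1
    }
  }
  occ_matrix
}

# Average R randomised copies to estimate the probability of each occurrence.
occ_probs_sketch <- function(occ_matrix, R, S) {
  total <- matrix(0, nrow(occ_matrix), ncol(occ_matrix),
                  dimnames = dimnames(occ_matrix))
  for (r in seq_len(R)) {
    total <- total + swap_edges_sketch(occ_matrix, S)
  }
  total / R
}

m <- matrix(FALSE, 3, 9, dimnames = list(paste0("ID", 1:3), paste0("gene", 1:9)))
m[1, 1:3] <- m[2, c(1:2, 4:5)] <- m[3, c(1, 6:9)] <- TRUE
set.seed(1)
p <- occ_probs_sketch(m, R = 20, S = 50)
```

Since each swap preserves the row and column sums, the averaged matrix has the same margins as the original one; in particular, gene1, which occurs in every container, keeps probability 1 everywhere.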
Compute the occurrence probabilities
Description
This is a simpler implementation used to check that the official implementation (nc_occ_probs()) works well.
Usage
nc_occ_probs_simple(occ_matrix, R, S)
Arguments
occ_matrix |
The original occurrence matrix. |
R |
The number of randomisations to perform |
S |
The number of successful edge swaps for each randomisation |
Randomize the occurrence matrix
Description
Apply an edge-swapping algorithm.
Usage
nc_randomize(occ_matrix, S)
Arguments
occ_matrix |
The original occurrence matrix. |
S |
The number of successful edge swaps to perform. |
Value
A randomized copy of the occurrence matrix.
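What counts as a "successful" edge swap can be illustrated with a single deterministic step. The sketch below is an assumption about the usual checkerboard-swap formulation, not the package's code, and the name try_swap_sketch is hypothetical: given two rows and two columns, the swap is applied only if the 2x2 submatrix has a checkerboard pattern, which leaves all row and column sums unchanged.

```r
# Attempt one edge swap on the given pair of rows and pair of columns.
try_swap_sketch <- function(occ_matrix, rows, cols) {
  sub <- occ_matrix[rows, cols]
  # Checkerboard: diagonal entries agree, off-diagonal entries agree,
  # and the two groups differ from each other.
  is_checkerboard <- sub[1, 1] == sub[2, 2] &&
    sub[1, 2] == sub[2, 1] &&
    sub[1, 1] != sub[1, 2]
  if (is_checkerboard) occ_matrix[rows, cols] <- !sub
  list(matrix = occ_matrix, success = is_checkerboard)
}

m <- matrix(FALSE, 3, 9, dimnames = list(paste0("ID", 1:3), paste0("gene", 1:9)))
m[1, 1:3] <- m[2, c(1:2, 4:5)] <- m[3, c(1, 6:9)] <- TRUE

# ID1 has gene3 but not gene4, ID2 has gene4 but not gene3: the swap succeeds.
ok <- try_swap_sketch(m, rows = c(1, 2), cols = c(3, 4))
# Both ID1 and ID2 have gene1 and gene2: no checkerboard, so nothing changes.
failed <- try_swap_sketch(m, rows = c(1, 2), cols = c(1, 2))
```

After the successful swap, ID1 contains gene4 instead of gene3 and ID2 contains gene3 instead of gene4, so every container and every term keeps its original number of occurrences.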
Randomize the occurrence matrix
Description
Old implementation in pure R, kept for testing purposes and for reproducibility of old results.
Usage
nc_randomize_R(occ_matrix, S)
Arguments
occ_matrix |
The original occurrence matrix. |
S |
The number of successful edge swaps to perform. |
Randomize the occurrence matrix
Description
Faster implementation that samples the row and the column independently.
Usage
nc_randomize_fast(occ_matrix, S)
Arguments
occ_matrix |
The original occurrence matrix. |
S |
The number of successful edge swaps to perform. |
Randomize the occurrence matrix
Description
This is a simpler implementation used to check that the official implementation (nc_randomize()) works well.
Usage
nc_randomize_simple(occ_matrix, S)
Arguments
occ_matrix |
The original occurrence matrix. |
S |
The number of successful edge swaps to perform. |
Sample one item from a vector, even when the vector has length 1
Description
Sample one item from a vector, even when the vector has length 1
Usage
safe_sample(x)
Arguments
x |
Vector of values to sample |
Details
When x has length 1, the sample() function assumes that we want to sample from 1 to x. However, we need to sample from vectors of unknown length, possibly of length 1, and we always want to sample among the values of x. This function ensures that.
Value
One value from x.
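The pitfall described in the Details can be reproduced in base R. The sketch below shows one common way to avoid it, by sampling an index rather than the values themselves; the name safe_sample_sketch is hypothetical, and the package's safe_sample() may be implemented differently.

```r
# Sample one element by drawing an index, so that a length-1 numeric vector
# is never interpreted as the upper bound of 1:x.
# Note: this sketch does not handle empty vectors.
safe_sample_sketch <- function(x) {
  x[sample.int(length(x), 1)]
}

# With base sample(), a single number 10 is treated as "sample from 1:10".
from_base <- sample(10, 1)

# The sketch always returns the only available value.
from_sketch <- safe_sample_sketch(10)

# For longer vectors it behaves like sample(x, 1).
from_pair <- safe_sample_sketch(c(4, 7))
```

Here from_base can be any integer between 1 and 10, whereas from_sketch is always 10.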