Type: | Package |
Title: | Mutational Signature Analysis Tools |
Version: | 1.0.7 |
Description: | Utility functions for mutational signature analysis as described in Alexandrov, L. B. (2020) <doi:10.1038/s41586-020-1943-3>. This package provides two groups of functions. One is for dealing with mutational signature "exposures" (i.e. the counts of mutations in a sample that are due to each mutational signature). The other group of functions is for matching or comparing sets of mutational signatures. 'mSigTools' stands for mutational Signature analysis Tools. |
License: | GPL-3 |
URL: | https://github.com/Rozen-Lab/mSigTools |
BugReports: | https://github.com/Rozen-Lab/mSigTools/issues |
Encoding: | UTF-8 |
Language: | en-US |
RoxygenNote: | 7.2.3 |
Imports: | clue, philentropy, quadprog, sets |
Suggests: | cosmicsig, ICAMS, spelling, testthat (≥ 3.0.0) |
Config/testthat/edition: | 3 |
NeedsCompilation: | no |
Packaged: | 2023-01-13 09:40:32 UTC; e0012078 |
Author: | Steven Rozen |
Maintainer: | Steven Rozen <steverozen@pm.me> |
Repository: | CRAN |
Date/Publication: | 2023-01-13 13:30:05 UTC |
Find best matches (by cosine similarity) of a set of mutational signatures to a set of reference mutational signatures.
Description
Find best matches (by cosine similarity) of a set of mutational signatures to a set of reference mutational signatures.
Usage
TP_FP_FN_avg_sim(extracted.sigs, reference.sigs, similarity.cutoff = 0.9)
Arguments
extracted.sigs |
Mutational signatures discovered by some analysis. A numerical-matrix-like object with columns as signatures. |
reference.sigs |
A numerical-matrix-like object with columns as signatures. This matrix should contain the reference mutational signatures. For example, these might be from a synthetic data set or they could be from reference set of signatures, such as the signatures at the COSMIC mutational signatures web site. See CRAN package cosmicsig. |
similarity.cutoff |
A signature in |
Details
Match signatures in extracted.sigs to signatures in reference.sigs using match_two_sig_sets based on cosine similarity.
Value
A list with the elements
- TP: The number of true positive extracted signatures.
- FP: The number of false positive extracted signatures.
- FN: The number of false negative reference signatures.
- avg.cos.sim: The average cosine similarity of true positives to their matching reference signatures.
- table: A data.frame of extracted signatures that matched a reference signature. Each row contains the extracted signature name, the reference signature name, and the cosine similarity of the match.
- sim.matrix: The numeric distance or similarity matrix between extracted.sigs and reference.sigs as returned from sig_dist_matrix.
- unmatched.ex.sigs: The identifiers of the extracted signatures that did not match a reference signature.
- unmatched.ref.sigs: The identifiers of the reference signatures that did not match an extracted signature.
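As a rough illustration of how these elements relate to one another, the sketch below derives the counts from a hypothetical match table. It assumes that each matched pair counts as one true positive, each unmatched extracted signature as a false positive, and each unmatched reference signature as a false negative. The helper summarize_matches and the example values are invented for illustration and are not part of the package.

```r
# Hypothetical helper: derive TP / FP / FN from a table of matched pairs.
summarize_matches <- function(match.table, ex.names, ref.names) {
  unmatched.ex  <- setdiff(ex.names, match.table$ex)
  unmatched.ref <- setdiff(ref.names, match.table$ref)
  list(
    TP = nrow(match.table),            # matched pairs
    FP = length(unmatched.ex),         # extracted signatures with no partner
    FN = length(unmatched.ref),        # reference signatures with no partner
    avg.cos.sim = if (nrow(match.table) > 0) mean(match.table$sim) else NA_real_,
    unmatched.ex.sigs  = unmatched.ex,
    unmatched.ref.sigs = unmatched.ref
  )
}

# Made-up match table: two extracted signatures found a reference partner.
matched <- data.frame(
  ex  = c("ex1", "ex2"),
  ref = c("ref2", "ref1"),
  sim = c(0.99, 0.98),
  stringsAsFactors = FALSE
)
res <- summarize_matches(matched,
                         ex.names  = c("ex1", "ex2", "ex3"),
                         ref.names = c("ref1", "ref2"))
res$TP
res$FP
res$FN
```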
Examples
ex.sigs <- matrix(c(0.2, 0.8, 0.3, 0.7, 0.6, 0.4), nrow = 2)
colnames(ex.sigs) <- c("ex1", "ex2", "ex3")
ref.sigs <- matrix(c(0.21, 0.79, 0.19, 0.81), nrow = 2)
colnames(ref.sigs) <- c("ref1", "ref2")
TP_FP_FN_avg_sim(
extracted.sigs = ex.sigs,
reference.sigs = ref.sigs,
similarity.cutoff = .9
)
Find "best" reconstruction of a target signature or spectrum from a set of signatures.
Description
Find "best" reconstruction of a target signature or spectrum from a set of signatures.
Usage
find_best_reconstruction_QP(
target.sig,
sig.universe,
max.subset.size = NULL,
method = "cosine",
trim.less.than = 1e-10
)
Arguments
target.sig |
The signature or spectrum to reconstruct; a non-negative numeric vector or 1-column matrix-like object. |
sig.universe |
The universe of signatures from which to reconstruct
|
max.subset.size |
Maximum number of signatures to use to
reconstruct |
method |
As in |
trim.less.than |
After optimizing exposures with
|
Details
This function should be fast if you do not specify max.subset.size, but it will be combinatorially slow if max.subset.size is large and trim.less.than is small or negative. Avoid that combination.
If max.subset.size is NULL, then the function just uses optimize_exposure_QP, excludes exposures < trim.less.than, and then re-runs optimize_exposure_QP. Otherwise, after excluding exposures < trim.less.than, the function runs optimize_exposure_QP on subsets of signatures of size <= max.subset.size, removes exposures < trim.less.than, reruns optimize_exposure_QP, calculates the reconstruction and the similarity between the reconstruction and target.sig, and returns the information for the exposures that have the greatest similarity.
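To give a feel for why a large max.subset.size is slow, the sketch below counts the candidate subsets, assuming the function examines every subset of size 1 up to max.subset.size among the signatures that survive trimming. The helper count_subsets is hypothetical and only illustrates the combinatorics; it does not call the package.

```r
# Hypothetical helper: number of non-empty subsets of size <= max.subset.size.
count_subsets <- function(n.sigs, max.subset.size) {
  sizes <- seq_len(min(max.subset.size, n.sigs))
  sum(choose(n.sigs, sizes))
}

count_subsets(5, 3)    # 5 + 10 + 10 = 25
count_subsets(30, 3)   # 30 + 435 + 4060 = 4525
count_subsets(30, 10)  # tens of millions of subsets

# Enumerating the subsets themselves (column indices) for the small case.
subsets <- unlist(
  lapply(1:3, function(k) combn(5, k, simplify = FALSE)),
  recursive = FALSE
)
length(subsets)
```

Each subset would need at least one quadratic programming fit, so the count above is a lower bound on the number of optimizations under this assumption.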
Value
A list with elements:
- optimized.exposure: A numerical vector of the exposures that give the "best" reconstruction. This vector is empty if there is an error.
- similarity: The similarity between the reconstruction (see below) and target.sig according to the distance or similarity provided by the method argument.
- method: The value specified for the method argument, or an error message if optimized.exposure is empty.
- reconstruction: The reconstruction of target.sig according to optimized.exposure.
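The relationship between an exposure vector, the reconstruction, and the similarity can be sketched in base R. This assumes the reconstruction is the exposure-weighted sum of the signatures and that the similarity is cosine similarity (the default method). The exposure values below are hypothetical, not output of the package.

```r
# Three toy signatures (columns) over three mutation classes (rows).
sig.universe <- matrix(
  c(0.2, 0.7, 0.1,
    0.3, 0.6, 0.1,
    0.1, 0.1, 0.8),
  nrow = 3,
  dimnames = list(NULL, c("u1", "u2", "u3"))
)
target.sig <- c(0.25, 0.65, 0.10)

# Hypothetical exposures, one per signature.
exposure <- c(u1 = 0.5, u2 = 0.5, u3 = 0)

# Reconstruction: weighted sum of the signature columns.
reconstruction <- as.vector(sig.universe %*% exposure)

# Cosine similarity between the reconstruction and the target.
cos.sim <- sum(reconstruction * target.sig) /
  sqrt(sum(reconstruction^2) * sum(target.sig^2))
cos.sim
```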
Examples
set.seed(888)
sig.u <-
do.call(
cbind,
lapply(1:6, function(x) {
col <- runif(n = 96)
col / sum(col)
})
)
rr <- find_best_reconstruction_QP(
target.sig = sig.u[, 1, drop = FALSE],
sig.universe = sig.u[, 2:6]
)
names(rr)
rr$optimized.exposure
rr$similarity
rr <- find_best_reconstruction_QP(
target.sig = sig.u[, 1, drop = FALSE],
sig.universe = sig.u[, 2:6],
max.subset.size = 3
)
rr$optimized.exposure
rr$similarity
Find an optimal matching between two sets of signatures subject to a maximum distance.
Description
Find an optimal matching between two sets of signatures subject to a maximum distance.
Usage
match_two_sig_sets(
x1,
x2,
method = "cosine",
convert.sim.to.dist = function(x) {
return(1 - x)
},
cutoff = 0.9
)
Arguments
x1 |
A numerical-matrix-like object with columns as signatures. |
x2 |
A numerical-matrix-like object with columns as signatures.
Needs to have the same number of rows as |
method |
As for the |
convert.sim.to.dist |
If |
cutoff |
A maximum distance or minimum similarity over which to
pair signatures between |
Details
Match signatures between x1 and x2 using the function solve_LSAP, which uses the "Hungarian" (a.k.a. "Kuhn–Munkres") algorithm (https://en.wikipedia.org/wiki/Hungarian_algorithm), which optimizes the total cost associated with the links between nodes.
This function generates a distance matrix between the two sets of signatures using method and, if necessary, convert.sim.to.dist.
It then sets distances > cutoff to very large values and then applies solve_LSAP to the resulting matrix to compute a matching between x1 and x2 that minimizes the sum of the distances.
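The steps above can be sketched on a small made-up distance matrix. To stay in base R, this sketch replaces solve_LSAP with a brute-force search over all permutations, which finds the same minimum-cost assignment for tiny inputs but does not scale. The distances, the cutoff of 0.1, the constant very.large, and the final step of dropping pairs above the cutoff are all assumptions for illustration.

```r
# Hypothetical distances: rows are signatures in x1, columns are signatures in x2.
orig.matrix <- matrix(
  c(0.02, 0.30, 0.25,
    0.40, 0.50, 0.03,
    0.45, 0.38, 0.41),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("a1", "a2", "a3"), c("b1", "b2", "b3"))
)
dist.cutoff <- 0.1
very.large  <- 1e6

# Penalize pairs that are farther apart than the cutoff.
modified.matrix <- orig.matrix
modified.matrix[modified.matrix > dist.cutoff] <- very.large

# All permutations of a vector (brute-force stand-in for the Hungarian algorithm).
all_perms <- function(v) {
  if (length(v) <= 1) return(list(v))
  out <- list()
  for (i in seq_along(v)) {
    for (p in all_perms(v[-i])) out[[length(out) + 1]] <- c(v[i], p)
  }
  out
}

rows  <- seq_len(nrow(modified.matrix))
perms <- all_perms(seq_len(ncol(modified.matrix)))
total.cost <- vapply(perms,
                     function(p) sum(modified.matrix[cbind(rows, p)]),
                     numeric(1))
best <- perms[[which.min(total.cost)]]  # column assigned to each row

# Keep only the pairs whose original distance is within the cutoff.
matched <- data.frame(
  x1   = rownames(orig.matrix),
  x2   = colnames(orig.matrix)[best],
  dist = orig.matrix[cbind(rows, best)],
  stringsAsFactors = FALSE
)
matched <- matched[matched$dist <= dist.cutoff, ]
matched
```

Here a3 has no partner within the cutoff, so it is assigned a penalized link in the optimization and then dropped from the table.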
Value
A list with the elements
- table: Table of extracted signatures that matched a reference signature. Each row contains the extracted signature name, the reference signature name, and the distance of the match.
- orig.matrix: The matrix of numeric distances between x1 and x2.
- modified.matrix: orig.matrix with distances > cutoff changed to very large values.
Examples
ex.sigs <- matrix(c(0.2, 0.8, 0.3, 0.7, 0.6, 0.4), nrow = 2)
colnames(ex.sigs) <- c("ex1", "ex2", "ex3")
ref.sigs <- matrix(c(0.21, 0.79, 0.19, 0.81), nrow = 2)
colnames(ref.sigs) <- c("ref1", "ref2")
match_two_sig_sets(ex.sigs, ref.sigs, cutoff = .9)
Quadratic programming optimization of signature activities
Description
Quadratic programming optimization of signature activities
Usage
optimize_exposure_QP(spectrum, signatures)
Arguments
spectrum |
Mutational signature or mutational spectrum as a numeric vector or single column data frame or matrix. |
signatures |
Matrix or data frame of signatures from which to
reconstruct |
Details
Code adapted from SignatureEstimation::decomposeQP and uses solve.QP in package quadprog.
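For orientation, here is a minimal sketch of how a signature-exposure problem can be posed for solve.QP. It assumes a least-squares objective with exposures constrained to be non-negative and to sum to 1, and a spectrum that already sums to 1. It is not the package's own code, and details such as rescaling to mutation counts are omitted.

```r
library(quadprog)  # provides solve.QP()

signatures <- matrix(
  c(0.2, 0.7, 0.1,
    0.3, 0.6, 0.1,
    0.1, 0.1, 0.8),
  nrow = 3,
  dimnames = list(NULL, c("u1", "u2", "u3"))
)
spectrum <- c(0.25, 0.65, 0.10)

n <- ncol(signatures)

# solve.QP minimizes (1/2) x' Dmat x - dvec' x, which matches
# (1/2) * ||signatures %*% x - spectrum||^2 up to a constant.
Dmat <- crossprod(signatures)
dvec <- as.vector(crossprod(signatures, spectrum))

# First constraint (equality, meq = 1): sum(x) == 1.
# Remaining constraints (inequalities): x[i] >= 0.
Amat <- cbind(rep(1, n), diag(n))
bvec <- c(1, rep(0, n))

fit <- solve.QP(Dmat, dvec, Amat, bvec, meq = 1)

# Clamp tiny negative values caused by floating-point error.
exposures <- setNames(pmax(fit$solution, 0), colnames(signatures))
exposures
```

In this toy case the spectrum is an equal mixture of u1 and u2, so the fitted exposures should be close to 0.5, 0.5 and 0.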
Value
A vector of exposures with names being the colnames from signatures.
Examples
usigs <- matrix(c(0.2, 0.7, 0.1,
0.3, 0.6, 0.1,
0.1, 0.1, 0.8), nrow = 3)
colnames(usigs) <- c("u1", "u2", "u3")
tsig <- matrix(c(0.25, 0.65, 0.1), nrow = 3)
optimize_exposure_QP(tsig, usigs)
Plot exposures in multiple plots, with each plot showing exposures for a manageable number of samples.
Description
Plot exposures in multiple plots, with each plot showing exposures for a manageable number of samples.
Usage
plot_exposure(
exposure,
samples.per.line = 30,
plot.proportion = FALSE,
xlim = NULL,
ylim = NULL,
legend.x = NULL,
legend.y = NULL,
cex.legend = 0.9,
cex.yaxis = 1,
cex.xaxis = NULL,
plot.sample.names = TRUE,
yaxis.labels = NULL,
...
)
Arguments
exposure |
Exposures as a numerical |
samples.per.line |
Number of samples to show in each plot. |
plot.proportion |
Plot exposure proportions rather than counts. |
xlim , ylim |
Limits for the x and y axis. If |
legend.x , legend.y |
The x and y co-ordinates to be used to position the legend. |
cex.legend |
A numerical value giving the amount by which legend plotting text and symbols should be magnified relative to the default. |
cex.yaxis |
A numerical value giving the amount by which y axis values should be magnified relative to the default. |
cex.xaxis |
A numerical value giving the amount by which x axis values
should be magnified relative to the default. If
|
plot.sample.names |
Whether to plot sample names below the x axis.
Default is TRUE. Ignored if there are no column names on
|
yaxis.labels |
User defined y axis labels to be plotted. If
|
... |
Other arguments passed to |
Value
An invisible list. The first element is a logical value indicating whether the plot is successful. The second element is a numeric vector giving the coordinates of the bar x-axis midpoints drawn, useful for adding to the graph.
Examples
file <- system.file("extdata",
"Liver-HCC.exposure.csv",
package = "mSigTools"
)
exposure <- read_exposure(file)
old.par <- par(mar = c(8, 5, 1, 1))
plot_exposure(exposure[, 1:30],
main = "Liver-HCC exposure", cex.yaxis = 0.8,
plot.proportion = TRUE
)
par(old.par)
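The idea of showing a manageable number of samples per plot, and of plotting proportions instead of counts, can be sketched in base R as follows. The matrix is made up, and the chunking shown is only one plausible way to split samples; it is not taken from the package source.

```r
# Made-up exposure matrix: 3 signatures (rows) by 70 samples (columns).
exposure <- matrix(
  seq_len(3 * 70),
  nrow = 3,
  dimnames = list(paste0("Sig", 1:3), paste0("Sample", 1:70))
)

# Split the samples into consecutive groups of at most samples.per.line.
samples.per.line <- 30
chunk.id <- ceiling(seq_len(ncol(exposure)) / samples.per.line)
chunks <- lapply(
  split(seq_len(ncol(exposure)), chunk.id),
  function(idx) exposure[, idx, drop = FALSE]
)
sapply(chunks, ncol)  # 30, 30, 10 samples per plot

# Proportions: divide each column by its total so every sample sums to 1.
proportions <- sweep(exposure, 2, colSums(exposure), "/")
```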
Plot a matrix of exposures in a single plot.
Description
Plot a matrix of exposures in a single plot.
Usage
plot_exposure_internal(
exposure,
plot.proportion = FALSE,
xlim = NULL,
ylim = NULL,
legend.x = NULL,
legend.y = NULL,
cex.legend = 0.9,
cex.yaxis = 1,
cex.xaxis = NULL,
plot.sample.names = TRUE,
yaxis.labels = NULL,
...
)
Arguments
exposure |
Exposures as a numerical |
plot.proportion |
Plot exposure proportions rather than counts. |
xlim , ylim |
Limits for the x and y axis. If |
legend.x , legend.y |
The x and y co-ordinates to be used to position the legend. |
cex.legend |
A numerical value giving the amount by which legend plotting text and symbols should be magnified relative to the default. |
cex.yaxis |
A numerical value giving the amount by which y axis values should be magnified relative to the default. |
cex.xaxis |
A numerical value giving the amount by which x axis values
should be magnified relative to the default. If
|
plot.sample.names |
Whether to plot sample names below the x axis.
Default is TRUE. Ignored if there are no column names on
|
yaxis.labels |
User defined y axis labels to be plotted. If
|
... |
Other arguments passed to |
Value
An invisible list. The first element is a logical value indicating whether the plot was successful. The second element is a numeric vector giving the coordinates of the bar x-axis midpoints drawn, useful for adding to the graph.
Plot exposures in multiple plots to a single PDF file, with each plot showing exposures for a manageable number of samples.
Description
Plot exposures in multiple plots to a single PDF file, with each plot showing exposures for a manageable number of samples.
Usage
plot_exposure_to_pdf(
exposure,
file,
mfrow = c(2, 1),
mar = c(6, 4, 3, 2),
oma = c(3, 2, 0, 2),
samples.per.line = 30,
plot.proportion = FALSE,
xlim = NULL,
ylim = NULL,
legend.x = NULL,
legend.y = NULL,
cex.legend = 0.9,
cex.yaxis = 1,
cex.xaxis = NULL,
plot.sample.names = TRUE,
yaxis.labels = NULL,
width = 8.2677,
height = 11.6929,
...
)
Arguments
exposure |
Exposures as a numerical |
file |
The name of the PDF file to be produced. |
mfrow |
A vector of the form |
mar |
A numerical vector of the form |
oma |
A vector of the form |
samples.per.line |
Number of samples to show in each plot. |
plot.proportion |
Plot exposure proportions rather than counts. |
xlim , ylim |
Limits for the x and y axis. If |
legend.x , legend.y |
The x and y co-ordinates to be used to position the legend. |
cex.legend |
A numerical value giving the amount by which legend plotting text and symbols should be magnified relative to the default. |
cex.yaxis |
A numerical value giving the amount by which y axis values should be magnified relative to the default. |
cex.xaxis |
A numerical value giving the amount by which x axis values
should be magnified relative to the default. If
|
plot.sample.names |
Whether to plot sample names below the x axis.
Default is TRUE. Ignored if there are no column names on
|
yaxis.labels |
User defined y axis labels to be plotted. If
|
width , height |
The width and height of the graphics region in inches. |
... |
Other arguments passed to |
Value
An invisible list. The first element is a logical value indicating whether the plot is successful. The second element is a numeric vector giving the coordinates of the bar x-axis midpoints drawn, useful for adding to the graph.
Examples
file <- system.file("extdata",
"Liver-HCC.exposure.csv",
package = "mSigTools"
)
exposure <- read_exposure(file)
plot_exposure_to_pdf(exposure,
file = file.path(tempdir(), "Liver-HCC.exposure.pdf"),
cex.yaxis = 0.8, plot.proportion = TRUE
)
Read an exposure matrix from a file.
Description
Read an exposure matrix from a file.
Usage
read_exposure(file, check.names = FALSE)
Arguments
file |
File path to a CSV file containing an exposure matrix, i.e. the numbers of mutations due to each mutational signature. Each row corresponds to a mutational signature and each column corresponds to a tumor or other biological sample. |
check.names |
Passed to |
Value
Numerical matrix of exposures, with the same shape as the contents of file.
Examples
file <- system.file("extdata",
"Liver-HCC.exposure.csv",
package = "mSigTools"
)
exposure <- read_exposure(file)
Compute a matrix of distances / similarities between two sets of signatures.
Description
Compute a matrix of distances / similarities between two sets of signatures.
Usage
sig_dist_matrix(x1, x2, method = "cosine")
Arguments
x1 |
The first set of signatures (a numerical matrix-like object in which each column is a signature). |
x2 |
The second set of signatures, similar data type to |
method |
As for the |
Value
A numeric matrix with dimensions ncol(x1) X ncol(x2).
Each element represents the distance or similarity (depending on method) between a column in x1 and a column in x2.
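For the default method = "cosine", the shape and meaning of the result can be illustrated with a base R sketch. The helper cosine_sim_matrix is hypothetical and written only for this illustration; it is not the package's implementation.

```r
# Hypothetical helper: cosine similarity between every column of x1
# and every column of x2.
cosine_sim_matrix <- function(x1, x2) {
  x1 <- as.matrix(x1)
  x2 <- as.matrix(x2)
  dots   <- crossprod(x1, x2)        # ncol(x1) x ncol(x2) dot products
  norms1 <- sqrt(colSums(x1^2))
  norms2 <- sqrt(colSums(x2^2))
  dots / outer(norms1, norms2)
}

ex.sigs <- matrix(c(0.2, 0.8, 0.3, 0.7, 0.4, 0.6), nrow = 2)
colnames(ex.sigs) <- c("ex1", "ex2", "ex3")
ref.sigs <- matrix(c(0.21, 0.79, 0.19, 0.81), nrow = 2)
colnames(ref.sigs) <- c("ref1", "ref2")

sim <- cosine_sim_matrix(ex.sigs, ref.sigs)
dim(sim)  # one row per column of ex.sigs, one column per column of ref.sigs
sim
```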
Examples
ex.sigs <- matrix(c(0.2, 0.8, 0.3, 0.7, 0.4, 0.6), nrow = 2)
colnames(ex.sigs) <- c("ex1", "ex2", "ex3")
ref.sigs <- matrix(c(0.21, 0.79, 0.19, 0.81), nrow = 2)
colnames(ref.sigs) <- c("ref1", "ref2")
sig_dist_matrix(ex.sigs, ref.sigs)
Sort columns of an exposure matrix based on the number of mutations in each sample (column).
Description
Sort columns of an exposure matrix based on the number of mutations in each sample (column).
Usage
sort_exposure(exposure, decreasing = TRUE)
Arguments
exposure |
Exposures as a numerical matrix (or data.frame) with signatures in rows and samples in columns. Rownames are taken as the signature names and column names are taken as the sample IDs. |
decreasing |
If |
Value
The original exposure with columns sorted.
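A base R sketch of the same idea, assuming that the number of mutations in a sample is the sum of its column; the matrix values are made up:

```r
# Made-up exposures: 2 signatures (rows) by 3 samples (columns).
exposure <- matrix(
  c(10, 5, 100, 50, 1, 2),
  nrow = 2,
  dimnames = list(c("SigA", "SigB"), c("S1", "S2", "S3"))
)

colSums(exposure)  # S1 = 15, S2 = 150, S3 = 3

# Reorder the columns by total mutation count, largest first.
ord <- order(colSums(exposure), decreasing = TRUE)
exposure.sorted <- exposure[, ord, drop = FALSE]
colnames(exposure.sorted)
```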
Examples
file <- system.file("extdata",
"Liver-HCC.exposure.csv",
package = "mSigTools"
)
exposure <- read_exposure(file)
exposure.sorted <- sort_exposure(exposure)
Write an exposure matrix to a file.
Description
Write an exposure matrix to a file.
Usage
write_exposure(exposure, file, row.names = TRUE)
Arguments
exposure |
Exposures as a numerical matrix (or data.frame) with signatures in rows and samples in columns. Rownames are taken as the signature names and column names are taken as the sample IDs. |
file |
File to which to write the exposure matrix (as a CSV file). |
row.names |
Either a logical value indicating whether the row names of
|
Value
No return value, called for side effects.
Examples
file <- system.file("extdata",
"Liver-HCC.exposure.csv",
package = "mSigTools"
)
exposure <- read_exposure(file)
write_exposure(exposure, file = file.path(tempdir(), "Liver-HCC.exposure.csv"))
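The CSV layout (signatures in rows, samples in columns) can be illustrated with a round trip through base R's write.csv and read.csv. These base functions stand in for write_exposure and read_exposure here, and the matrix is made up; whether the package functions behave identically is an assumption.

```r
# Made-up exposures with sample names that are not syntactic R names.
exposure <- matrix(
  c(10, 5, 100, 50),
  nrow = 2,
  dimnames = list(c("SBS1", "SBS5"), c("tumor-1", "tumor-2"))
)

tmp <- file.path(tempdir(), "toy.exposure.csv")
write.csv(exposure, tmp, row.names = TRUE)

# check.names = FALSE keeps "tumor-1" as is; with TRUE, read.csv
# would convert it to "tumor.1".
round.trip <- as.matrix(read.csv(tmp, row.names = 1, check.names = FALSE))
round.trip
```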