Help for package orthGS

Title:

Orthology vs Paralogy Relationships among Glutamine Synthetase from Plants

Version:

0.1.8

Description:

Tools to analyze and infer orthology and paralogy relationships between glutamine synthetase proteins in seed plants.

License:

GPL-2 | GPL-3 [expanded from: GPL (≥ 2)]

Encoding:

UTF-8

RoxygenNote:

7.3.2

Depends:

R (≥ 2.10)

LazyData:

true

Imports:

ape, bio3d, castor, igraph, phangorn, phytools, seqinr, TreeTools

Suggests:

BiocManager, Biostrings, fs, knitr, muscle, rmarkdown, testthat

VignetteBuilder:

knitr

NeedsCompilation:

Packaged:

2025-04-30 06:44:52 UTC; JCA

Author:

Elena Aledo [aut, cre, cph], Juan-Carlos Aledo [aut]

Maintainer:

Elena Aledo <elenaaledoesteban@gmail.com>

Repository:

CRAN

Date/Publication:

2025-04-30 08:50:02 UTC

Adjacency Matrix for Orthology Graph

Description

155 x 155 square matrix (155 GS proteins from 45 seed plant species)

Usage

A_selected

Format

A matrix with 155 rows and 155 columns

Source

It was generated using the function orthGS::mapTrees() and the reconciliation output file 'selected'. For example: orthGS::mapTrees('./inst/extdata/selected'). The reconciliation was carried out using RANGER-DTL with parameters D = 1, T = 10 and L = 1.
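Because the matrix encodes orthology as 1 and paralogy as 0 (see mapTrees), the orthologs of a protein can be read from its row. The following is a minimal sketch on a toy matrix. The labels are hypothetical and are not taken from A_selected, the helper orthologs_of() is not part of the package, and the sketch assumes the matrix carries protein identifiers as row and column names.

```r
# Toy 4 x 4 adjacency matrix; row/column labels are hypothetical
ids <- c("spA_GS1a", "spB_GS1a", "spA_GS2", "spB_GS2")
A <- matrix(c(0, 1, 0, 0,
              1, 0, 0, 0,
              0, 0, 0, 1,
              0, 0, 1, 0),
            nrow = 4, byrow = TRUE, dimnames = list(ids, ids))

# Orthologs of a query protein: the columns holding a 1 in its row
# (illustrative helper, not an orthGS function)
orthologs_of <- function(A, id) colnames(A)[A[id, ] == 1]

orthologs_of(A, "spA_GS2")
```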

Angiosperms Gymnosperms

Description

Angiosperms Gymnosperms

Usage

AngGym

Format

A dataframe with 155 rows (GS proteins) and 10 columns:

n: Reference number
phylo_id: Unique identification label of the protein/gene
species: Species
taxon: Acrogymnospermae or Angiospermae
class: Angiosperms: Amborellopsida, Liliopsida, Magnoliopsida; Gymnosperms: Ginkgoopsida, Cycadopsida, Gnetopsida, Pinopsida
dna: CDS sequence
prot: Protein sequence
short: Unique three letter identification of the species
gsLineage: Either GS2, GS1a or GS1b
plant_group: Primitive angiosperms, Modern angiosperms, Ginkgo-Cycadales, Gnetales, Pinaceae, Conifer II

Source

It has been manually curated by the authors

Angiosperms Gymnosperms Ferns

Description

Angiosperms Gymnosperms Ferns

Usage

agf

Format

A dataframe with 275 rows (GS proteins) and 23 columns:

n: Reference number
phylo_id: Unique identification label of the protein/gene
species: Species
taxon: Acrogymnospermae, Angiospermae, Polypodiopsida
dna: CDS sequence
prot: Protein sequence
short: Unique three letter identification of the species
gs: GS2, GS1a or GS1b_Ang, GS1b_Gym
pI: isoelectric point
factor: Ferns, GS2, GS1a, GS1b_Ang, GS1b_Gym
size: number of residues
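CSpos: predicted cleavage site position of the targeting signal
prediction: predicted type of targeting signal
Lk_SP: likelihood of a signal peptide
Lk_mTP: likelihood of a mitochondrial transit peptide
Lk_cTP: likelihood of a chloroplast transit peptide
Lk_Thylak: likelihood of a thylakoid transit peptide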
secAa: amino acid at position 2
core: core
dabase: db
acc: acc
up_id: uniprot
note: note

Source

It has been manually curated by the authors

Colouring Tree Tips

Description

Make a color vector for colouring tree tips

Usage

coltips(phy)

Arguments

phy

tree as a phylo object

Details

Each tip is given a color according to the nature of the isoform: green (GS2), blue (GS1a), brown (GS1b Gym), salmon (GS1b Ang), purple (other).

Value

a color vector as long as the number of tips

Examples

coltips(ape::read.tree(text = "((Bdi, Sly), (Pp, Ap));"))

Remove Gaps in a MSA

Description

Removes gaps in a given msa.

Usage

gapless_msa(msa, seqtype = 'AA', df = TRUE, sfile = FALSE)

Arguments

msa

input alignment.

seqtype

the nature of the sequences: 'DNA' or 'AA'.

df

logical. When TRUE msa should be a matrix, when FALSE msa should be a string giving the path to a fasta file containing the alignment.

sfile

if different to FALSE, then it should be a string indicating the path to save a fasta alignment file.

Details

It should be noted that this function does not carry out the alignment itself.

Value

an alignment without gaps in the form of a matrix, or a file containing such an alignment in fasta format.
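To illustrate the idea, here is a minimal base R sketch that drops every alignment column containing at least one gap. It assumes that "gapless" means removing whole columns and that gaps are coded as "-". The helper drop_gapped_columns() is hypothetical and is not the package's implementation.

```r
# Toy alignment as a character matrix: one row per sequence, one column per site
aln <- rbind(a = c("A", "P", "G", "W", "-"),
             b = c("A", "-", "G", "W", "C"),
             c = c("C", "W", "G", "A", "-"))

# Keep only the columns in which no sequence has a gap
# (illustrative helper; the gap symbol "-" is an assumption)
drop_gapped_columns <- function(m, gap = "-") {
  keep <- colSums(m == gap) == 0
  m[, keep, drop = FALSE]
}

drop_gapped_columns(aln)
```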

Examples

## Not run: 
gapless_msa(msa(sequences = c("APGW", "AGWC", "CWGA"), ids = c("a", "b", "c"))$ali)
## End(Not run)

Get the GS Sequence

Description

Provides the requested GS sequence

Usage

getseqGS(phylo_id, molecule = "Prot")

Arguments

phylo_id

the unique sequence identifier

molecule

either "Prot" or "CDS"

Details

The identifier should be one of the 'phylo_id' from data(agf).

Value

The requested sequence as a character string.

Examples

getseqGS("Pp_GS1b_2")

Find The Root of a Phylogenetic Tree Using MAD Method

Description

Finds the root of an unrooted phylogenetic tree by minimizing the relative deviation from the molecular clock.

Usage

madRoot(tree, output_mode = 'phylo')

Arguments

tree

unrooted tree string in newick format or a tree object of class 'phylo'.

output_mode

amount of information to return. If 'phylo' (default), only the rooted tree is returned. If 'stats', a structure with the ambiguity index, the clock CV, the minimum ancestor deviation and the number of roots is also returned. If 'full', an unrooted tree object, the index of the root branch, the branch ancestor deviations and a rooted tree object are returned as well.

Details

This function is a slight modification of the code provided by Tria et al. at https://www.mikrobio.uni-kiel.de/de/ag-dagan/ressourcen.

Value

a rooted tree and supplementary information if required.

Author(s)

Tria, F. D. K., Landan, G. and Dagan, T.

References

Tria, F. D. K., Landan, G. and Dagan, T. Nat. Ecol. Evol. 1, 0193 (2017).

Examples

## Not run: 
a <- msa(sequences = c("RAPGT", "KMPGT", "ESGGT"), ids = letters[1:3])$ali
rownames(a) <- letters[1:3]
tr <- mltree(a)$tree
rtr <- madRoot(tr)
## End(Not run)

Map Gene Tree into Species Tree

Description

Maps a gene/protein tree into a species tree

Usage

mapTrees(path2rec)

Arguments

path2rec

path to the file containing the reconciliation output.

Details

Mapping a gene tree onto a species tree allows the sequence of events (duplication, speciation, transfer) to be inferred.

Value

A list with three elements. The first one is a 'phylo' object where the node labels indicate the event: D for duplication or T for transfer. If no label is shown, the event corresponds to speciation. The second element is a dataframe (the first column is the label of the internal nodes in the gene tree; the second column is the label of the internal nodes in the species tree, and the third and fourth columns label each internal node according to the inferred event). The third element of the list is an adjacency matrix: 1 when two proteins are orthologous, 0 if they are paralogous.

Examples

mapTrees(fs::path_package("extdata", "representatives", package = "orthGS"))

Build Up a ML Tree

Description

Given an alignment builds an ML tree.

Usage

mltree(msa, df = TRUE, gapl = TRUE, model = "WAG")

Arguments

msa

input alignment.

df

logical. When TRUE msa should be a dataframe, when FALSE msa should be a string giving the path to a fasta file containing the alignment.

gapl

logical, when TRUE a gapless alignment is used.

model

the amino acid substitution model to use (see the function phangorn::as.pml).

Details

The function builds an NJ tree and then improves it using an optimization procedure based on ML.

Value

an ML-optimized tree (and parameters)

Examples

## Not run: 
a <- msa(sequences = c("RAPGT", "KMPGT", "ESGGT"), ids = letters[1:3])$ali
rownames(a) <- letters[1:3]
tr <- mltree(a)$tree
## End(Not run)

Multiple Sequence Alignment

Description

Aligns multiple protein, DNA or CDS sequences using alignment software installed on your system (Clustal Omega or MUSCLE).

Usage

msa(sequences, ids = names(sequences), seqtype = "prot", method, sfile = FALSE)

Arguments

sequences

vector containing the sequences as strings.

ids

character vector containing the sequences' ids.

seqtype

it should be either "prot", "dna" or "cds" (see details).

method

the software to be used for the alignment, as invoked in your system. For instance, "muscle3" or "clustalo".

sfile

if different to FALSE, then it should be a string indicating the path to save a fasta alignment file.

Details

Either Clustal Omega or MUSCLE must be installed, and its executable must be in your system's PATH. If seqtype is set to "cds", the sequences must not contain stop codons and they will be translated using the standard genetic code. Afterward, the amino acid alignment will be used to guide the codon alignment.
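The protein-guided codon alignment described above can be sketched in base R as follows. This is only an illustration under stated assumptions, not the package's code: the helper thread_codons() is hypothetical, gaps are assumed to be coded as "-", and the CDS is assumed to be in frame with no stop codon. Each aligned residue consumes one codon and each gap becomes three gap characters.

```r
# Thread an ungapped CDS through its aligned protein sequence
# (illustrative helper, not an orthGS function)
thread_codons <- function(aligned_prot, cds) {
  residues <- strsplit(aligned_prot, "")[[1]]
  # Split the CDS into consecutive codons
  codons <- substring(cds,
                      seq(1, nchar(cds), by = 3),
                      seq(3, nchar(cds), by = 3))
  # One codon per non-gap residue is required
  stopifnot(sum(residues != "-") == length(codons))
  out <- character(length(residues))
  out[residues == "-"] <- "---"
  out[residues != "-"] <- codons
  paste(out, collapse = "")
}

# Protein "MKW" aligned as "M-KW"; its CDS is ATG AAA TGG
thread_codons("M-KW", "ATGAAATGG")
```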

Value

Returns a list of four elements. The first one ($seq) provides the sequences analyzed, the second element ($id) returns the identifiers, the third element ($aln) provides the alignment in fasta format and the fourth element ($ali) gives the alignment in matrix format.

Examples

## Not run: 
msa(sequences = c("APGW", "AGWC", "CWGA"),
    ids = c("a", "b", "c"))
## End(Not run)

Infer GS OrthoGroups Within a Set of Species

Description

Infers GS orthogroups using tree reconciliation

Usage

orthG(set = "all")

Arguments

set

set of species of interest provided as a character vector either with the binomial or short code of the species (see data(sdf)).

Details

When set = "all", all the species in the database will be included.

Value

A list with two elements. The first one is the adjacency matrix (1 for orthologous, 0 for paralogous). The second element is an orthogroup graph.
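One way to explore such an adjacency matrix is to group proteins that are linked, directly or indirectly, by orthology edges. The base R sketch below labels the connected components of a toy matrix. Treating connected components as orthogroups is an assumption made here for illustration, the identifiers are hypothetical, and components() is not a function of the package.

```r
# Toy symmetric adjacency matrix with hypothetical protein labels
ids <- c("p1", "p2", "p3", "p4", "p5")
A <- matrix(0, 5, 5, dimnames = list(ids, ids))
A["p1", "p2"] <- A["p2", "p1"] <- 1
A["p2", "p3"] <- A["p3", "p2"] <- 1
A["p4", "p5"] <- A["p5", "p4"] <- 1

# Label connected components by repeated neighbor expansion
# (illustrative helper, not an orthGS function)
components <- function(A) {
  n <- nrow(A)
  comp <- integer(n)            # 0 means "not yet assigned"
  current <- 0L
  for (start in seq_len(n)) {
    if (comp[start] != 0L) next
    current <- current + 1L
    frontier <- start
    while (length(frontier) > 0) {
      comp[frontier] <- current
      # Nodes adjacent to any node in the frontier
      reached <- which(colSums(A[frontier, , drop = FALSE]) > 0)
      # Keep only those not yet assigned to a component
      frontier <- reached[comp[reached] == 0L]
    }
  }
  split(rownames(A), comp)
}

components(A)
```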

Examples

orthG(set = c("Pp", "Psy", "Psm", "Ap"))

Search Orthologs of a Given Protein

Description

Searches for orthologs of a given protein within a set of selected species

Usage

orthP(phylo_id, set = "all")

Arguments

phylo_id

phylo_id of the query protein

set

set of species of interest provided as a character vector, either with the binomial or short code of the species (see details).

Details

When set = "all", the search will be carried out against all the species in the database.

Value

A list with three elements: 1. subtree of the relevant proteins; 2. color vector; 3. phylo_ids of the orthologs found.

Examples

orthP(phylo_id = "Pp_GS1a", set = c("Pp", "Psy", "Psm", "Ap"))

Infer OrthoGroups Using Tree Reconciliation

Description

Infer orthogroups using species and gene trees reconciliation

Usage

orthology(trees, invoke, d = 2, t = 10, l = 1, plot = TRUE, saverec = FALSE)

Arguments

trees

path to a single file containing first the species tree, followed by a single gene/protein tree (see details).

invoke

character string representing the way in which the executable of RANGER-DTL (see details) is invoked.

d

cost assigned to gene duplication.

t

cost assigned to gene transfer.

l

cost assigned to gene loss.

plot

when TRUE, the orthology network graph is plotted.

saverec

path to the directory where to save the reconciliation file. If not provided the file is not saved (default)

Details

The executable of RANGER-DTL (https://compbio.engr.uconn.edu/software/RANGER-DTL) should be installed. All input trees must be expressed using the Newick format terminated by a semicolon, and they must be fully binary (fully resolved) and rooted. Species names in the species tree must be unique. E.g., (((speciesA_gene1, speciesC_gene1), speciesB_geneX), speciesC_gene2); and (((speciesA, speciesC), speciesB), speciesC); are both valid gene tree inputs and, in fact, represent the same gene tree. This gene tree contains one copy of the gene from each of speciesA and speciesB, and two copies from speciesC.
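As a minimal sketch, an input file of this kind could be prepared in base R as shown below. The species tree is a hypothetical example written for this illustration, the gene tree is the one quoted above, and placing one tree per line is an assumption about the layout rather than a documented requirement of the function.

```r
# Hypothetical rooted, fully binary trees in Newick format
species_tree <- "((speciesA,speciesC),speciesB);"
gene_tree <- "(((speciesA_gene1,speciesC_gene1),speciesB_geneX),speciesC_gene2);"

# Species tree first, gene tree second (one tree per line is an assumption)
trees_file <- tempfile(fileext = ".trees")
writeLines(c(species_tree, gene_tree), trees_file)

readLines(trees_file)
```

The resulting path could then be passed as the trees argument, together with the invoke string appropriate for the local RANGER-DTL installation.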

Value

A list with four elements. The first one is a 'phylo' object where the node labels indicate the event: D for duplication or T for transfer. If no label is shown, the event corresponds to speciation. The second element is a dataframe (the first column is the label of the internal nodes in the gene tree; the second column is the label of the internal nodes in the species tree, and the third and fourth columns label each internal node according to the inferred event). The third element of the list is an adjacency matrix: 1 when two proteins are orthologous, 0 if they are paralogous. The last element of the list is an orthogroup graph.

Examples

orthology(trees = system.file("extdata", "input.trees", package = "orthGS"))

Seed Plants and Ferns GS

Description

155 GS proteins from 25 seed plant species and 41 GS proteins from 11 fern species

Usage

sdf

Format

A dataframe with 196 rows (GS proteins) and 7 columns:

n: Reference number
Sec.Name_: Unique identification label of the protein
species: Species
taxon: Acrogymnospermae, Angiospermae or Polypodiopsida
short: Unique three letter identification of the species
gs: Either GS2, GS1a, GS1b_Gym or GS1b_Ang. Here the fern proteins have been forced to be either GS1a or GS2
tax_group: Taxonomic group

Source

It has been curated manually by the authors

Ultrametric Rooted Seed Plants Tree

Description

155 GS proteins from 45 seed plant species. Rooted using MAD (Minimal Ancestor Deviation).

Usage

selected_tr

Format

A phylo object

Source

It has been manually curated by the authors

Map Species Names

Description

Map binomial species name to short code species name and vice versa

Usage

speciesGS(sp)

Arguments

sp

set of species of interest (either binomial or short code name)

Details

The species set should be given as a character vector (see example)

Value

A dataframe containing the information for the requested species.

Examples

speciesGS(c("Pinus pinaster", "Ath"))

GS Proteins Report

Description

Assembles a report regarding the GS proteins found in the indicated subset of species

Usage

subsetGS(sp)

Arguments

sp

set of species of interest (either binomial or short code name)

Details

This function returns the protein and DNA sequences of the different isoforms found in each species, along with other relevant data.

Value

A dataframe with the information for the requested species.

Examples

subsetGS(c("Pinus pinaster", "Ath"))
