[R] Mismatch distribution

Tue Jan 22 04:27:38 CET 2019

Thanks for your answer.

First, concerning the function read.dna and dist.gene, they come from the
package ape which is downloaded with pegas.

Here the code that I did for one sequence and which works:

##Code
Seqs1 <- "file1.fas"
Seqs2 <- read.dna(Seqs1, "fasta")

Dist <- dist.gene(Seqs2, method = "pairwise", pairwise.deletion = FALSE,
variance = FALSE)
Dist2 <- as.numeric(Dist)

hist(Dist2, prob=TRUE)
##

And then the code for several files:

#######
Files <- list.files(pattern="fas")
nb_files <- length(Files)
Data1 <- as.numeric()

for (i in 1:nb_files) {
  Seqs <- read.dna(Files[i], "fasta")

  Dist <- dist.gene(Seqs, method = "pairwise", pairwise.deletion = FALSE,
variance = FALSE)
  Dist <-  as.numeric(Dist)

  Data1 <- merge(Data1, Dist)
    }

hist(Data1, prob=TRUE)
########

In the last code, the file Data1 (where I want all the data from the 3
files) is empty at the end. I guess something is missing in this last step
or maybe should I use another function.

Cheers,
Myriam

Le mar. 22 janv. 2019 à 11:52, Boris Steipe <boris.steipe using utoronto.ca> a
écrit :

> Myriam -
>
> This is the right list in principle, all the packages you use are CRAN
> packages, not Bioconductor.
>
> However I am at a loss as to how you wrote your code: both pegas and
> seqinr have "read.<something>()" functions, but neither has read.dna();
> similarly both pegas and seqinr have "dist.<something>()" functions, but
> neither has dist.gene(). Did you just extrapolate those function names and
> parameters from other function calls?
>
> In any case: please start from a minimal, reproducible example that comes
> close to what you are trying to achieve, then post again. Here are the
> three URLs we usually recommend to get things started. Use a small number
> of small example files, don't nest your expressions until you are sure they
> produce what you think they do, and take it step by step.
>
>
> http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example
> http://adv-r.had.co.nz/Reproducibility.html
> https://cran.r-project.org/web/packages/reprex/index.html (read the
> vignette)
>
> Cheers,
> B
>
> -
>
>
>
> > On 2019-01-21, at 21:08, Bert Gunter <bgunter.4567 using gmail.com> wrote:
> >
> > "Do not work" does not work (in providing sufficient info). See the
> Posting
> > guide  linked below for how to post an intelligible question.
> >
> > HOWEVER, I suspect you would do better posting on te Bioconductor list
> > where they are much more likely to know what "fasta" files look like and
> > might even have software already developed to do what you want. You could
> > well be trying to reinvent wheels.
> >
> > Cheers,
> > Bert
> >
> >
> > On Mon, Jan 21, 2019 at 5:35 PM Myriam Croze <myriam.croze07 using gmail.com>
> > wrote:
> >
> >> Hello!
> >>
> >> I need your help. I am trying to calculate the pairwise differences
> between
> >> sequences from several fasta files.
> >> I would like for each of my DNA alignments (fasta files), calculate the
> >> pairwise differences and then:
> >> - 1. Combine all the data of each file to have one file and one
> histogram
> >> (mismatch distribution)
> >> - 2. calculate the mean for each difference for all the file and again
> make
> >> a mismatch distribution plot
> >>
> >> Here the script that I wrote:
> >>
> >> library("pegas")
> >>> library("seqinr")
> >>> library("ggplot2")
> >>>
> >>>
> >>
> >>> Files <- list.files(pattern="fas")
> >>> nb_files <- length(Files)
> >>>
> >>>
> >>> for (i in 1:nb_files) {
> >>>        Dist <-  as.numeric(dist.gene(read.dna(Files[i], "fasta"),
> method
> >>> = "pairwise",
> >>>                           pairwise.deletion = FALSE, variance = FALSE))
> >>>
> >>>        Data <- merge(Data, Dist, by=c("x"), all=T)
> >>>    }
> >>>
> >>
> >>
> >>> hist(Data, prob=TRUE)
> >>> lines(density(Data), col="blue", lwd=2)
> >>>
> >>
> >> However, the script does not work and I do not know what to change to
> make
> >> it working.
> >> Thanks in advance for your help.
> >>
> >> Myriam
> >>
> >> --
> >> Myriam Croze, PhD
> >> Post-doctorante
> >> Division of EcoScience,
> >> Ewha Womans University
> >> Seoul, South Korea
> >>
> >> Email: myriam.croze07 using gmail.com
> >>
> >>        [[alternative HTML version deleted]]
> >>
> >> ______________________________________________
> >> R-help using r-project.org mailing list -- To UNSUBSCRIBE and more, see
> >> https://stat.ethz.ch/mailman/listinfo/r-help
> >> PLEASE do read the posting guide
> >> http://www.R-project.org/posting-guide.html
> >> and provide commented, minimal, self-contained, reproducible code.
> >>
> >
> >       [[alternative HTML version deleted]]
> >
> > ______________________________________________
> > R-help using r-project.org mailing list -- To UNSUBSCRIBE and more, see
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> > and provide commented, minimal, self-contained, reproducible code.
>
>

-- 
Myriam Croze, PhD
Post-doctorante
Division of EcoScience,
Ewha Womans University
Seoul, South Korea

Email: myriam.croze07 using gmail.com

	[[alternative HTML version deleted]]