[R] Mismatch distribution

Tue Jan 22 03:52:09 CET 2019

Myriam -

This is the right list in principle: all the packages you use are CRAN packages, not Bioconductor packages.

However I am at a loss as to how you wrote your code: both pegas and seqinr have "read.<something>()" functions, but neither has read.dna(); similarly both pegas and seqinr have "dist.<something>()" functions, but neither has dist.gene(). Did you just extrapolate those function names and parameters from other function calls?

In any case: please start from a minimal, reproducible example that comes close to what you are trying to achieve, then post again. Here are the three URLs we usually recommend to get things started. Use a small number of small example files, don't nest your expressions until you are sure they produce what you think they do, and take it step by step.

http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example
http://adv-r.had.co.nz/Reproducibility.html
https://cran.r-project.org/web/packages/reprex/index.html (read the vignette)
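
For illustration, a minimal sketch of such a step-by-step example might look like the following. It assumes the ape package is installed (read.dna() and dist.dna() are ape functions); the two tiny alignments, the directory name, and all object names are invented for the example, and dist.dna(model = "N") is only one possible way to count pairwise differences.

```r
# Assumption: the 'ape' package is installed; it provides read.dna() and dist.dna().
library(ape)

# Step 1: write two tiny, made-up alignments to a temporary directory,
# so the example does not depend on anyone's own files.
ex_dir <- file.path(tempdir(), "mismatch_example")
dir.create(ex_dir, showWarnings = FALSE)
writeLines(c(">s1", "ACGTACGT", ">s2", "ACGTACGA", ">s3", "ACGAACGA"),
           file.path(ex_dir, "aln1.fas"))
writeLines(c(">s1", "TTGACCGT", ">s2", "TTGACCGT", ">s3", "TAGACCGA"),
           file.path(ex_dir, "aln2.fas"))

# Step 2: list the files and look at the result before going further.
files <- list.files(ex_dir, pattern = "\\.fas$", full.names = TRUE)
print(basename(files))

# Step 3: handle ONE file, one un-nested step at a time.
aln <- read.dna(files[1], format = "fasta")  # aligned sequences as a DNAbin object
d   <- dist.dna(aln, model = "N")            # "N": number of differing sites per pair
first_file_diffs <- as.numeric(d)
print(first_file_diffs)

# Step 4: only once step 3 looks right, repeat it for every file,
# keeping one numeric vector per file in a list.
per_file <- lapply(files, function(f) {
  as.numeric(dist.dna(read.dna(f, format = "fasta"), model = "N"))
})

# Step 5: pool all pairwise differences and plot the mismatch distribution.
all_diffs <- unlist(per_file)
hist(all_diffs,
     breaks = seq(-0.5, max(all_diffs) + 0.5, by = 1),  # one bin per integer count
     freq = FALSE,
     main = "Mismatch distribution (toy data)",
     xlab = "Pairwise differences")
```

Checking the printed values for the first file by hand before writing the loop makes it much easier to see at which step things go wrong.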

Cheers,
B

-

> On 2019-01-21, at 21:08, Bert Gunter <bgunter.4567 using gmail.com> wrote:
> 
> "Do not work" does not work (in providing sufficient info). See the Posting
> guide linked below for how to post an intelligible question.
> 
> HOWEVER, I suspect you would do better posting on the Bioconductor list
> where they are much more likely to know what "fasta" files look like and
> might even have software already developed to do what you want. You could
> well be trying to reinvent wheels.
> 
> Cheers,
> Bert
> 
> 
> On Mon, Jan 21, 2019 at 5:35 PM Myriam Croze <myriam.croze07 using gmail.com>
> wrote:
> 
>> Hello!
>> 
>> I need your help. I am trying to calculate the pairwise differences between
>> sequences from several fasta files.
>> For each of my DNA alignments (fasta files), I would like to calculate the
>> pairwise differences and then:
>> - 1. combine the data from all the files into one dataset and one histogram
>> (mismatch distribution)
>> - 2. calculate the mean of each difference across all the files and again
>> make a mismatch distribution plot
>> 
>> Here is the script that I wrote:
>> 
>> library("pegas")
>> library("seqinr")
>> library("ggplot2")
>>
>> Files <- list.files(pattern="fas")
>> nb_files <- length(Files)
>>
>> for (i in 1:nb_files) {
>>     Dist <- as.numeric(dist.gene(read.dna(Files[i], "fasta"),
>>                                  method = "pairwise",
>>                                  pairwise.deletion = FALSE, variance = FALSE))
>>
>>     Data <- merge(Data, Dist, by=c("x"), all=T)
>> }
>>
>> hist(Data, prob=TRUE)
>> lines(density(Data), col="blue", lwd=2)
>>
>> However, the script does not work and I do not know what to change to make
>> it work.
>> Thanks in advance for your help.
>> 
>> Myriam
>> 
>> --
>> Myriam Croze, PhD
>> Post-doctorante
>> Division of EcoScience,
>> Ewha Womans University
>> Seoul, South Korea
>> 
>> Email: myriam.croze07 using gmail.com
>> 
>>        [[alternative HTML version deleted]]
>> 
>> ______________________________________________
>> R-help using r-project.org mailing list -- To UNSUBSCRIBE and more, see
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide
>> http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>> 