[BioC] Random sequence generator (Off Topic)

Oliver Bembom bembom at gmail.com
Fri Nov 17 06:21:09 CET 2006


Hi Joo Sang,

The cosmo package contains a function, rseq(), that allows you to
generate random DNA sequences according to a kth order Markov model.
The 0th order Markov model would simply generate each nucleotide
according to frequencies you supply, regardless of what nucleotides
were observed in previous positions. A kth order Markov model allows
the probability of observing a given nucleotide to depend on the
previous k positions.
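
To make the difference between the two models concrete, here is a minimal base-R sketch that does not use cosmo at all. The helper names rseq0() and rseq1(), the frequencies, and the transition matrix are made up for illustration; rseq() in cosmo has its own interface, which is documented in its help page.

```r
# Hypothetical helpers for illustration only; not part of cosmo.
nucs <- c("A", "C", "G", "T")

# 0th order: each position is drawn independently from fixed frequencies.
rseq0 <- function(n, freq) {
  paste(sample(nucs, n, replace = TRUE, prob = freq), collapse = "")
}

# 1st order: the distribution at each position depends on the previous base.
# 'trans' is a 4x4 matrix with rows indexed by the previous base; rows sum to 1.
rseq1 <- function(n, init, trans) {
  s <- character(n)
  s[1] <- sample(nucs, 1, prob = init)
  for (i in seq_len(n - 1) + 1) {
    s[i] <- sample(nucs, 1, prob = trans[s[i - 1], ])
  }
  paste(s, collapse = "")
}

# Made-up parameters, in the order A, C, G, T.
freq  <- c(A = 0.3, C = 0.2, G = 0.2, T = 0.3)
trans <- matrix(c(0.40, 0.10, 0.10, 0.40,
                  0.25, 0.25, 0.25, 0.25,
                  0.10, 0.40, 0.40, 0.10,
                  0.30, 0.20, 0.20, 0.30),
                nrow = 4, byrow = TRUE, dimnames = list(nucs, nucs))

x0 <- rseq0(50, freq)
x1 <- rseq1(50, freq, trans)
```

A kth order model works the same way, except that the row of the transition table is selected by the previous k bases instead of just one.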

The function is designed to insert occurrences of a motif into these
sequences, either according to the zero-or-one-occurrence-per-sequence
(ZOOPS) model or the two-component-mixture (TCM) model that allows any
number of motif occurrences per sequence. These motifs are specified
through their position weight matrix. If you wanted to simulate
sequences without any motif occurrences, you could set the rate
parameter of the ZOOPS model to 0, for example.
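
The ZOOPS idea can be sketched in a few lines of base R. This is only an illustration under simplifying assumptions (0th order background, motif placed at a uniformly chosen start, overwriting the background bases there); rmotif() and rzoops() are hypothetical names and do not reflect how rseq() is implemented or called.

```r
# Hypothetical helpers for illustration only; not part of cosmo.
nucs <- c("A", "C", "G", "T")

# Draw one motif instance from a position weight matrix (rows = A, C, G, T;
# each column holds the nucleotide probabilities at one motif position).
rmotif <- function(pwm) {
  apply(pwm, 2, function(p) sample(nucs, 1, prob = p))
}

# With probability 'rate' the sequence carries exactly one motif occurrence;
# otherwise it is pure background.
rzoops <- function(n, freq, pwm, rate) {
  s <- sample(nucs, n, replace = TRUE, prob = freq)
  if (runif(1) < rate) {
    w <- ncol(pwm)
    start <- sample.int(n - w + 1, 1)
    s[start:(start + w - 1)] <- rmotif(pwm)
  }
  paste(s, collapse = "")
}

# A deterministic PWM spelling "TATA", used here so the result is easy to check.
pwm  <- sapply(c("T", "A", "T", "A"), function(b) as.numeric(nucs == b))
freq <- c(0.25, 0.25, 0.25, 0.25)

with_motif <- rzoops(30, freq, pwm, rate = 1)  # always contains the motif
background <- rzoops(30, freq, pwm, rate = 0)  # never has a motif inserted
```

With rate = 0 the insertion branch is never taken, which corresponds to simulating sequences without any motif occurrences.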

The main function of the package, cosmo(), searches DNA sequences for
a shared motif. Its output contains, among other things, an estimate of
the transition matrix for the background model, so you could use
cosmo() to estimate the transition matrix that you then feed to
rseq().
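
If you only need the base composition or a first-order transition matrix from your own sequence, simple counting in base R is enough. The sketch below is a plain frequency estimate, not cosmo's estimator; est_freq(), est_trans(), and the toy sequence are hypothetical.

```r
# Hypothetical helpers for illustration only; not part of cosmo.
nucs <- c("A", "C", "G", "T")

# Split a sequence string into a factor over the four nucleotides.
as_bases <- function(seq_string) {
  factor(strsplit(toupper(seq_string), "")[[1]], levels = nucs)
}

# Base composition (parameters of a 0th order model).
est_freq <- function(seq_string) {
  s <- as_bases(seq_string)
  table(s) / length(s)
}

# 1st order transition matrix from counts of adjacent base pairs.
# Rows index the previous base; a base that never occurs as a
# predecessor gives a row of NaN.
est_trans <- function(seq_string) {
  s <- as_bases(seq_string)
  counts <- table(prev = s[-length(s)], nxt = s[-1])
  prop.table(counts, margin = 1)
}

obs <- "ACGTACGTA"   # toy sequence
f <- est_freq(obs)
P <- est_trans(obs)
```

The estimated frequencies or transition matrix can then serve as the background parameters when generating random sequences with the same composition as the original.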

The package was just submitted to Bioconductor and is included in
Bioconductor 2.0. The version posted there is intended for R 2.5. If
you're a Linux/Unix user, you can still install it with

>install.packages("cosmo", repos="http://bioconductor.org/packages/2.0/bioc/")

You might have to install the tkWidgets package first if you don't have it:

>source("http://bioconductor.org/biocLite.R")
>biocLite("tkWidgets")

If you need a Windows binary, you can go to

http://cosmoweb.berkley.edu/software.html

The version posted there (1.0.1) is also slightly newer than the
Bioconductor version (1.0.0). I will update the Bioconductor package
soon.

Oliver



On 11/16/06, Joo Sang Lee <joosang at northwestern.edu> wrote:
> Good evening.
>
> This question is maybe off-topic. I am sorry for this spam for most of you.
>
> I am studying the periodicity of DNA sequences. The approach I take is to
> investigate the frequency of a specific motif and the distance between two
> successive occurrences of the motif. To obtain the frequency relative to a
> random sequence, I need to generate a random sequence with the same
> base composition as the original sequence. I am looking for public
> software to help me calculate the relative frequency using random
> sequences. I look forward to any comments or advice on this problem.
> Thank you very much.
>
> Best regards,
>
> Joo Sang Lee
> Department of Physics,
> Northwestern University
>


