[BioC] Query

michael watson (IAH-C) michael.watson at bbsrc.ac.uk
Wed Feb 24 18:38:43 CET 2010


I would say at this point that having a problem and then asking a mailing list how to solve it does not constitute a PhD :)

I hope you attribute the code below to Martin in your thesis.

Like many people on this list, I learned R all by myself, and the ability to learn new technologies is a valuable skill you will need when you become a researcher.  Mailing lists are there to help, but one would hope you'd spent a few weeks trying to do it yourself in R before asking the mailing list :)
________________________________________
From: bioconductor-bounces at stat.math.ethz.ch [bioconductor-bounces at stat.math.ethz.ch] On Behalf Of Martin Morgan [mtmorgan at fhcrc.org]
Sent: 24 February 2010 14:53
To: Yogesh Kumar
Cc: bioconductor at stat.math.ethz.ch
Subject: Re: [BioC] Query

On 02/23/2010 09:43 PM, Yogesh Kumar wrote:
> Respected Sir/Madam,
>
> Good morning, I just started my PhD in Next generation sequencing datasets..
> I have one query , can you help me to handle this query.
>
> we  have sequence of 1MB region from Human chr 22 (haploid). we want to cut
> the length of max 450-500bp  ( given use Normal distribution, Mean 500 and
> SD=?). *I want to know how can we generate reads from our sequence*   (
> paired end of 75bp at both ends, Assuming  perfect reads). we want to do
> simulation of NGS run and reassemble it  again  without using reference
> sequence. Can we use R for simulation work also? if yes please provide me
> which package we can use for it and possible send me R script too.

Good morning --

>From here

  http://bioconductor.org/packages/release/BiocViews.html

The BSgenome software package and the appropriate BSgenome annotation
(e.g., BSgenome.Hsapiens.UCSC.hg19) would be a good starting point for
reference sequence. The Biostrings and IRanges software packages,
especially 'Views', coupled with standard R random number generators
e.g., rnorm would be a good place to simulate reads. These could be
written as fasta files. Basic code might be

  library(BSgenome.Hsapiens.UCSC.hg19)
  N <- 1e8 # 10M reads
  start <- 5e7 + runif(N, 0, 1e7) # start of fragment, genome coords
  width <- rnorm(N, 500, 50) # width of fragment
  v <- Views(Hsapiens[["chr1"]], start=start, width=width)

Each element of v is a fragment

  r1 <- narrow(v, 1, 80)
  r2 <- narrow(v, width(v) - 80 + 1, width(v))

r1 and r2 are the pairs of each read. The 10M reads didn't quite fit in
my 8Gb of RAM, so likely I'd need to generate reads in batches of, e.g.,
5M reads.

Good luck.

Martin

>
> 1. To establish simulation of reassembling sequence from NGS data. This will
> build from re-assembling a simple sequence of 1 Mb with no repeats in the
> haploid state, to inclusion of genetic variation and polyploidy.
>        -simulate a NGS run from a 1 Mb segment of human with little/no
> repeats. Average fragment size 500 bp with normal distribution. Paired end
> with 75 bp reads. Assume perfect sequencing. Check out other simulation
> methods
>        - align the reads back to the 1 Mb sequence. How much variation in
> coverage
>        - reassemble the reads WITHOUT using the reference sequence.
>
>
>
>
> http://maq.sourceforge.net/maq-manpage.shtml#7
>
> I think we can use above link but not be able to work out . can you check it
> and send me solution
>
> Regards
> Yogesh Kumar
> University of Otago
> Dunedin, NZ
> 0064226149242
>
>       [[alternative HTML version deleted]]
>
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at stat.math.ethz.ch
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor


--
Martin Morgan
Computational Biology / Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N.
PO Box 19024 Seattle, WA 98109

Location: Arnold Building M1 B861
Phone: (206) 667-2793

_______________________________________________
Bioconductor mailing list
Bioconductor at stat.math.ethz.ch
https://stat.ethz.ch/mailman/listinfo/bioconductor
Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor



More information about the Bioconductor mailing list