[BioC] Coverage vs GC bias - how to?
mailinglist.honeypot at gmail.com
Tue Sep 7 23:22:58 CEST 2010
Just a few comments:
On Tue, Sep 7, 2010 at 4:59 AM, Marc Noguera <mnoguera at imppc.org> wrote:
> Hi all,
> I am trying to get some insight knowledge on some NGS data. I have an
> alignment file and an AlignedRead object from it, from which I can
> compute coverage.
> What I would like to know is if there is any bias on the coverage vs GC
> My idea is to run a sliding window of width N through the reference
> sequence I have the reads on and to compute, somehow, a kind of coverage
> coefficient from the coverage i get from IRanges and the AlignedRead object.
For some ideas on performing sliding-window stuff over Rle's, read
through this thread:
> As I know the coordinate of the sliding window the next step is to
> obtain, somehow, the sequence of the window and calculate the GC content.
> How could I extract the sequence from BSgenome within this sliding
> window and how can I create the sliding window. I see that I can use
> getSeq to obtain the sequence and the use alphabetByFrequency on it.
While getSeq might work (I really never use that function), I think
I'd rather use "Views", like so:
R> chr <- unmasked(Hsapiens$chr10)
R> ranges <- IRanges(start=c(100000, 101000, 102000), width=100)
R> seqs <- Views(chr, ranges)
A C G T M R W S Y K V H D B N - +
[1,] 18 26 23 33 0 0 0 0 0 0 0 0 0 0 0 0 0
[2,] 30 26 24 20 0 0 0 0 0 0 0 0 0 0 0 0 0
[3,] 38 23 18 21 0 0 0 0 0 0 0 0 0 0 0 0 0
> However, I don't know how to create this sliding window.
> Which is the best way to proceed? am I reinventing the wheel?
Maybe, but it's good practice to learn how to do anyway ;-)
There have been several papers published about biases found in
sequencing data, which you should probably read through. I think it's
good to know how to do these things so that you can actually see which
of the reported phenomena are happening in the dataset you're working
Graduate Student: Computational Systems Biology
| Memorial Sloan-Kettering Cancer Center
| Weill Medical College of Cornell University
Contact Info: http://cbio.mskcc.org/~lianos/contact
More information about the Bioconductor