[BioC] Regarding multiple hits of same read

Steve Lianoglou lianoglou.steve at gene.com
Tue Sep 24 18:30:15 CEST 2013


Hi,

On Tue, Sep 24, 2013 at 4:25 AM, deepika lakhwani
<lakhwanideepika at gmail.com> wrote:
> Hello,
>
> I have been trying to find differentially expressed genes using
> different R packages. I have rice Illumina sequencing data (paired-end,
> 100 bp). I mapped the data to the rice genome using TopHat, and now I have
> the accepted_hits.bam file, which contains the details of the mapping.
>
> Now i am confused because it can be possible that a single read can align
> on multiple position.

One way to deal with reads that align to multiple (genomic) positions
is to not deal with them at all. Many people only use reads that align
uniquely to the genome.
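
As a rough sketch of what "use only uniquely aligned reads" can mean in
practice: aligners such as TopHat record the number of reported
alignments for a read in the NH tag of each SAM/BAM record, so one
option is to keep only records where NH is 1. The Python below works on
made-up records held as plain dicts; the field names and the
keep_unique helper are hypothetical and not part of any package. With a
real BAM file you would read the tag through whatever BAM library you
use.

```python
def keep_unique(records):
    """Keep alignments whose NH tag (number of reported hits) is exactly 1."""
    # Records without an NH tag are dropped, since uniqueness cannot be verified.
    return [rec for rec in records if rec.get("NH") == 1]


# Toy alignments (illustrative values only): read r2 is reported at two loci.
alignments = [
    {"read": "r1", "chrom": "Chr1", "pos": 1050, "NH": 1},
    {"read": "r2", "chrom": "Chr1", "pos": 2200, "NH": 2},
    {"read": "r2", "chrom": "Chr5", "pos": 9100, "NH": 2},
    {"read": "r3", "chrom": "Chr3", "pos": 400, "NH": 1},
]

unique = keep_unique(alignments)
print([rec["read"] for rec in unique])
```

Whether to drop multimapped reads entirely or to handle them some other
way is a choice you make for your analysis; this only illustrates the
"drop them" option.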

> When we count the reads for differential analysis
> then same read is present in two different genes.

This is a different situation from the one you mention above.

It is possible that:

(1) One read aligns to multiple places in the genome. These reads are
often called "multimapped" (multimappers, etc.) and as I mentioned
above, it is rather common to ignore these and to only count reads
that align to a unique position in the genome.

(2) Two different genes overlap at the same genomic locus (for example,
on opposite strands), so even though a read maps to a single position
in the genome, there is more than one gene that it can be assigned to.
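
To make case (2) concrete, here is a minimal sketch of one way to count
a uniquely aligned read that falls where two genes overlap: count it
only if it overlaps exactly one gene, and set it aside as ambiguous
otherwise. As I understand it, this is similar in spirit to the "Union"
mode of summarizeOverlaps, but the code below is not that function. The
gene names, coordinates, and the helpers overlapping_genes and
count_reads are all made up for illustration, and strand is ignored.

```python
def overlapping_genes(read, genes):
    """Return names of genes whose interval overlaps the read (closed coordinates)."""
    return [
        name
        for name, (chrom, start, end) in genes.items()
        if chrom == read["chrom"] and start <= read["end"] and read["start"] <= end
    ]


def count_reads(reads, genes):
    """Count a read only when it overlaps exactly one gene."""
    counts = {name: 0 for name in genes}
    ambiguous = 0
    for read in reads:
        hits = overlapping_genes(read, genes)
        if len(hits) == 1:
            counts[hits[0]] += 1
        elif len(hits) > 1:
            # One alignment position, several candidate genes: not counted.
            ambiguous += 1
        # Reads that overlap no gene are silently skipped.
    return counts, ambiguous


# Hypothetical annotation: geneA and geneB overlap between 1800 and 2000.
genes = {"geneA": ("Chr1", 1000, 2000), "geneB": ("Chr1", 1800, 3000)}
reads = [
    {"chrom": "Chr1", "start": 1100, "end": 1199},  # geneA only
    {"chrom": "Chr1", "start": 1850, "end": 1949},  # overlaps both genes
    {"chrom": "Chr1", "start": 2500, "end": 2599},  # geneB only
    {"chrom": "Chr1", "start": 5000, "end": 5099},  # no gene
]

counts, ambiguous = count_reads(reads, genes)
print(counts, ambiguous)
```

Note that none of these reads is multimapped: each has a single
alignment, and the ambiguity comes only from the annotation.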

> So i have a question that
> is correct or not?

Can you clarify in more detail what exactly you want to know is "correct"?

> and i am reading genomic features R package for counting
> the reads in libraries. Can anyone explain the summarizeOverlaps function?

Please read through the copious documentation made available in the
GenomicRanges package:

http://bioconductor.org/packages/2.12/bioc/html/GenomicRanges.html

There are five PDF files available there under the "Documentation"
section and all of them are worth your close attention.

If you still have more specific questions after reading through those,
please ask them here. A generic question like "explain the
summarizeOverlaps function" isn't helpful, as the function is explained
in multiple places in the documentation -- if there is something
specific about it that is confusing, we can help you to address that.

> i read manual but what is basic function of it.

So what part is unclear?

You'll likely also want to read through the vignette for the
parathyroidSE package:

http://bioconductor.org/packages/release/data/experiment/vignettes/parathyroidSE/inst/doc/parathyroidSE.pdf

It shows in great detail how to go from aligned reads to "counted"
genes and exons.

HTH,
-steve

-- 
Steve Lianoglou
Computational Biologist
Bioinformatics and Computational Biology
Genentech


