[BioC] How does MEDIPS handle multiple mapping of a read

Lukas Chavez lukas.chavez.mailings at googlemail.com
Tue Aug 26 21:50:39 CEST 2014


Dear Allen and Wade,

indeed "multiple mappers" and reads that align to exactly the same position
(sometimes named "clonal" or "stacked" reads) are two different things and
need to be discussed separately. Wade has given a great overview about
"stacked reads" by referring to the uniq=TRUE parameter of the MEDIPS
package, sharing his results with the IDR, and by pointing to a study that
further discusses this topic. Here, I would like to add that some tools
provide the opportunity to estimate a global or local maximum of "stacked"
reads. MEDIPS currently only allows for considering  one representative per
genomic position and strand or all reads. It is somewhere on my ToDo list
to add a third option allowing a certain maximum of stacked reads.

Reads that align to several positions in the genome ("multiple mapper") are
common in MeDIP experiments, because of high abundance of methylation at
repetitive regions. MEDIPS uses functions of the Rsamtools package to
import mapping results from a bam file. Personally, I do not have
experience using bam files including multiple alignments per read, because
I am typically reporting only mapping results for unique mappers. However, i
f a bam file contains multiple mappers, all hits will be imported into
MEDIPS. This behavior is determined by the internal MEDIPS function
getGRange(). The relevant line is

scanFlag = scanBamFlag(isUnmappedQuery = F)

Here, all unmapped reads are neglected in the scanBamFlag constructor.
Among others, the scanBamFlag constructor has the parameter
isNotPrimaryRead which is NA by default and the parameter is not changed by
MEDIPS. Consequently, all alignments are returned regardless of their
primary status. More information on the primary status of an alignment (
http://www.bioconductor.org/packages/release/bioc/manuals/Rsamtools/man/Rsamtools.pdf
):

As far as I remember, I previously encountered problems when importing CSEM
alignment results into MEDIPS, but this is a while ago. What you can always
do is to convert your mapping results/ bam file into a plain bed file,
regardless of the presence of multiple alignments per read and regardless
of the appropriate use of flags of any alignment tool. MEDIPS will then
consider each given entry as a mapping hit and calculate the genomic
coverage at genome wide windows accordingly.

Best regards,
Lukas



On Tue, Aug 26, 2014 at 9:28 AM, Davis, Wade <davisjwa at health.missouri.edu>
wrote:

> Hi Allen,
> It seems there are two somewhat related issues that are relevant here:
> multi-mapped reads (i.e., NOT uniquely MAPPABLE read) and duplicate reads
> (i.e., not the only read with a given same start/stop). I like to avoid the
> term "unique" by itself because it can be ambiguous as you can see.
>
> To my knowledge, MEDIPS doesn't address the multi-mapped reads; I think
> many would argue that is something best handled by the aligner or afterward
> by filtering your bam files based on certain flags. Specifically, it is
> more efficient for the aligner to implement whatever you desire (e.g.,
> spreading them around uniformly or based on some estimated probability)
> during alignment than for MEDIPS to do it.
>
> However, MEDIPS does deal with duplicate reads via the uniq flag, as noted
> in the vignette:
> "MEDIPS will replace all reads which map to exactly the same start and end
> positions on the same strand by only one representative:
> > uniq=TRUE"
>
> See ?MEDIPS.createSet
>
> Most of what I have read about for ChiP/MeDIP lead us to only count
> duplicate reads once (i.e., use uniq=TRUE). We looked at a number of
> samples analyzed both ways and looked at the Irreproducible Discovery Rate
> (IDR) plots and found better reproducibility using this approach. Another
> paper suggests using the duplicates or discarding them depending on the
> purpose:
>
> http://www.ploscompbiol.org/article/info%3Adoi%2F10.1371%2Fjournal.pcbi.1003326
>
> I hope this helps a little,
> Wade
>
> -----Original Message-----
> From: Allen [guest] [mailto:guest at bioconductor.org]
> Sent: Tuesday, August 26, 2014 4:50 AM
> To: bioconductor at r-project.org; patcksa at nus.edu.sg
> Cc: MEDIPS Maintainer
> Subject: [BioC] How does MEDIPS handle multiple mapping of a read
>
> Hi,
> I was wondering if someone can tell me what happens in MEDIPS a particular
> read maps to multiple locations on the genome? I have just aligned my
> MeDIPS-seq data using BWA and only about 40% have a mapping quality of
> above 30. Does MEDIPS only take into account unique reads and throws out
> the rest?
>
> I looked at the MEDIPS manual (
> http://www.bioconductor.org/packages/release/bioc/vignettes/MEDIPS/inst/doc/MEDIPS.pdf)
> and can't find any information on this.
>
> I read on various forums and many people suggest just using unique reads.
> However, for ChIP-seq, it is suggested that one could use CSEM so that
> information from multiple mapping is not completely lost.
>
> Thanks in advance for your help.
>
> Allen
>
>
>
>  -- output of sessionInfo():
>
> error
>
> --
> Sent via the guest posting facility at bioconductor.org.
>
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at r-project.org
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives:
> http://news.gmane.org/gmane.science.biology.informatics.conductor
>

	[[alternative HTML version deleted]]



More information about the Bioconductor mailing list