[BioC] How does MEDIPS handle multiple mapping of a read
davisjwa at health.missouri.edu
Tue Aug 26 18:28:24 CEST 2014
It seems there are two somewhat related issues that are relevant here: multi-mapped reads (i.e., reads that are NOT uniquely MAPPABLE) and duplicate reads (i.e., reads that are not the only read with a given start/stop). I like to avoid the term "unique" by itself because it can be ambiguous, as you can see.
To my knowledge, MEDIPS doesn't address the multi-mapped reads; I think many would argue that is something best handled by the aligner or afterward by filtering your bam files based on certain flags. Specifically, it is more efficient for the aligner to implement whatever you desire (e.g., spreading them around uniformly or based on some estimated probability) during alignment than for MEDIPS to do it.
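As a rough illustration of the "filter afterward" option, here is a minimal Python sketch that drops likely multi-mapped reads from SAM-formatted text before anything is handed to MEDIPS. It assumes the aligner reports ambiguous placements with a low MAPQ (as BWA generally does). The threshold of 30, the function names and the toy records are illustrative assumptions, not anything MEDIPS provides.

```python
# Sketch only: filter SAM text records by flag and mapping quality.
# Assumption: multi-mapped reads show up with low MAPQ in the aligner output.

MIN_MAPQ = 30  # hypothetical cutoff; choose what suits your data


def keep_read(sam_line, min_mapq=MIN_MAPQ):
    """Return True for a mapped, primary alignment with MAPQ >= min_mapq."""
    fields = sam_line.rstrip("\n").split("\t")
    flag = int(fields[1])   # SAM column 2: bitwise FLAG
    mapq = int(fields[4])   # SAM column 5: mapping quality
    unmapped = flag & 0x4
    secondary = flag & 0x100
    supplementary = flag & 0x800
    if unmapped or secondary or supplementary:
        return False
    return mapq >= min_mapq


def filter_sam(lines, min_mapq=MIN_MAPQ):
    """Yield header lines unchanged and only the alignments that pass."""
    for line in lines:
        if line.startswith("@") or keep_read(line, min_mapq):
            yield line


# Toy records (made up for illustration).
toy_sam = [
    "@SQ\tSN:chr1\tLN:1000",
    "r1\t0\tchr1\t100\t60\t36M\t*\t0\t0\t*\t*",    # confident placement
    "r2\t0\tchr1\t200\t0\t36M\t*\t0\t0\t*\t*",     # MAPQ 0: ambiguous
    "r3\t256\tchr1\t300\t60\t36M\t*\t0\t0\t*\t*",  # secondary alignment
    "r4\t4\t*\t0\t0\t*\t*\t0\t0\t*\t*",            # unmapped
]
kept_names = [l.split("\t")[0] for l in filter_sam(toy_sam) if not l.startswith("@")]
print(kept_names)
```

On real BAM files the same idea is usually applied with the aligner's own options or with samtools (e.g., its minimum mapping quality option) rather than by parsing text by hand.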
However, MEDIPS does deal with duplicate reads via the uniq flag, as noted in the vignette:
"MEDIPS will replace all reads which map to exactly the same start and end positions on the same strand by only one representative:
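To make the behaviour described in that vignette sentence concrete, here is a small Python sketch of the idea: reads sharing chromosome, start, end and strand are collapsed to one representative. This is not MEDIPS code; the function name, the dictionary layout and the toy reads are assumptions made for illustration only.

```python
# Sketch only: collapse reads with identical (chrom, start, end, strand)
# to a single representative, keeping the first one encountered.


def collapse_duplicates(reads):
    """Return reads with exact positional duplicates removed."""
    seen = set()
    kept = []
    for read in reads:
        key = (read["chrom"], read["start"], read["end"], read["strand"])
        if key not in seen:
            seen.add(key)
            kept.append(read)
    return kept


# Toy reads (made up for illustration).
toy_reads = [
    {"name": "a", "chrom": "chr1", "start": 100, "end": 136, "strand": "+"},
    {"name": "b", "chrom": "chr1", "start": 100, "end": 136, "strand": "+"},  # duplicate of a
    {"name": "c", "chrom": "chr1", "start": 100, "end": 136, "strand": "-"},  # other strand, kept
    {"name": "d", "chrom": "chr1", "start": 101, "end": 137, "strand": "+"},  # shifted by 1, kept
]
unique_names = [r["name"] for r in collapse_duplicates(toy_reads)]
print(unique_names)
```

Note that this is a different operation from filtering multi-mapped reads: a duplicate can be perfectly well placed, and a multi-mapped read need not have any duplicate.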
Most of what I have read about ChIP/MeDIP led us to count duplicate reads only once (i.e., use uniq=TRUE). We analyzed a number of samples both ways, compared the Irreproducible Discovery Rate (IDR) plots, and found better reproducibility with this approach. Another paper suggests using the duplicates or discarding them depending on the purpose:
I hope this helps a little,
From: Allen [guest] [mailto:guest at bioconductor.org]
Sent: Tuesday, August 26, 2014 4:50 AM
To: bioconductor at r-project.org; patcksa at nus.edu.sg
Cc: MEDIPS Maintainer
Subject: [BioC] How does MEDIPS handle multiple mapping of a read
I was wondering if someone can tell me what happens in MEDIPS when a particular read maps to multiple locations on the genome? I have just aligned my MeDIP-seq data using BWA and only about 40% of the reads have a mapping quality above 30. Does MEDIPS only take unique reads into account and throw out the rest?
I looked at the MEDIPS manual (http://www.bioconductor.org/packages/release/bioc/vignettes/MEDIPS/inst/doc/MEDIPS.pdf) and can't find any information on this.
I have read on various forums that many people suggest just using unique reads. However, for ChIP-seq, it is suggested that one could use CSEM so that information from multi-mapped reads is not completely lost.
Thanks in advance for your help.
Sent via the guest posting facility at bioconductor.org.