[BioC] MEDIPS: how does MEDIPS define a methylated region / cluster?

Allen [guest] guest at bioconductor.org
Fri Sep 5 10:14:29 CEST 2014

I have aligned my sequencing data for one sample (filename: Ca2_MAPQ20.bam, so named because I filtered out all reads with a MAPQ of 20 or less) and run it through MEDIPS. I am not comparing samples at the moment, so the results file shows the methylation profile of only one sample. In this results file, the data are organized as follows:
chr	start	stop	CF	Ca2_MAPQ20.bam.counts	Ca2_MAPQ20.bam.rpkm	Ca2_MAPQ20.bam.rms	Ca2_MAPQ20.bam.prob	MSets1.counts.mean	MSets1.rpkm.mean	MSets1.rms.mean	MSets1.prob.mean
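For anyone post-processing this table outside R, here is a minimal Python sketch that reads it, assuming the file is tab-separated with exactly the header shown above and that the per-sample columns follow the pattern `<sample>.counts`, `<sample>.rpkm`, `<sample>.rms`, `<sample>.prob`. The `Window` type and `read_windows` function are hypothetical helpers, not part of MEDIPS; the `MSets1.*.mean` columns are simply ignored.

```python
import csv
from typing import Iterable, List, NamedTuple


class Window(NamedTuple):
    """One row of the results table: a fixed-size genomic window."""
    chrom: str
    start: int
    stop: int
    cf: float
    counts: float
    rpkm: float
    rms: float
    prob: float


def read_windows(lines: Iterable[str], sample: str) -> List[Window]:
    """Parse a tab-separated results table (e.g. an open file handle).

    ASSUMPTION: per-sample columns are named '<sample>.counts',
    '<sample>.rpkm', '<sample>.rms' and '<sample>.prob'; any other
    columns (such as the MSets1.*.mean ones) are ignored.
    """
    reader = csv.DictReader(lines, delimiter="\t")
    windows = []
    for row in reader:
        windows.append(
            Window(
                chrom=row["chr"],
                start=int(row["start"]),
                stop=int(row["stop"]),
                cf=float(row["CF"]),
                counts=float(row[f"{sample}.counts"]),
                rpkm=float(row[f"{sample}.rpkm"]),
                rms=float(row[f"{sample}.rms"]),
                prob=float(row[f"{sample}.prob"]),
            )
        )
    return windows
```

Usage would be something like `with open(path) as fh: windows = read_windows(fh, "Ca2_MAPQ20.bam")`, where `path` is whatever the results file is called.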

1) I tried reading the Down et al. (2008) paper that explains the concept of the coupling factor, and I think it is, simply put, a measure of local CpG density. Is my understanding of the "coupling factor" correct?
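To make the "local CpG density" reading concrete, here is a simplified Python sketch of a coupling-factor-like score. It sums a weight for every CpG within one fragment length of a position. ASSUMPTION: the linear decay of the weight with distance is a stand-in chosen for illustration; Down et al. derive their weights from the fragment-size distribution, and this is not the code MEDIPS uses. `find_cpgs` and `coupling_factor` are hypothetical names.

```python
from typing import List


def find_cpgs(seq: str) -> List[int]:
    """0-based positions of the C in every 'CG' dinucleotide (case-insensitive)."""
    s = seq.upper()
    return [i for i in range(len(s) - 1) if s[i:i + 2] == "CG"]


def coupling_factor(cpgs: List[int], pos: int, frag_len: int) -> float:
    """Simplified local CpG density at `pos`.

    Every CpG closer than `frag_len` bases contributes a weight that falls
    linearly from 1 (at `pos`) to 0 (at distance `frag_len`).
    ASSUMPTION: linear decay replaces a fragment-size-derived weighting.
    """
    total = 0.0
    for c in cpgs:
        d = abs(c - pos)
        if d < frag_len:
            total += 1.0 - d / frag_len
    return total
```

Under this sketch, a window in a CpG island collects many nearby CpGs and gets a high score, while a window in CpG-poor sequence scores near zero, which matches the intuition of CF as local CpG density.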
2) This leads me to ask how MEDIPS defines a "region" or "cluster". For example, on chromosome 1 I have reads aligning from position "1002501" to "1003300", followed by a 1400 bp stretch (from "1003301" to "1004700") with no aligned reads (rpkm = 0), and then reads aligning again from "1004701" to "1005400". Does MEDIPS consider this to be two methylated regions, "1002501-1003300" and "1004701-1005400", or does it consolidate them into one methylated region/cluster ("1002501-1005400") because they are fairly close to one another?
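The two readings in question (2) differ only in how large a gap is allowed between windows before they are treated as separate regions. A minimal Python sketch of that merging step, as hypothetical post-processing on the windows with rpkm > 0; it is not a description of what MEDIPS itself does, and `merge_windows` and `max_gap` are illustrative names:

```python
from typing import List, Tuple

# (start, stop), 1-based and inclusive, as in the results table
Interval = Tuple[int, int]


def merge_windows(windows: List[Interval], max_gap: int) -> List[Interval]:
    """Merge windows separated by at most `max_gap` uncovered bases.

    Hypothetical post-processing: `max_gap` = 0 joins only touching
    windows (e.g. consecutive 100 bp bins); larger values also bridge
    stretches with no reads.
    """
    merged: List[Interval] = []
    for start, stop in sorted(windows):
        # Number of bases strictly between the previous region and this window
        if merged and start - merged[-1][1] - 1 <= max_gap:
            prev_start, prev_stop = merged[-1]
            merged[-1] = (prev_start, max(prev_stop, stop))
        else:
            merged.append((start, stop))
    return merged
```

With the coordinates from the question, `max_gap=0` keeps "1002501-1003300" and "1004701-1005400" apart, while any `max_gap` of 1400 or more joins them into "1002501-1005400". The choice of threshold is the analyst's decision here, not something the sketch can settle.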
3) I used "uniq=TRUE" to remove stacked/clonal reads, so for each 100 bp bin I usually get rpkm=1, but occasionally it is as high as rpkm=3. This led me to wonder how the relative methylation score ("Ca2_MAPQ20.bam.rms") is calculated. For example, three bins each have an rpkm of 1, but one has an rms of "1660.409928", another an rms of "7122.10254", and the third only "679.1814523". How is it possible that three bins with the same rpkm have such different rms values?
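One general mechanism that would produce this pattern is a score that divides the read signal by something that depends on each window's CpG density. The Python sketch below shows the standard RPKM definition and then a HYPOTHETICAL CF-dependent normalization (observed rpkm divided by the signal a linear calibration would predict at that CF). The `intercept` and `slope` parameters and the `rms_sketch` function are made up for illustration; this is not MEDIPS's actual rms formula.

```python
def rpkm(counts: float, window_len: int, total_reads: int) -> float:
    """Standard RPKM: reads per kilobase of window per million mapped reads."""
    return counts * 1e9 / (window_len * total_reads)


def rms_sketch(rpkm_value: float, cf: float, intercept: float, slope: float) -> float:
    """HYPOTHETICAL CpG-density normalization.

    Divides the observed signal by the signal a linear calibration
    (intercept + slope * CF) would predict at this CF. Not MEDIPS's
    formula; it only shows why equal rpkm can give unequal scores.
    """
    expected = intercept + slope * cf
    if expected <= 0:
        raise ValueError("calibration predicts non-positive signal at this CF")
    return rpkm_value / expected
```

For example, one read in a 100 bp window from a library of 10 million reads gives `rpkm(1, 100, 10_000_000) == 1.0`. Under this sketch, with any positive slope, the window with the lower CF receives the larger score for the same rpkm, because less signal was expected there.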
4) Can the rms values be added up over a region to represent the cumulative methylation level of that region? I am asking because I do not know whether the rms value is on a log scale or some other scale. So, in question (3), if the three bins are adjacent and I decide to cluster them together as one methylated region, would the rms value for this new 300 bp bin be 1660.409928 + 7122.10254 + 679.1814523 = 9461.69?
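The arithmetic in question (4) can be checked directly. The Python sketch below computes both the sum and the mean of the per-window scores; `combine_scores` is a hypothetical helper. Which (if either) is a meaningful region-level value depends on how rms is defined: if the score were on a log scale, for instance, a plain sum of the values would not correspond to a sum of the underlying signal.

```python
import math
from typing import Dict, Sequence


def combine_scores(scores: Sequence[float]) -> Dict[str, float]:
    """Sum and mean of per-window scores for a merged region.

    Whether either is a valid region-level methylation estimate depends
    on the definition of the score; this only does the arithmetic.
    """
    if not scores:
        raise ValueError("need at least one window score")
    total = math.fsum(scores)  # accurate floating-point summation
    return {"sum": total, "mean": total / len(scores)}


rms_values = [1660.409928, 7122.10254, 679.1814523]
result = combine_scores(rms_values)
print(round(result["sum"], 2))  # 9461.69
```

Reporting the mean instead of the sum keeps a merged region on the same scale as a single window, which makes regions of different lengths easier to compare.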
5) Finally, what does "Ca2_MAPQ20.bam.prob" represent? I assume it is some sort of probability score, but I am not sure of what.

Thanks in advance for any assistance in answering my questions. 

Best regards,

 -- output of sessionInfo(): 


Sent via the guest posting facility at bioconductor.org.
