[BioC] MEDIPS: how does MEDIPS define a methylated region / cluster?

Lukas Chavez lukas.chavez.mailings at googlemail.com
Mon Sep 8 19:09:29 CEST 2014


Dear Allen,

I was referring to section 7.7 of the MEDIPS Tutorial (or vignette)
available at:
http://www.bioconductor.org/packages/release/bioc/vignettes/MEDIPS/inst/doc/MEDIPS.pdf

I recommend to apply the a PCA to the counts at the genomic windows
returned by the MEDIPS.meth() function where you probably want to remove
windows with zero or low counts first. Here, the assumption is that CpG
density correction of MeDIP signals might not be necessary as you are
comparing different samples with each other (please see section 7.7 of the
MEDIPS tutorial). In case you want to apply a PCA on CpG density
methylation estimates anyway, you can either use the rms values or
calculate methylation estimates using  BayMeth, a recent tool that appears
to be superior for such methylation estimates.

All the best,
Lukas

On Sun, Sep 7, 2014 at 11:10 AM, Chong Kim San Allen <patcksa at nus.edu.sg>
wrote:

> Dear Lukas,
> Again, I am so grateful for your quick reply.
>
> The only manual for MEDIPS that I know of is this one:
> http://www.bioconductor.org/packages/release/bioc/manuals/MEDIPS/man/MEDIPS.pdf
> and I have looked and just can't find the section 7.7 that you mentioned.
>
> The reason I asked about how MEDIPS defines a cluster is because I wanted
> to try and reduce the number of methylated regions that I look at and then
> use these "consolidated" regions to do a principal component analysis
> (PCA). I have 12 cell types and I wanted to see if I could either do a PCA
> or Unsupervised Hierarchical Clustering to separate these 12 cell types
> into subgroups, based on their methylation profile.
>
> You said: "When windows with significant differential enrichment between
> groups have been identified, the MEDIPS.mergeFrames() function can be used
> to merge adjacent significant windows into extended regions."
> But at this point, I don't have any groupings. Normally, one would have a
> control and test group and so you can do what you suggested. But I have
> only, for example, a test group and I want to know if this test group can
> be further subdivided into a few subgroups based on their methylation
> profile. I would think that it would be computationally intensive if I used
> original 100bp windows generated by MEDIPS and entered all the rms values
> for each window of each cell type into the PCA.
>
> You said: "Instead of merging any values across windows, I recommend to
> make use of the MEDIPS.createROIse() function which allows for analyzing
> specific regions of interest."
> I didn't want my PCA analysis to be biased by a predefined "region of
> interest", be it whether the ROI is a promoter, CpG, etc. I wanted to just
> input the raw methylation scores (rms) to see what I could get. So that is
> why I asked if the rms can be added up (in question 4 of my last
> email)...also I was curious if it could be done.
>
> Once again., thanks so very much for taking time from your busy schedule
> to answer my questions.
>
> Best regards,
> Allen
>
> ________________________________________
> From: Lukas Chavez [lukas.chavez.mailings at googlemail.com]
> Sent: Saturday, September 06, 2014 11:09 AM
> To: Allen [guest]
> Cc: bioconductor at r-project.org; Chong Kim San Allen; MEDIPS Maintainer
> Subject: Re: [BioC] MEDIPS: how does MEDIPS define a methylated region /
> cluster?
>
> Dear Allen,
>
> please see my comments below. Moreover, please consider reading section
> 7.7 of the MEDIPS manual.
>
> Best,
> Lukas
>
>
> On Fri, Sep 5, 2014 at 1:14 AM, Allen [guest] <guest at bioconductor.org
> <mailto:guest at bioconductor.org>> wrote:
> Hi,
> I have aligned my sequencing data for one sample (filename: Ca2_MAPQ20.bam
> --- because I filtered out everything with MAPQ20 or less) and I've put it
> through MEDIPS. At the moment, I am not making a comparison between
> samples. Therefore, the result file shows the methylation profile for only
> one sample. In this results file, I noticed that the data is organized as
> such:
> chr     start   stop    CF      Ca2_MAPQ20.bam.counts
>  Ca2_MAPQ20.bam.rpkm     Ca2_MAPQ20.bam.rms      Ca2_MAPQ20.bam.prob
>  MSets1.counts.mean      MSets1.rpkm.mean        MSets1.rms.mean
> MSets1.prob.mean
>
> 1) I tried reading the Down et al. (2008) paper that explains the concept
> of coupling factor and I think it is, simply put, a measure of local CpG
> density. I am not sure if my understanding of 'coupling factor' is correct?
>
> That is correct.
>
> 2) This lead to question how or what MEDIPS defines as a "region or
> cluster"? That is to say, on Chromosome 1, I have reads aligning from
> position "1002501" to "1003300" but then there is a region of 1400 bp (from
> "1003301" to "1004700") where there are no reads aligned (rpkm = 0). Then,
> again, from "1004701" to "1005400", there reads aligning to this region. So
> my question is does MEDIPS consider this to be a case of 2 methylated
> regions, that is, "1002501-1003300" and "1004701-1005400" or does MEDIPS
> try to consolidate these 2 methylated regions into one methylated
> region/cluster (that is, from "1002501-1005400") since they are fairly
> close to one another.
>
> Each window is considered as one region; CpG density, counts, rpkm, rms,
> and differential coverage will be calculated for each window separately.
> When windows with significant differential enrichment between groups have
> been identified, the MEDIPS.mergeFrames() function can be used to merge
> adjacent significant windows into extended regions.
>
> 3) I used "uniq=TRUE" to get rid of stacked/clonal reads and so for each
> 100 bp bin, I am usually getting rpkm=1 within each bin but occasionally,
> it may be as many rpkm=3. This lead me to wonder how the relative
> methylation score ("Ca2_MAPQ20.bam.rms") is calculated? For example, 3 bins
> have a value of 1 rpkm each, but one has an rms value of "1660.409928",
> another an rms of "7122.10254" and the third is only "679.1814523". How is
> that possible that 3 bins that have the same rpkm can have such varying rms
> values?
>
> This is due to the different CpG densities in these windows.
>
> 4)Can the rms be added up for a region so as to represent the cumulative
> methylation level for that region. I am asking because I do not know if the
> rms value is a log value or what? So, in my above question (3), if the 3
> bins are adjacent to one another and I decide to cluster them together and
> consider them as one methylated region, then would the rms value for this
> new 300bp bin that I have created be 1660.409928 + 7122.10254 + 679.1814523
> = 9461.68.
>
> Instead of merging any values across windows, I recommend to make use of
> the MEDIPS.createROIse() function which allows for analyzing specific
> regions of interest.
>
> 5)Finally, what does "Ca2_MAPQ20.bam.prob" represent? I figured that is
> some sort of a probability score but I am not sure of what.
>
> As mentioned in the manual, please disregard the prob values which will be
> removed in a future version.
>
> Thanks in advance for any assistance in answering my questions.
>
> Best regards,
> Allen
>
>  -- output of sessionInfo():
>
> error
>
> --
> Sent via the guest posting facility at bioconductor.org<
> http://bioconductor.org>.
>
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at r-project.org<mailto:Bioconductor at r-project.org>
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives:
> http://news.gmane.org/gmane.science.biology.informatics.conductor
>
> Important: This email is confidential and may be privileged. If you are
> not the intended recipient, please delete it and notify us immediately; you
> should not copy or use it for any purpose, nor disclose its contents to any
> other person. Thank you.
>

	[[alternative HTML version deleted]]



More information about the Bioconductor mailing list