[BioC] Analyzing "differential variability" of methylation (and gene expression)

Pekka Kohonen pkpekka at gmail.com
Wed Mar 27 12:43:05 CET 2013


Hello,

I also find this an interesting question. I don't have a solution
handy, but I would say that linear models seem preferable because
they let you account for other sources of variation that are not
age-related, for instance batch effects and other confounding
variables/covariates (smoking? BMI?).
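
A minimal sketch of that idea for a single CpG, in Python with simulated
data. Everything here is an illustrative assumption rather than a method
from this thread: the simulated batch covariate, the helper names, and the
two-step approach itself (fit value ~ age + covariates, then regress the
absolute residuals on the same terms, similar in spirit to a Glejser-type
heteroscedasticity test). It only returns a slope; p-values and multiple
testing across CpGs would need proper statistical tooling.

```python
import random


def ols_fit(X, y):
    """Least-squares coefficients for y ~ X via the normal equations.

    X is a list of rows and must already contain an intercept column.
    """
    p = len(X[0])
    # Augmented matrix [X'X | X'y]
    A = [
        [sum(row[i] * row[j] for row in X) for j in range(p)]
        + [sum(row[i] * yi for row, yi in zip(X, y))]
        for i in range(p)
    ]
    # Gauss-Jordan elimination with partial pivoting
    for c in range(p):
        piv = max(range(c, p), key=lambda r: abs(A[r][c]))
        A[c], A[piv] = A[piv], A[c]
        for r in range(p):
            if r != c:
                f = A[r][c] / A[c][c]
                A[r] = [a - f * b for a, b in zip(A[r], A[c])]
    return [A[i][p] / A[i][i] for i in range(p)]


def residuals(X, y, beta):
    """Observed minus fitted values."""
    return [yi - sum(b * x for b, x in zip(beta, row)) for row, yi in zip(X, y)]


def variability_trend(age, covariates, values):
    """Slope of |residual| on age, adjusted for the covariates.

    Step 1 removes the mean effects of age and covariates.
    Step 2 asks whether the spread of what is left changes with age.
    """
    X = [[1.0, a] + list(c) for a, c in zip(age, covariates)]
    res = residuals(X, values, ols_fit(X, values))
    abs_res = [abs(r) for r in res]
    return ols_fit(X, abs_res)[1]  # coefficient of age in step 2


# Simulated example (assumed data): noise SD grows with age, plus a batch shift.
random.seed(1)
n = 500
age = [random.uniform(20, 80) for _ in range(n)]
batch = [[float(random.randint(0, 1))] for _ in range(n)]
values = [
    0.5 + 0.001 * a + 0.05 * b[0] + random.gauss(0, 0.01 + 0.0005 * a)
    for a, b in zip(age, batch)
]

slope = variability_trend(age, batch, values)
print(f"change in |residual| per year of age: {slope:.5f}")
```

A positive slope suggests that variability increases with age after
adjusting for the covariates, and age is used as a continuous variable
rather than being cut into groups.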

Best Regards, Pekka

2013/3/26 Simone <enomis.bioc at gmail.com>:
> Hi!
>
> My question is more of a general nature; nevertheless, I hope someone can
> help.
>
> I am currently trying to analyze "differential variability" of gene
> expression and, above all, methylation data (Illumina microarray data: 27K
> and 450K BeadChip) in the context of aging, i.e. I would like to see if the
> variability of methylation increases (or decreases) for (healthy)
> individuals when they age. I would like to do this gene-wise, to see if and
> which genes show increased/decreased variability.
>
> Several studies already published in this context employ different methods
> for this kind of analysis:
>
> First of all there is the normal F-test. But since it requires data that
> does not depart from normality I think it is not applicable in my case. For
> one of my datasets (~ 500 samples after outlier removal) I performed
> Shapiro-Wilk tests for the ~ 27,000 CpGs and found that more than 26,200
> CpGs do not have normally distributed values (FDR 0.05). I think this is a
> usual observation when working with methylation data.
>
> In other analyses investigating similar questions Bartlett's test was
> employed. But it would require normal distributions as well. I also read
> something about this right here or on the R mailing list, where Ansari's
> test was proposed for this kind of analysis. So maybe Ansari's
> test would be a good choice, although so far I have not seen any
> publication doing variability analyses by using Ansari's test.
>
> Another approach which was recommended to me was to not build age groups
> and compare them to each other (I used two "extreme" age groups, so very
> young vs. very old samples), but to fit a kind of fixed-effects model
> for analyzing variability with age. Maybe something like this would be the
> best option, as we have all the age information available (in years or even
> months) and this way we do not lose any information we actually have.
> But I am not quite sure about how to model variability. How would one do
> this?
>
> Recently there was also a study published where they say that they used
> linear models and calculated "methylation deviance" as the squared distance
> of the residuals of every marker from the population mean, but again I am
> not sure about it, and the description of the methods part is quite short.
>
> Any suggestions about the "best" way to analyze changes in variability of
> methylation (and gene expression) values?
> Which strategy would you recommend?
>
> Best,
> Simone


