[BioC] Detecting significant trends in variability

Fri Feb 22 09:34:13 CET 2013

Hi everybody,

I have a question related to the analysis of methylation microarray
data. I first asked it on Biostar
(http://www.biostars.org/p/64405/#64521), and somebody there suggested
that I also post it here. The question is:

"[..] I am currently working on a DNA methylation microarray analysis
project. I have 20 samples measured on an Illumina 450k array. After
some initial preprocessing and non-specific filtering, I reduced the
data to 47k probes. Using minfi, I fit a linear regression model to
each probe, with sample age as the only (continuous) predictor and the
methylation level as the response, expressed as M-values (base-2 logit
transformations of the beta values). The p-values are then adjusted
for multiple testing (FDR), and I keep the significant probes as the
final subset of differentially methylated probes.

Now we want to divide these probes into several groups according to
their variability trend. That is, we want to detect whether, for a
given probe, the methylation values converge or diverge with age. At
first I was thinking about using the White test (or an equivalent
heteroskedasticity test) to check whether the squared residuals follow
such a trend. But then I thought that if the squared residuals behave
abnormally, it could be due to several other factors, such as outliers
or influential points. Am I right to distrust this approach, or could
the White test fit in this context?

A colleague told me that another possible way would be to use mixed
models with a variance function. That way I could model not only the
change in methylation level but also the change in variability. If I
choose this way, I would need to define some age groups and partition
the samples among them, wouldn't I? Is this a better approach in this
case than the basic linear regression? [..]"

ADDITIONAL NOTES:

I really like the mixed model approach, and I have managed to play a
little with the nlme package and the varFunc class family to study
the heteroskedasticity, but I still think I am missing something. I
have also been reading excerpts from the "Mixed-Effects Models in S and
S-PLUS" book by Pinheiro and Bates, and I think I understand the
examples, but I find it hard to adapt them to the methylation
scenario.

For example, say I have the methylation values for one probe. The
obvious simple linear model is "meth ~ age". So far, so good. But if I
want to convert it to a mixed model, which covariate should be declared
as a random effect? I have been playing with age as a random factor,
but I am not sure whether that is a good model. In the end, what I want
is to use lme() with a varFunc to see whether it can fit a model for
the variability trend.

If this cannot be modeled as a mixed model, is there a tool to fit a
linear model with a variance function, as the lme() function does?
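To make "a linear model with a variance function" concrete, here is a minimal Python/NumPy sketch (not nlme code) of one such model for a single probe: the mean is linear in age and the residual SD is sigma * exp(delta * age), the same form as nlme's varExp variance function. Under this model delta > 0 means the values diverge with age and delta < 0 that they converge. The fit is by maximum likelihood: for each candidate delta, the mean and sigma are profiled out by weighted least squares, and delta is chosen by grid search. The function name `fit_var_exp`, the grid and its range are assumptions for illustration, and the likelihood-ratio test of delta = 0 uses the asymptotic chi-square(1) reference, which may be rough with only 20 samples.

```python
import math

import numpy as np


def fit_var_exp(y, age, deltas=None):
    """ML fit of y_i ~ Normal(b0 + b1 * a_i, (sigma * exp(delta * a_i))^2).

    a_i is age centred at its mean, so b0 and sigma refer to the mean age.
    For each candidate delta the mean and sigma are profiled out by
    weighted least squares; delta is picked by grid search over a
    hypothetical default grid of -0.1..0.1 per year of age.
    """
    if deltas is None:
        deltas = np.linspace(-0.1, 0.1, 2001)
    y = np.asarray(y, dtype=float)
    a = np.asarray(age, dtype=float)
    a = a - a.mean()
    n = y.size
    X = np.column_stack([np.ones(n), a])

    def profile(delta):
        w = np.exp(-2.0 * delta * a)  # inverse-variance weights
        sw = np.sqrt(w)
        beta, *_ = np.linalg.lstsq(X * sw[:, None], y * sw, rcond=None)
        sigma2 = np.sum(w * (y - X @ beta) ** 2) / n  # ML estimate of sigma^2
        # log-likelihood with the ML sigma^2 plugged in
        ll = -0.5 * n * (math.log(2 * math.pi * sigma2) + 1) - delta * a.sum()
        return ll, beta, sigma2

    lls = [profile(d)[0] for d in deltas]
    delta_hat = float(deltas[int(np.argmax(lls))])
    ll_hat, beta, sigma2 = profile(delta_hat)
    ll_null = profile(0.0)[0]  # constant-variance (ordinary) linear model
    lr = max(2.0 * (ll_hat - ll_null), 0.0)
    p = math.erfc(math.sqrt(lr / 2.0))  # upper tail of chi-square(1)
    return {"intercept": float(beta[0]), "slope": float(beta[1]),
            "sigma": math.sqrt(sigma2), "delta": delta_hat,
            "lr": lr, "p": p}
```

Applied once per probe, the sign of `delta` among probes with a small `p` would split them into converging and diverging groups; with 47k probes, those p-values would themselves need an FDR adjustment.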

Any help will be much appreciated.

Regards,
Gus