[BioC] Select different linear models in voom

Thu Mar 6 09:45:55 CET 2014

I have recently implemented the voom approach to estimate the mean and variance of each log-cpm value at the observational level. My dataset contains ~1000 samples and a considerable amount of metadata (~400 variables) that could be used as covariates. In principle, this allows a better linear model to be built simply by including more factors, and in voom both the fitted mean and the fitted variance are estimated from that model.
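To make clear where the linear model enters voom, here is a minimal pure-Python sketch. It is an approximation for illustration, not limma's code. It computes log-cpm values with the offsets limma's `voom` uses, log2((count + 0.5) / (library size + 1) * 10^6), fits an ordinary least-squares model per gene, and returns the points (average log2 count, square root of the residual standard deviation) through which voom fits its lowess mean-variance trend. The lowess step and the resulting observation-level weights are left out, and the helper names (`log_cpm`, `solve`, `ols`, `voom_trend_points`) and the toy counts are hypothetical.

```python
import math


def log_cpm(counts):
    """voom-style log2 counts per million for a genes x samples count table.

    Uses the same offsets as limma's voom: log2((count + 0.5) / (lib_size + 1) * 1e6),
    where lib_size is the column (sample) total.
    """
    n_samples = len(counts[0])
    lib_size = [sum(row[j] for row in counts) for j in range(n_samples)]
    y = [[math.log2((row[j] + 0.5) / (lib_size[j] + 1) * 1e6) for j in range(n_samples)]
         for row in counts]
    return y, lib_size


def solve(a, b):
    """Solve the small linear system a x = b (Gaussian elimination, partial pivoting)."""
    n = len(a)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (m[i][n] - sum(m[i][k] * x[k] for k in range(i + 1, n))) / m[i][i]
    return x


def ols(design, y):
    """Ordinary least squares of y on design (one row per sample).

    Returns (coefficients, fitted values, residual standard deviation).
    """
    p = len(design[0])
    xtx = [[sum(row[i] * row[k] for row in design) for k in range(p)] for i in range(p)]
    xty = [sum(row[i] * yi for row, yi in zip(design, y)) for i in range(p)]
    beta = solve(xtx, xty)
    fitted = [sum(b * x for b, x in zip(beta, row)) for row in design]
    rss = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    return beta, fitted, math.sqrt(rss / (len(y) - p))


def voom_trend_points(counts, design):
    """Per-gene (x, y) points to which voom fits its lowess mean-variance trend.

    x: average log2 count; y: square root of the residual standard deviation.
    The lowess smoothing and the per-observation weights are omitted here.
    """
    y, lib_size = log_cpm(counts)
    mean_log_lib = sum(math.log2(s + 1) for s in lib_size) / len(lib_size)
    points = []
    for gene in y:
        _, _, sigma = ols(design, gene)
        a_mean = sum(gene) / len(gene)  # average log-cpm for this gene
        points.append((a_mean + mean_log_lib - math.log2(1e6), math.sqrt(sigma)))
    return points


# Hypothetical toy data: 3 genes x 4 samples, intercept + two-group design.
counts = [[10, 20, 30, 40], [100, 80, 120, 90], [0, 5, 3, 1]]
design = [[1, 0], [1, 0], [1, 1], [1, 1]]
print(voom_trend_points(counts, design))
```

Because the residual standard deviation depends on which columns are in `design`, adding or dropping covariates changes the trend, and with it the weights voom assigns, not just the fitted means.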

So far, I have used Akaike (AIC) weights to estimate, for each of several candidate linear models, the probability that it explains the data better than the alternatives considered. Of course, testing all possible combinations of covariates is computationally infeasible (in principle, 2^400 models). Even so, although I found that most genes are well explained by a simple linear model (LM), a non-negligible fraction of them depend on additional factors.
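For reference, Akaike weights for a set of candidate models are w_i = exp(-Δ_i/2) / Σ_j exp(-Δ_j/2), where Δ_i = AIC_i - min AIC. The weights are relative to the candidate set, which is why only a small subset of the 2^400 covariate combinations can be compared in practice. The minimal sketch below computes the Gaussian AIC of a linear model from its residual sum of squares (counting the error variance as a parameter, as R's `AIC()` does for `lm` fits) and the resulting weights for two candidate models of one gene. It uses unweighted fits, unlike voom's final weighted fit, and the helper names and data values are hypothetical.

```python
import math


def gaussian_aic(rss, n, p):
    """AIC of a Gaussian linear model with p coefficients, from its residual sum of squares.

    Uses the maximised log-likelihood -n/2 * (log(2*pi) + log(rss/n) + 1) and counts
    the error variance as an extra parameter (k = p + 1).
    """
    log_lik = -0.5 * n * (math.log(2 * math.pi) + math.log(rss / n) + 1)
    return -2 * log_lik + 2 * (p + 1)


def akaike_weights(aics):
    """w_i = exp(-delta_i / 2) / sum_j exp(-delta_j / 2), with delta_i = AIC_i - min AIC."""
    best = min(aics)
    rel = [math.exp(-(a - best) / 2) for a in aics]
    total = sum(rel)
    return [r / total for r in rel]


def rss_intercept_only(y):
    """Residual sum of squares of the model y ~ 1."""
    mean = sum(y) / len(y)
    return sum((v - mean) ** 2 for v in y)


def rss_one_covariate(x, y):
    """Residual sum of squares of the simple regression y ~ 1 + x."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return rss_intercept_only(y) - sxy ** 2 / sxx


# Hypothetical log-cpm values for one gene and one candidate covariate.
y = [1.0, 2.1, 2.9, 4.2, 5.0]
x = [1.0, 2.0, 3.0, 4.0, 5.0]
n = len(y)
aics = [gaussian_aic(rss_intercept_only(y), n, 1),
        gaussian_aic(rss_one_covariate(x, y), n, 2)]
weights = akaike_weights(aics)
print(weights)
```

The extra parameter of the covariate model costs 2 AIC units, so it only gains weight when its drop in residual sum of squares outweighs that penalty.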

The point is that what makes the expression profile of a gene interesting is precisely that the covariates play an important role in determining its mean and variance. I am therefore reluctant to use the simple LM, because it would discard all the covariates. On the other hand, I am also reluctant to use the more complicated LM, because it clearly overfits a large number of genes.

What is the best way to proceed?

Thanks!

 -- output of sessionInfo(): 

R version 3.0.2 (2013-09-25)
Platform: x86_64-apple-darwin10.8.0 (64-bit)

locale:
[1] sv_SE.UTF-8/sv_SE.UTF-8/sv_SE.UTF-8/C/sv_SE.UTF-8/sv_SE.UTF-8

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] edgeR_3.4.2  limma_3.18.9

loaded via a namespace (and not attached):
[1] tools_3.0.2

--
Sent via the guest posting facility at bioconductor.org.