[BioC] What is a good method for analyzing differential expression from Illumina BeadStudio summary data?
Kevin R. Coombes
krc at mdacc.tmc.edu
Fri Dec 15 17:40:04 CET 2006
Simon Lin and Pan Du at Northwestern currently have a package for
Illumina arrays called "lumi" in beta-test; I'm one of the testers. I'm
sure that Simon would be happy to have another tester if you contact him....
In any event, the existing package offers several normalization options,
including vsn, quantile, and cyclic loess in addition to a new method
they have developed called "rsn". They hadn't yet included a separate
background correction step, but I'm using the method that RMA uses for
Affymetrix arrays. You can get this by simply doing something like:
    exprs(object) <- apply(exprs(object), 2, bg.adjust, ...)
So far at least, I'm happy with this form of background estimation
(although it might be better to figure out a way to exploit more
explicitly the distribution of intensities in the negative controls).
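For illustration only, here is a minimal sketch of that column-wise background adjustment on a plain numeric matrix. It assumes the Bioconductor package affy (which provides bg.adjust) is installed; the simulated intensities and the names raw and adjusted are made up for the example and are not part of lumi.

```r
# Sketch only: requires the Bioconductor package 'affy', which provides bg.adjust().
if (requireNamespace("affy", quietly = TRUE)) {
  set.seed(3)
  # Simulated raw intensities for 4 arrays (assumption, not real Illumina data):
  # exponential signal plus normally distributed background.
  raw <- matrix(rexp(4000, rate = 1 / 200) + rnorm(4000, mean = 100, sd = 15),
                ncol = 4, dimnames = list(NULL, paste0("array", 1:4)))

  # Apply the RMA-style background adjustment one column (array) at a time.
  adjusted <- apply(raw, 2, affy::bg.adjust)

  # Compare the intensity distributions before and after adjustment.
  print(summary(as.vector(raw)))
  print(summary(as.vector(adjusted)))
}
```

With a real ExpressionSet-like object, the same apply() call would be wrapped in exprs(object) <- ... as in the one-liner above.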
On the only data set I have, I have experimented with both quantile and
loess normalization. Quantile normalization appears to be better based
on the resulting Bland-Altman (M-vs-A) plots. However, there may well
be something peculiar about my data set. Both normalization methods
appear to give similar (but not identical, of course) lists of
differentially expressed genes.
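To make the comparison concrete, here is a rough base-R sketch of quantile normalization followed by the M and A values for a pair of arrays. The helper quantile_normalize is hypothetical (it is not the lumi or limma implementation), and the data are simulated.

```r
# Hypothetical helper: a plain quantile normalization, not the lumi API.
quantile_normalize <- function(x) {
  # Rank the values within each column (array).
  ranks <- apply(x, 2, rank, ties.method = "first")
  # Target distribution: mean across arrays of the sorted values at each rank.
  target <- rowMeans(apply(x, 2, sort))
  # Replace each value by the target value at its rank.
  out <- apply(ranks, 2, function(r) target[r])
  dimnames(out) <- dimnames(x)
  out
}

set.seed(1)
# Simulated log2 intensities: 1000 probes on 2 arrays, the second one shifted.
x <- cbind(a1 = rnorm(1000, mean = 8, sd = 2),
           a2 = rnorm(1000, mean = 8.5, sd = 2.2))
xn <- quantile_normalize(x)

# Bland-Altman (M-vs-A) quantities for the two normalized arrays.
M <- xn[, 1] - xn[, 2]         # difference of log intensities
A <- (xn[, 1] + xn[, 2]) / 2   # average log intensity
print(summary(M))
```

Calling plot(A, M) on these vectors draws the M-vs-A plot; a cloud centred on M = 0 with no trend along A is what one hopes to see after normalization.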
Finally, it is worth noting that the detection p-values appear to give
useful information that can be exploited to make present/absent calls.
On my data, a detection p-value > 0.90 (yes, I know that makes no sense,
but that's Illumina's fault; what they call the detection p-value really
seems to be the percentile of this gene's intensity relative to the
negative controls) seems to correspond to "present".
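A small sketch of that interpretation, assuming the detection value really is a percentile relative to the negative controls (which is my reading, not something Illumina documents here). The intensities and gene names below are simulated and hypothetical.

```r
set.seed(2)
# Simulated raw intensities (assumption: not real Illumina data).
neg_controls <- rnorm(500, mean = 100, sd = 10)   # negative-control intensities
genes <- c(lowGene = 95, borderline = 112, highGene = 400)

# Fraction of negative controls at or below each gene's intensity.
neg_cdf <- ecdf(neg_controls)
detection <- neg_cdf(genes)
names(detection) <- names(genes)

# Present/absent call using the 0.90 cutoff mentioned above.
present <- detection > 0.90
print(data.frame(intensity = genes,
                 detection = round(detection, 3),
                 present = present))
```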
Todd DeLuca wrote:
> Hi all,
> I have spent the past few days searching the newsgroup archives,
> reading vignettes, books, etc. in search of a method for ranking
> genes according to differential expression between a disease state, a
> normal state, and a control state, given Illumina BeadStudio summary
> data.
> There is a thread (called "illumina --> limma?") which notes that
> Illumina's suggested method for background correction and
> normalization gives negative values, which are incompatible with the
> log2 transformation of expression data common before using lmFit. An
> email from Wolfgang Huber in this thread suggests using vsn, but how
> can this work to background correct and normalize data in a way that
> is suitable for lmFit? Is it as simple as running
> vsn(exprs(SummaryData)) and then multiplying the resulting matrix by
> log2(exp(1))? (The SummaryData object is one that would be created from
> running the beadarray package function readBeadSummaryData(), and is
> essentially an ExpressionSet).
> Mark Dunning recommends using non-normalized data (http://
> for analysis on the log2 scale, which he seems to suggest is
> preferable to a linear scale, because of "a very obvious relationship
> between the mean and variance." However, isn't normalization of the
> data essential to make accurate comparisons as required for
> differential expression analysis? Is it a bad idea to analyze the
> data not on a log2 scale?
> If anyone has done a differential expression analysis using R and
> Illumina data, could they please respond with their method and any
> comments on its pros and cons?
> Many thanks,
> Todd DeLuca
> Biological Software Engineer
> Center for Biomedical Informatics
> Computational Biology Initiative
> Harvard Medical School
> 10 Shattuck Street
> Boston, MA 02115