[BioC] What is a good method for analyzing differential expression from Illumina BeadStudio summary data?

Fri Dec 15 17:40:04 CET 2006

Hi,

Simon Lin and Pan Du at Northwestern currently have a package for 
Illumina arrays called "lumi" in beta testing; I'm one of the testers. 
I'm sure Simon would be happy to have another tester if you contact him.

In any event, the existing package offers several normalization options, 
including vsn, quantile, and cyclic loess in addition to a new method 
they have developed called "rsn".  They have not yet included a separate 
background correction step, but I'm using the convolution background 
correction that RMA uses for Affymetrix arrays (bg.adjust in the affy 
package).  You can get this by doing something like

	library(affy)  # provides bg.adjust(), RMA's convolution background correction
	exprs(object) <- apply(exprs(object), 2, bg.adjust, ...)

So far, at least, I'm happy with this form of background estimation 
(although it might be better to find a way to exploit the distribution 
of intensities in the negative controls more explicitly).
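For readers unfamiliar with the RMA correction: it models each observed intensity as normally distributed background plus exponentially distributed signal, and replaces the observation by the expected signal given what was observed. The base-R sketch below implements only that final step for supplied parameters mu, sigma (background mean and SD) and alpha (signal rate); it does not estimate them, which affy's bg.adjust() does from the data. The function name normexp_signal and the numbers are hypothetical. A useful consequence for the question quoted below is that the corrected values are always positive, so they can be log2-transformed.

```r
# Hypothetical helper: normal + exponential convolution correction (RMA-style).
# Assumption: mu, sigma and alpha are given; affy's bg.adjust() estimates them.
normexp_signal <- function(x, mu, sigma, alpha) {
  # Shift by the background mean and the alpha * sigma^2 term of the model
  a <- x - mu - alpha * sigma^2
  # Expected signal given x: the mean of a normal(a, sigma) truncated below
  # at 0, which is always positive
  a + sigma * dnorm(a / sigma) / pnorm(a / sigma)
}

# Made-up intensities, some below the assumed background mean of 100
x <- c(60, 90, 100, 150, 1000, 20000)
corrected <- normexp_signal(x, mu = 100, sigma = 20, alpha = 0.01)
print(round(corrected, 2))
log2(corrected)  # defined for every value, unlike log2(x - 100)
```

Note that bright probes are essentially just shifted (20000 becomes 19896 here), while dim probes are shrunk toward small positive values instead of going negative.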

On the only data set I have, I have experimented with both quantile and 
loess normalization.  Quantile normalization appears to be better based 
on the resulting Bland-Altman (M-vs-A) plots.  However, there may well 
be something peculiar about my data set.  Both normalization methods 
appear to give similar (but not identical, of course) lists of 
differentially expressed genes.
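To make the comparison above concrete, here is a minimal base-R sketch of quantile normalization together with the M and A values that a Bland-Altman (M-vs-A) plot displays. It is not the lumi code; packaged implementations such as limma's normalizeQuantiles() also handle ties and missing values more carefully. The function names and the toy data are hypothetical, and inputs are assumed to be positive (background-corrected).

```r
# Hypothetical helper: give every array the same distribution (the mean of the
# sorted columns) while each array keeps its own ranking of the probes
quantile_normalize <- function(m) {
  sorted <- apply(m, 2, sort)                        # sort each array separately
  target <- rowMeans(sorted)                         # reference distribution
  ranks <- apply(m, 2, rank, ties.method = "first")  # probe ranks per array
  out <- apply(ranks, 2, function(r) target[r])      # map ranks to reference
  dimnames(out) <- dimnames(m)
  unname(out) -> vals; dimnames(vals) <- dimnames(m)
  vals
}

# Hypothetical helper: M (log2 ratio) and A (mean log2 intensity) for two arrays
ma_values <- function(x, y) {
  lx <- log2(x)
  ly <- log2(y)
  list(M = lx - ly, A = (lx + ly) / 2)
}

# Toy 4-probe x 2-array matrix (made-up numbers)
raw <- matrix(c(120, 300,  80, 2000,
                240, 150, 650, 4100),
              ncol = 2,
              dimnames = list(paste0("probe", 1:4), c("arr1", "arr2")))
normed <- quantile_normalize(raw)
ma <- ma_values(normed[, "arr1"], normed[, "arr2"])
print(normed)
# plot(ma$A, ma$M) would draw the M-vs-A plot
```

After normalization both columns contain exactly the same set of values, so any remaining spread in M reflects only differences in probe ranking between the arrays.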

Finally, it is worth noting that the detection p-values appear to carry 
useful information that can be exploited to make present/absent calls. 
On my data, a detection p-value > 0.90 seems to correspond to "present". 
(Yes, I know that makes no sense, but that's Illumina's fault: what they 
call the detection p-value really seems to be the percentile of the 
gene's intensity relative to the negative controls.)
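The interpretation above can be sketched in base R: compute each probe's percentile among the negative-control intensities and call it present above the 0.90 cutoff. This mirrors the reading of the detection value described above and is an assumption, not Illumina's documented algorithm; the helper names and all numbers are hypothetical.

```r
# Hypothetical helper: fraction of negative-control intensities at or below
# each probe's intensity, i.e. the probe's percentile among the negatives
detection_percentile <- function(probe_intensity, negative_controls) {
  ecdf(negative_controls)(probe_intensity)
}

# Hypothetical helper: "present" when the percentile exceeds the cutoff
call_present <- function(probe_intensity, negative_controls, cutoff = 0.90) {
  ifelse(detection_percentile(probe_intensity, negative_controls) > cutoff,
         "present", "absent")
}

neg <- c(50, 55, 60, 62, 65, 70, 72, 75, 80, 90)  # made-up negative controls
probes <- c(geneA = 58, geneB = 85, geneC = 400)  # made-up probe intensities
detection_percentile(probes, neg)
call_present(probes, neg)
```

With ten negative controls the percentile moves in steps of 0.1, so geneB sits exactly at 0.90 and is called absent under the strict ">" comparison; with realistic numbers of negative controls the steps are much finer.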

Best,
	Kevin

Todd DeLuca wrote:
> Hi all,
> 
> I have spent the past few days searching the newsgroup archives,  
> reading vignettes, books, etc. in search of a method for ranking  
> genes according to differential expression between a disease state, a  
> normal state, and a control state, given Illumina BeadStudio summary  
> data.
> 
> There is a thread (called "illumina --> limma?") which notes that  
> Illumina's suggested method for background correction and  
> normalization gives negative values, which are incompatible with the  
> log2 transformation of expression data common before using lmFit.  An  
> email from Wolfgang Huber in this thread suggests using vsn, but how  
> can this work to background correct and normalize data in a way that  
> is suitable for lmFit?  Is it as simple as running vsn(exprs(SummaryData)) 
> and then multiplying the resulting matrix by log2(exp(1))?  (The 
> SummaryData object is one that would be created from  
> running the beadarray package function readBeadSummaryData(), and is  
> essentially an ExpressionSet).
> 
> Mark Dunning recommends using non-normalized data 
> (http://article.gmane.org/gmane.science.biology.informatics.conductor/9721)  
> for analysis on the log2 scale, which he seems to suggest is  
> preferable to a linear scale, because of "a very obvious relationship  
> between the mean and variance."  However, isn't normalization of the  
> data essential to make accurate comparisons as required for  
> differential expression analysis?  Is it a bad idea to analyze the  
> data not on a log2 scale?
> 
> If anyone has done a differential expression analysis using R and  
> Illumina data, could they please respond with their method and any  
> comments on its pros and cons?
> 
> Many thanks,
> 
> Todd DeLuca
> Biological Software Engineer
> 
> Center for Biomedical Informatics
> Computational Biology Initiative
> Harvard Medical School
> 10 Shattuck Street
> Boston, MA 02115
> 
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at stat.math.ethz.ch
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor
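The questions quoted above about the log2 scale and about rescaling vsn output can be illustrated with a base-R sketch. Part 1 assumes purely multiplicative measurement error (made-up numbers). Under that assumption the linear-scale SD grows in proportion to the mean, which is the kind of mean-variance relationship Mark Dunning mentions, while the log2-scale SD stays constant. Part 2 uses asinh(), the generalized-log function at the heart of vsn. It omits the per-array offset and scale that vsn actually estimates, and it assumes the vsn version in use returns values on a natural-log-like scale. For large x, asinh(x) is approximately log(2x), so multiplying by log2(exp(1)) (which equals 1/log(2)) gives approximately log2(x) + 1. That is a log2 scale shifted by a constant, which cancels in log-ratios between samples.

```r
# Part 1: two hypothetical genes on three arrays with the same relative
# (multiplicative) error but a 100-fold difference in mean intensity
noise <- exp(c(-0.2, 0, 0.2))
low  <- 100   * noise
high <- 10000 * noise
c(linear_sd_low = sd(low), linear_sd_high = sd(high))          # ~100-fold apart
c(log2_sd_low = sd(log2(low)), log2_sd_high = sd(log2(high)))  # equal

# Part 2: asinh() as a glog on a natural-log-like scale, rescaled to log2 units
x <- c(-50, 0, 50, 5000)          # background-subtracted values can be negative
glog2 <- asinh(x) * log2(exp(1))  # finite for every value, including x <= 0
glog2
log2(x[x > 0]) + 1                # close to glog2 for the larger positive values
```

The constant shift of about 1 unit matters only if glog values are compared with plain log2 values from another pipeline.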