[BioC] What is a good method for analyzing differential expression from Illumina BeadStudio summary data?

Gordon Smyth smyth at wehi.EDU.AU
Wed Feb 28 08:05:01 CET 2007


Dear Todd,

People have been giving advice about normalisation methods for 
Illumina. You may be over-interpreting some of these comments in 
terms of "compatibility" when it is really just that some 
methods are better than others.

limma can do the differential expression analysis for you regardless 
of what normalisation method you use. limma does expect to get 
expression values on a log-scale. By convention, this is usually the 
log-2 scale, but it doesn't have to be. You can easily put data on 
the log2-scale regardless of how you normalise the data. If your data 
contains missing values, limma will simply take them into account in 
the usual way, basing the analysis on the observations which are not missing.

On the other hand, some normalisation methods are much better than 
others. The Illumina platform is still very new, so it's too early to 
say which method is best. Several methods seem to give pretty good 
results.

>[BioC] What is a good method for analyzing differential expression 
>from Illumina BeadStudio summary data?
>Todd DeLuca todd_deluca at hms.harvard.edu
>Fri Dec 15 17:14:17 CET 2006
>
>Hi all,
>
>I have spent the past few days searching the newsgroup archives,
>reading vignettes, books, etc. in search of a method for ranking
>genes according to differential expression between a disease state, a
>normal state, and a control state, given Illumina BeadStudio summary
>data.
>
>There is a thread (called "illumina --> limma?") which notes that
>Illumina's suggested method for background correction and
>normalization gives negative values, which are incompatible with the
>log2 transformation of expression data common before using lmFit.

As I say above, this isn't an issue of compatibility with limma. Many 
of us don't like Illumina's background correction/normalization 
method just because it doesn't seem a very good normalisation method. 
Many of us also don't like to introduce missing values.

>   An
>email from Wolfgang Huber in this thread suggests using vsn, but how
>can this work to background correct and normalize data in a way that
>is suitable for lmFit?  Is it as simple as running
>vsn(exprs(SummaryData)) and then multiplying the resulting matrix by
>log2(exp(1))?  (The SummaryData object is one that would be created from
>running the beadarray package function readBeadSummaryData(), and is
>essentially an ExpressionSet).

Yes, it's as simple as that. There are other good methods for 
Illumina data also, e.g., quantile normalisation (as used in the rma 
algorithm for affy data) or lumiN in the lumi package.
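
The rescaling asked about is just a change of logarithm base. A minimal sketch in plain Python (not R, and not the vsn API), assuming the values are on a natural-log-like scale as the question implies; the numbers and the helper name to_log2_scale are made up for illustration:

```python
import math

# Hypothetical values on a natural-log-like scale (made-up numbers,
# standing in for two rows of a transformed expression matrix).
ln_scale = [[7.1, 7.9, 8.4],
            [5.2, 5.0, 6.3]]

# log2(x) = ln(x) * log2(e), so one constant rescales the whole matrix.
LOG2_E = math.log2(math.e)

def to_log2_scale(matrix):
    """Multiply every entry by log2(e) to express it in log2 units."""
    return [[value * LOG2_E for value in row] for row in matrix]

log2_scale = to_log2_scale(ln_scale)
```

Because every entry is multiplied by the same constant, the ranking of genes is unchanged; only the units of the log-fold changes differ.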

>Mark Dunning recommends using non-normalized data
>(http://article.gmane.org/gmane.science.biology.informatics.conductor/9721)
>for analysis on the log2 scale, which he seems to suggest is
>preferable to a linear scale, because of "a very obvious relationship
>between the mean and variance."  However, isn't normalization of the
>data essential to make accurate comparisons as required for
>differential expression analysis?

Yes, normalization is important. Mark wasn't suggesting that you 
don't normalize. Rather, he was suggesting that you take the 
non-normalized data from Illumina and apply your own normalization 
method. For example, you can get good results simply by quantile 
normalizing the Illumina summary expression values. This gives you a 
matrix which you can give straight to limma.
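
To make the idea of quantile normalisation concrete, here is a rough sketch in plain Python rather than R. It is not the limma or affy implementation: the function name quantile_normalize and the intensities are made up, ties are broken by position rather than averaged, and missing values are assumed absent. The log2 step at the end reflects the point above that limma expects log-scale values.

```python
import math

def quantile_normalize(columns):
    """Quantile-normalize equal-length arrays (one list per array).

    Ties are broken by position rather than averaged, which is a
    simplification compared with production implementations.
    """
    n = len(columns[0])
    sorted_cols = [sorted(col) for col in columns]
    # Reference distribution: mean of the k-th smallest value across arrays.
    reference = [sum(col[k] for col in sorted_cols) / len(columns)
                 for k in range(n)]
    normalized = []
    for col in columns:
        # Probe indices ordered from smallest to largest intensity.
        order = sorted(range(n), key=lambda i: col[i])
        out = [0.0] * n
        for rank, i in enumerate(order):
            out[i] = reference[rank]  # replace value by the reference quantile
        normalized.append(out)
    return normalized

# Two hypothetical arrays with four probes each (made-up intensities).
raw = [[120.0, 800.0, 95.0, 4000.0],
       [150.0, 1100.0, 90.0, 5200.0]]

normalized = quantile_normalize(raw)
log2_values = [[math.log2(v) for v in col] for col in normalized]
```

After this step every array has exactly the same set of values, only assigned to different probes according to each probe's rank within its own array.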

>   Is it a bad idea to analyze the
>data not on a log2 scale?

It is a very bad idea to analyze the data on the raw (exp) scale. 
However, whether you use log2 or loge makes no difference, except for 
convenience of interpretation of the resulting log-fold changes.

>If anyone has done a differential expression analysis using R and
>Illumina data, could they please respond with their method and any
>comments on its pros and cons?

The beadarray vignette (for summary data) gives a complete worked 
example. The results from lumi or vsn can also go straight into limma 
without any modification.

Hope this helps
Gordon

>Many thanks,
>
>Todd DeLuca
>Biological Software Engineer
>
>Center for Biomedical Informatics
>Computational Biology Initiative
>Harvard Medical School
>10 Shattuck Street
>Boston, MA 02115


