[BioC] data normalization
Thomas.H.Hampton at Dartmouth.edu
Tue Jul 21 16:37:36 CEST 2009
No one pays me for my opinions on this subject, so you may have mine.
First, normalization is a slightly nasty business when it comes to
microarrays. The basic idea, of course, is to use some mechanism to
remove obvious systematic effects.
For example, in a two-color system, the two dyes may have slightly
different intensity profiles when measuring the same sample. My first
piece of advice is that you use some mechanism to SEE what that effect
looks like in your system. I think the limma package (a big favorite in
these parts) has a function called plotDensities(). One can also make
such plots in R using the density() function. If you plot the log fold
change against the average log intensity for this type of array, you
will generally observe a pattern that looks like a banana. In other
words, the residuals of the regression line through this plot show
obvious local trends. If you ignore this, then you are accepting that
your experimental conditions have somehow conjured this up, and this is
not at all likely.
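
To make the plot described above concrete, here is a minimal sketch in
plain Python (not limma) of the two quantities involved: the log fold
change, usually called M, and the average log intensity, usually called
A. The function name ma_values and the intensities are made up for
illustration.

```python
import math

def ma_values(red, green):
    """Return (M, A) for paired two-color intensities.

    M = log2(R) - log2(G)        log fold change
    A = (log2(R) + log2(G)) / 2  average log intensity
    """
    m, a = [], []
    for r, g in zip(red, green):
        log_r, log_g = math.log2(r), math.log2(g)
        m.append(log_r - log_g)
        a.append((log_r + log_g) / 2)
    return m, a

# Hypothetical intensities, for illustration only
red = [120.0, 800.0, 5000.0, 30000.0]
green = [100.0, 900.0, 5200.0, 26000.0]
m, a = ma_values(red, green)
for m_i, a_i in zip(m, a):
    print(f"A={a_i:6.2f}  M={m_i:+.2f}")
```

Plotting M against A for every spot on a real array is what reveals the
banana, if there is one.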
The most intuitively obvious solution is to straighten the banana out,
and you can achieve this by loess, also available in limma. Loess fits
a local regression curve through the middle of the banana, then uses
predictions from this curve to adjust one channel or the other,
straightening it.
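
The straightening step can be sketched as follows. This is a simplified
stand-in for loess (one pass of tricube-weighted local linear
regression, with no robustness iterations), not limma's implementation.
It assumes you already have the log fold changes m and average log
intensities a for each spot, and it subtracts the fitted trend from m
rather than adjusting a single channel. The names local_linear_fit,
loess_normalize and the span value are illustrative assumptions.

```python
def local_linear_fit(x, y, span=0.5):
    """Fitted value at each x[i] from a tricube-weighted local line."""
    n = len(x)
    k = min(n, max(2, int(round(span * n))))  # neighbours used per fit
    fitted = []
    for x0 in x:
        # Bandwidth: distance from x0 to its k-th nearest point
        dists = sorted(abs(xi - x0) for xi in x)
        h = dists[k - 1] or 1e-12
        w = [(1 - min(abs(xi - x0) / h, 1.0) ** 3) ** 3 for xi in x]
        sw = sum(w)
        mx = sum(wi * xi for wi, xi in zip(w, x)) / sw
        my = sum(wi * yi for wi, yi in zip(w, y)) / sw
        sxx = sum(wi * (xi - mx) ** 2 for wi, xi in zip(w, x))
        sxy = sum(wi * (xi - mx) * (yi - my)
                  for wi, xi, yi in zip(w, x, y))
        slope = sxy / sxx if sxx > 0 else 0.0
        fitted.append(my + slope * (x0 - mx))
    return fitted

def loess_normalize(m, a, span=0.5):
    """Subtract the intensity-dependent trend from the log fold changes."""
    trend = local_linear_fit(a, m, span)
    return [mi - ti for mi, ti in zip(m, trend)]

# Hypothetical curved ("banana") data, for illustration only
a = [6 + 0.5 * i for i in range(21)]
m = [0.05 * (ai - 11) ** 2 for ai in a]
m_straight = loess_normalize(m, a)
```

A wider span gives a smoother curve and removes less local structure. A
narrower one follows the data more closely and risks removing real
signal.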
Interestingly, you can get a rather similar result by quantile
normalization, which forces two data sets to share a common
distribution. It took me a minute to envision why this is true, but it
is.
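
For readers who want to see what "share a common distribution" means in
practice, here is a minimal sketch of quantile normalization in plain
Python. It assumes each column is one channel or array with the same
number of values, and it breaks ties by original order, which real
implementations may handle differently. The function name
quantile_normalize is an illustrative assumption.

```python
def quantile_normalize(columns):
    """Give every column the same distribution: the mean of the sorted
    values across columns at each rank."""
    n = len(columns[0])
    sorted_cols = [sorted(col) for col in columns]
    # Reference distribution: average value at each rank
    reference = [sum(sc[i] for sc in sorted_cols) / len(columns)
                 for i in range(n)]
    result = []
    for col in columns:
        # Indices of this column ordered from smallest to largest value
        order = sorted(range(n), key=lambda i: col[i])
        out = [0.0] * n
        for rank, idx in enumerate(order):
            out[idx] = reference[rank]
        result.append(out)
    return result

# Hypothetical log intensities for two channels, illustration only
red = [5.0, 2.0, 3.0]
green = [4.0, 1.0, 6.0]
red_qn, green_qn = quantile_normalize([red, green])
```

After this step the two channels contain exactly the same set of values.
Only the order in which spots receive them differs.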
Another possibility, one that I have not tried, is based on variance
stabilization. This makes a rather different set of assumptions, and I
am also going to play with this in the near future.
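
As a rough illustration of the idea only: variance-stabilizing
approaches commonly replace the log transform with an arsinh-type
transform, which behaves like a log at high intensities but stays finite
and roughly linear near zero. The sketch below assumes that form. The
offset and scale parameters are hypothetical placeholders; a real method
would estimate them from the data.

```python
import math

def arsinh_transform(x, offset=0.0, scale=1.0):
    """arsinh(offset + scale * x); offset and scale are placeholders
    that a real method would estimate from the data."""
    return math.asinh(offset + scale * x)

# Hypothetical background-corrected intensities, illustration only
for x in [-5.0, 0.0, 10.0, 1000.0, 1e6]:
    print(x, arsinh_transform(x))
```

Unlike a plain log, this is defined for zero and negative
background-corrected values.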
Whatever approach you choose, you can be assured that it will create
new artifacts in your data. There is no perfect world here. This fact
alone makes people edgy sometimes. Second, there are many other
systematic effects that are much more complicated than
intensity-dependent dye effects. The good news is that if you
understand the magnitude of your unwanted systematic effects pretty
well, you can hopefully do enough normalization
of the right sort to partly compensate for them without introducing
too many new artifacts.
In summary, this is not a turnkey system where you just drop all the
numbers into a magical grinder and out pops the correct answer without
any thought or understanding on anyone's part. It takes time and
consideration to do these things, a fact that most (but not all) of the
people who pay the rent around here understand.
On Jul 21, 2009, at 8:56 AM, James W. MacDonald wrote:
> Hi Barbara,
> Barbara Uszczynska wrote:
>> Dear R-Users,
>> I use home-made spotted arrays to do some research connected with
>> The matrix consists of: 60% genes that are up-regulated, 40% genes
>> that are down-regulated, and spikes. I didn't use any genes with
>> expression. How should I analyse this experiment? According to
>> statistics I should focus on external spike controls and compare all
>> genes with spikes. It is a two-colour experiment, so I have to build
>> quite a complicated statistical model. I'm not sure if it is the
>> right pathway. What do you think?
> I think two things:
> First, asking the same question over and over will not endear you to
> the listserv community, and will increase the likelihood that your
> posts will simply be deleted by those who might help you.
> Second, what you are asking for is statistical help in analyzing
> your experiment rather than help using software. Since many of the
> people on this list are practicing statisticians, what you are
> asking is for them to do what they get paid to do for you for free.
> I would suggest that a more reasonable approach is to find a local
> statistician to help you with your analysis, as you are unlikely to
> get any (reasonable) help on a listserv.
>> Bioconductor mailing list
>> Bioconductor at stat.math.ethz.ch
>> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor
> James W. MacDonald, M.S.
> Douglas Lab
> University of Michigan
> Department of Human Genetics
> 5912 Buhl
> 1241 E. Catherine St.
> Ann Arbor MI 48109-5618