[BioC] normalization and outlier detection

James MacDonald jmacdon at med.umich.edu
Wed Jun 2 17:26:45 CEST 2004

Hi Edmund,

1.) For outlier probes, rma should not be affected because it is using
a robust model fit. For outlier arrays (e.g., those that appear
completely different on a density plot), I don't think you can do much
except re-run that array. I have never been able to get reasonable
results from 'obvious outlier' chips. I realize that the term 'obvious
outlier' is non-scientific in the extreme, but given the amount of data,
most tests designed to determine if the distributions are different will
reject the null hypothesis for all the chips in a given set, so they are
not very useful in this context. Another good way to detect outlier
chips is to use the residual plots in affyPLM. Any chip with large
residuals is not being fit by the model very well.
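
To illustrate why a robust fit is not thrown off by a single outlier
probe, here is a minimal Python sketch of Tukey's median polish, the
robust fitting step rma uses to summarize a probeset. It is not the
affy or affyPLM code; the function name median_polish and the toy
numbers (5 probes on 4 chips, with one probe on the first chip spiked
by 5 log2 units) are made up for illustration.

```python
from statistics import median


def median_polish(table, n_iter=10):
    """Tukey's median polish on a probes-by-chips table of log2 intensities.

    Returns (overall, row_effects, col_effects, residuals) such that
    table[i][j] == overall + row_effects[i] + col_effects[j] + residuals[i][j].
    """
    resid = [row[:] for row in table]
    n_rows, n_cols = len(resid), len(resid[0])
    overall = 0.0
    row_eff = [0.0] * n_rows
    col_eff = [0.0] * n_cols
    for _ in range(n_iter):
        # Sweep the row (probe) medians out of the residuals.
        for i in range(n_rows):
            m = median(resid[i])
            row_eff[i] += m
            resid[i] = [v - m for v in resid[i]]
        m = median(col_eff)
        overall += m
        col_eff = [c - m for c in col_eff]
        # Sweep the column (chip) medians out of the residuals.
        for j in range(n_cols):
            m = median(resid[i][j] for i in range(n_rows))
            col_eff[j] += m
            for i in range(n_rows):
                resid[i][j] -= m
        m = median(row_eff)
        overall += m
        row_eff = [r - m for r in row_eff]
    return overall, row_eff, col_eff, resid


# Hypothetical probeset: log2 intensity = 8 + probe effect + chip effect.
probe_effects = [0.0, 1.0, -1.0, 0.5, -0.5]
chip_effects = [0.0, 0.0, 1.0, 1.0]
table = [[8.0 + p + c for c in chip_effects] for p in probe_effects]
table[1][0] += 5.0  # one outlier probe on the first chip

overall, row_eff, col_eff, resid = median_polish(table)
chip_levels = [overall + c for c in col_eff]      # robust expression values
chip_means = [sum(col) / len(col) for col in zip(*table)]  # non-robust

print("median polish:", chip_levels)
print("plain means:  ", chip_means)
print("residual of the spiked probe:", resid[1][0])
```

The outlier ends up in the residuals rather than in the chip-level
estimate, which is also why large residuals are a useful flag for a
chip that the model does not fit well.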

2.) We primarily use the Agilent Bioanalyzer 2100 to check the mRNA
quality prior to putting it on a chip. After the fact, you can use the
AffyRNAdeg() and plotAffyRNAdeg() functions to see if there is a problem
chip. Again, this is more of an 'eyeballometric' test, but I have found
it useful in the past.
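
As a rough illustration of the idea behind an RNA degradation plot,
the Python sketch below averages intensities by probe position (probes
ordered 5' to 3' within each probeset) and fits a straight-line slope
per chip. This is only the general idea, not the computation that
AffyRNAdeg() actually performs; the helper names and the two toy chips
are hypothetical. A chip whose slope is clearly different from the
rest of the set is the one to look at more closely.

```python
from statistics import mean


def position_means(probesets):
    """Average log2 intensity at each probe position (0 = most 5' probe)."""
    n_pos = len(probesets[0])
    return [mean(ps[k] for ps in probesets) for k in range(n_pos)]


def slope(ys):
    """Ordinary least-squares slope of ys against positions 0, 1, 2, ..."""
    xs = list(range(len(ys)))
    x_bar, y_bar = mean(xs), mean(ys)
    num = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    den = sum((x - x_bar) ** 2 for x in xs)
    return num / den


# Hypothetical data: two chips, two probesets each, four probes per probeset.
chips = {
    "chip_a": [[8.0, 8.1, 8.2, 8.3], [9.0, 9.1, 9.2, 9.3]],
    "chip_b": [[7.0, 7.5, 8.0, 8.5], [8.0, 8.5, 9.0, 9.5]],
}

slopes = {name: slope(position_means(ps)) for name, ps in chips.items()}
for name, s in slopes.items():
    print(name, round(s, 3))
```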

3.) rma is a function (or set of functions) to convert probe-level data into
gene-level expression values, whereas limma is a package designed to fit
ANOVA models to microarray data. They are not used for the same thing,
so there is no reason to compare. In fact, you can (and I often do) use
rma to compute expression values, followed by limma to detect
differentially expressed genes.
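
To make the two-step division of labor concrete, here is a small
Python sketch of the second step only: it takes gene-level log2
expression values (the kind of output rma produces) and computes a
per-gene statistic comparing two groups of chips. Note the
substitution: limma fits a linear model and uses a moderated
t-statistic, whereas this sketch uses an ordinary Welch t-statistic as
a stand-in. The gene names, values, and group labels are invented.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(group_a, group_b):
    """Ordinary Welch t-statistic for group_b versus group_a."""
    se = sqrt(variance(group_a) / len(group_a)
              + variance(group_b) / len(group_b))
    return (mean(group_b) - mean(group_a)) / se


# Step 1 (assumed already done): one log2 expression value per gene per chip.
expression = {
    "gene_1": [8.0, 8.2, 7.9, 8.1, 8.0, 8.2],
    "gene_2": [6.0, 6.1, 5.9, 8.0, 8.1, 7.9],
}
is_treated = [False, False, False, True, True, True]

# Step 2: test each gene for a difference between the two groups.
t_stats = {}
for gene, values in expression.items():
    control = [v for v, t in zip(values, is_treated) if not t]
    treated = [v for v, t in zip(values, is_treated) if t]
    t_stats[gene] = welch_t(control, treated)

for gene, t in t_stats.items():
    print(gene, round(t, 2))
```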



James W. MacDonald
Affymetrix and cDNA Microarray Core
University of Michigan Cancer Center
1500 E. Medical Center Drive
7410 CCGC
Ann Arbor MI 48109

>>> <echang4 at life.uiuc.edu> 06/01/04 09:07PM >>>
Hi everyone,

I am new to analysis of microarrays, so I'm sorry if the answers to my
questions should be obvious. But I would really appreciate any
inputs... I
should also mention that I am a biology grad student and not a

1) My question is regarding the normalization procedures for
U133 arrays. It seems like the best way to normalize arrays is by
the RMA method (better than dChip or Affymetrix's Tukey's biweight?) I
would like to use quantile normalization between arrays, so I have been
using Bolstad's RMAExpress to analyze my .CEL files and then examining
residual images. If I encounter some horrible-looking arrays, is it
to leave those arrays out of the subsequent analysis? Or is there some way
of removing the outliers (like dChip) and then applying the RMA procedure
again? Or is that unnecessary?

2) Is there some sort of guideline to determine if the RNA was of low
quality (due to experimenter's error etc) or if the
labelling/hybridization was done incorrectly?

3) What is the difference between limma and RMA? Are there any
publications discussing the merit of one method over the other?

Thank you very much,
Edmund Chang
Graduate student- Physiology
University of Illinois, Urbana-Champaign
ecc0101 at yahoo.com 

Bioconductor mailing list
Bioconductor at stat.math.ethz.ch 
