[BioC] (no subject)

Gordon K Smyth smyth at wehi.EDU.AU
Wed Jan 14 00:04:34 CET 2009


Dear Dan,

It's very common practice to keep all the probes for normalization, then 
to filter control probes and consistently non-expressed probes before 
differential expression analysis.  I recommend this and do it myself. 
It's such common practice that it's surprising to see a paper on it at 
this stage.

It is in the spirit of normalization methods that all probes should be 
retained for normalization, except in unusual cases in which some probes 
are obviously poor quality for reasons other than expression level.

At the differential expression step, probes can be usefully filtered out 
if they are not of any potential interest.  This means control probes, or 
probes which appear to be non-expressed across all conditions in the 
experiment, i.e., on all arrays. I have frequently complained on this 
mailing list about the practice of filtering individual low intensity 
probes on individual arrays, which IMO is a very destructive practice. 
If you filter a probe on the basis of expression, it must be filtered on 
all arrays.
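
As a minimal sketch of that rule (not code from this thread, and written in 
Python rather than R only so that it is self-contained): a probe is either 
kept on every array or dropped on every array. The function name 
filter_probes, the threshold, the min_arrays argument and the toy values 
are all hypothetical, chosen for illustration.

```python
def filter_probes(expr, control_ids, threshold, min_arrays=1):
    """Keep or drop each probe as a whole row.

    expr: dict mapping probe id -> list of normalized log-intensities,
          one value per array (toy layout assumed for this sketch).
    A probe is kept if it is not a control probe and exceeds `threshold`
    on at least `min_arrays` arrays.
    """
    kept = {}
    for probe, values in expr.items():
        if probe in control_ids:
            continue  # control probes are of no interest for DE
        n_expressed = sum(1 for v in values if v > threshold)
        if n_expressed >= min_arrays:
            # The whole row is retained, including its low-intensity arrays.
            kept[probe] = values
    return kept


# Invented example values: geneA is expressed on two of four arrays,
# geneB on none, ctrl1 is a control probe.
expr = {
    "geneA": [9.1, 8.7, 4.2, 4.0],
    "geneB": [4.1, 4.3, 3.9, 4.2],
    "ctrl1": [12.0, 12.1, 11.9, 12.2],
}
kept = filter_probes(expr, control_ids={"ctrl1"}, threshold=6.0)
print(sorted(kept))  # → ['geneA']
```

Note that geneA keeps its two low values: nothing is removed from 
individual arrays, only whole probes.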

Filtering non-expressed probes tends not to be emphasised on this list 
because users of this list are often sophisticated enough to use variance 
stabilizing normalization methods such as rma, vsn, normexp or vst.  This 
means that low-expression filtering is done more for multiplicity issues 
than for variance stabilization, and therefore often doesn't make a huge 
difference.  When using earlier normalization methods such as MAS for Affy 
or local background correction for two-color arrays, expression-filtering 
is absolutely essential, because the normalized expression values are so 
unstable at low intensity levels.
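
To illustrate the multiplicity point only, here is a rough sketch using the 
Benjamini-Hochberg adjustment as one common choice of multiple-testing 
correction (an assumption; the method is not named above). The p-values are 
invented, and the sketch assumes the probes were filtered on expression 
level without looking at the p-values.

```python
def bh_adjust(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Invented p-values: two expressed probes followed by four
# consistently non-expressed probes.
p_all = [0.001, 0.01, 0.6, 0.7, 0.8, 0.9]
p_filtered = p_all[:2]  # non-expressed probes removed before testing

adj_all = bh_adjust(p_all)
adj_filtered = bh_adjust(p_filtered)
print(adj_all[:2], adj_filtered)
```

With fewer tests in the list, the adjusted p-values for the two remaining 
probes are smaller than they were in the unfiltered list.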

To James, it is not necessary to retain all the probes on the array 
for eBayes().  The only requirement is that eBayes() sees all the probes 
which are under consideration for differential expression.  So filtering 
out consistently non-expressed probes before linear modelling is generally 
a good idea.  In fact, filtering often improves the eBayes() assumptions. 
eBayes() assumes that the residual variances are not intensity-dependent. 
However, very lowly expressed probes often follow a mean-variance 
relationship which is somewhat different from the other probes, even after 
variance stabilization, in which case filtering will improve the constancy 
of variance assumption.  This tends not to be a big issue with rma-Affy 
data, but it is an important issue with vst-Illumina data for example.

Best wishes
Gordon

> Date: Mon, 12 Jan 2009 09:25:02 -0500
> From: "James W. MacDonald" <jmacdon at med.umich.edu>
> Subject: Re: [BioC] Filtering before differential expression analysis
> 	of microarrays - New paper out
> To: Daniel Brewer <daniel.brewer at icr.ac.uk>
> Cc: bioconductor at stat.math.ethz.ch
>
> Hi Dan,
>
> Daniel Brewer wrote:
>> Hi,
>>
>> There is a new paper out at BMC bioinformatics that seems to justify the
>> use of filtering before differential expression analysis is performed
>> (Hackstadt & Hess BMC Bioinformatics 2009, 10:11 -
>> http://www.biomedcentral.com/1471-2105/10/11/abstract).  Specifically
>> filtering by variance and detection call.  I have got the impression
>> from this list that the general opinion is that one should only filter
>> out the control genes before testing.  I was wondering if anyone had any
>> opinions on this paper and the topic in general.
>
> I'm sure people do have opinions about this topic ;-D
>
> The reason people have so many opinions is because it isn't a simple
> question, and it depends on what you consider important.
>
> If you are just trying to limit the number of multiple comparisons to
> increase power, then filtering first is probably the way to go.
>
> If you are concerned with the accuracy of the FDR estimates, then
> filtering first may not be ideal.
>
> If you are using limma (Hackstadt and Hess used multtest), then you
> should filter after the eBayes step but before the FDR step, as an
> assumption of the eBayes step is that all of the data from the chip are
> available.
>
> Unless of course you are concerned about the accuracy of the FDR
> estimates, in which case... well you see the point.
>
> With microarray data analysis the arguments for and against a particular
> way of doing things can shed more heat than light, as nobody really
> knows the underlying truth, and the measures we use are really far
> removed from the actual phenomenon we are testing.
>
> Best,
>
> Jim
>
>
>>
>> Many thanks
>>
>> Dan
>>
>
> -- 
> James W. MacDonald, M.S.
> Biostatistician
> Hildebrandt Lab
> 8220D MSRB III
> 1150 W. Medical Center Drive
> Ann Arbor MI 48109-5646
> 734-936-8662


