[BioC] genefiltering before or after the normalization?

Wolfgang Huber huber at ebi.ac.uk
Fri Jul 11 00:36:03 CEST 2008

Hi Abhilash

> I am working with single color data from Agilent platform. After the limma
> analysis the adjusted p values were higher than 5% of FDR. At this instance
> I am thinking of filtering the genes using genefilter. As my data set
> contains only raw intensities of normal and test before the normalization,
> where I am using 'normalizeBetweenArrays' command after log transforming the
> data.
> In this scenario I am quite confused whether I should use the filter
> functions prior to normalization or after the normalization but before
> fitting the linear model?
> As my data is not an expressionSet I cannot use the nonfilter commands, in
> this case any suggestions of using other filtering methods?
> Appreciate the suggestions

Such filtering is performed after normalisation, but it is essential 
that the filter criterion does *not use the sample annotations*. E.g. 
you can use for each gene the overall variance or IQR across the experiment.

If x is a matrix with rows=genes and columns=samples, then this can be 
as simple as:

   library(genefilter)   # provides rowSds
   rs = rowSds(x)        # per-gene standard deviation across all samples
   fx = x[ rs > quantile(rs, lambda), ]

where rowSds is in the genefilter package, and lambda is a parameter 
between 0 and 1 that encodes your belief about what fraction of probes 
on the array correspond to target molecules that are never expressed in 
the conditions you study.
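
The same idea with the IQR instead of the standard deviation can be 
sketched in base R only, which may be handy if your data is a plain 
matrix rather than an ExpressionSet. This is a minimal sketch, not a 
definitive recipe: the simulated matrix x stands in for your normalised 
log intensities, and lambda = 0.5 is an arbitrary assumed value.

```r
# Simulated stand-in for a normalised log-intensity matrix
# (rows = genes, columns = samples); replace with your own data.
set.seed(1)
x <- matrix(rnorm(1000 * 8), nrow = 1000, ncol = 8)
rownames(x) <- paste0("gene", seq_len(nrow(x)))

# Assumed fraction of probes believed to be never expressed.
lambda <- 0.5

# Per-gene IQR across all samples; no sample annotations are used.
iqr <- apply(x, 1, IQR)

# Keep the genes whose IQR exceeds the lambda-quantile.
keep <- iqr > quantile(iqr, lambda)
fx <- x[keep, , drop = FALSE]

dim(fx)
```

Note that the criterion never looks at which samples are normal and 
which are test; it only uses the spread of each gene over all arrays.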

Also note that after such filtering, strictly speaking, the nominal 
p-values from the subsequent testing could be too small - but one can 
show that in typical microarray applications the bias is negligible 
(compared to the impact of other effects), and in any case the p-values 
can be used for ranking.
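
To illustrate the order of the steps (filter first, then test, then 
adjust and rank), here is a rough sketch in base R. The simulated data, 
the group labels and lambda = 0.5 are all assumptions made for the 
example, and a plain per-gene t-test is used here only as a simple 
stand-in for the limma model fit.

```r
# Simulated normalised log intensities: 200 genes, 6 arrays.
set.seed(2)
x <- matrix(rnorm(200 * 6), nrow = 200, ncol = 6)
rownames(x) <- paste0("gene", seq_len(nrow(x)))

# Hypothetical sample annotation; NOT used by the filter below.
group <- factor(rep(c("normal", "test"), each = 3))

# Step 1: unsupervised filter on the overall per-gene standard deviation.
rs <- apply(x, 1, sd)
fx <- x[rs > quantile(rs, 0.5), , drop = FALSE]

# Step 2: per-gene two-sample t-test on the retained genes only.
p <- apply(fx, 1, function(v) {
  t.test(v[group == "test"], v[group == "normal"])$p.value
})

# Step 3: Benjamini-Hochberg adjustment over the retained genes,
# and a ranking of genes by nominal p-value.
padj <- p.adjust(p, method = "BH")
ranked <- names(sort(p))
head(ranked)
```

Because fewer hypotheses enter the adjustment in step 3, the adjusted 
p-values of the retained genes are penalised less than they would be 
without filtering.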

  Best wishes

Wolfgang Huber, EMBL-EBI, http://www.ebi.ac.uk/huber
