[BioC] Prefiltering probes in microarray analysis

Yuan Hao yuan.x.hao at gmail.com
Wed Jun 1 15:31:48 CEST 2011

Hi Stephanie,

You can have a look at the 'genefilter' package in R/Bioconductor.
Basically, it's easy to set up an overall variance filter. For example,
if you have a data set normalized by gcrma and you require all
probesets to have an IQR bigger than 0.5, you can do:

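 > library(affy)
 > library(genefilter)
 > library(gcrma)
 > data <- ReadAffy()   # AffyBatch from the CEL files in the working directory
 > eset <- gcrma(data)  # GCRMA-normalized log2 expression values
 > f <- function(x) IQR(x) > 0.5     # TRUE for probesets with IQR above 0.5
 > selected <- genefilter(eset, f)   # logical vector, one entry per probeset
 > eset.filtered <- eset[selected, ]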
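The genefilter call above essentially applies `f` to each row of the expression matrix. Here is a minimal base-R sketch of the same IQR rule, so it can be tried without Bioconductor installed. It runs on a simulated matrix standing in for `exprs(eset)`; the data and the probeset names are made up for illustration only.

```r
# Simulated log2 expression values standing in for exprs(eset):
# 1000 hypothetical probesets on 8 arrays (illustration only, not real data).
set.seed(1)
n.genes <- 1000
n.arrays <- 8
expr <- matrix(rnorm(n.genes * n.arrays, mean = 7, sd = 0.1),
               nrow = n.genes, ncol = n.arrays,
               dimnames = list(paste0("probeset", seq_len(n.genes)), NULL))

# Make the first 50 probesets vary strongly: shift them up by 3 on arrays 5-8.
expr[1:50, 5:8] <- expr[1:50, 5:8] + 3

# Same rule as f above, applied row by row (roughly what genefilter does).
iqr <- apply(expr, 1, IQR)
selected <- iqr > 0.5
expr.filtered <- expr[selected, , drop = FALSE]

cat("probesets kept:", sum(selected), "of", n.genes, "\n")
```

The rule uses all arrays without looking at which condition each one belongs to, which is what makes it an "overall" variance filter.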

You have to be careful about how you filter your data; which filter is
appropriate depends quite a lot on the characteristics of your data.
There is a paper [1] that reviews this very well, and it does not
recommend combining an overall variance filter with limma.
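Bourgon et al. discuss filtering on the overall mean as well as on the overall variance. Below is a rough sketch of a mean-based filter followed by Benjamini-Hochberg correction. It uses simulated data, and an ordinary Welch t-test per gene stands in for limma's moderated t-test, which is not reproduced here. The point it illustrates is that the filter statistic is computed from all arrays without using the condition labels. The cutoff of 6 is hypothetical. With real two-color data in limma, the average log-intensity (the `A` component of an `MAList`) would be the natural quantity to filter on.

```r
# Simulated log2 intensities (illustration only): 2000 hypothetical genes on
# 8 arrays, 4 per condition. Genes 1-1000 are well expressed (mean 10),
# genes 1001-2000 sit near background (mean 4).
set.seed(2)
n.genes <- 2000
group <- factor(rep(c("C1", "C2"), each = 4))
avg <- rep(c(10, 4), each = n.genes / 2)
expr <- matrix(rnorm(n.genes * 8, mean = avg, sd = 0.5), nrow = n.genes)

# Genes 1-100 are truly differentially expressed: +1.5 in condition C2.
expr[1:100, group == "C2"] <- expr[1:100, group == "C2"] + 1.5

# Per-gene p-values (Welch t-test as a stand-in for limma's moderated t).
p <- apply(expr, 1, function(x) t.test(x[group == "C1"], x[group == "C2"])$p.value)

# Filter on the overall mean, computed without the condition labels.
overall.mean <- rowMeans(expr)
keep <- overall.mean > 6   # hypothetical cutoff

# BH adjustment on all genes versus only on the genes that pass the filter.
p.adj.all  <- p.adjust(p, method = "BH")
p.adj.filt <- p.adjust(p[keep], method = "BH")

cat("tested:", n.genes, " significant at FDR 5%:", sum(p.adj.all < 0.05), "\n")
cat("tested:", sum(keep), " significant at FDR 5%:", sum(p.adj.filt < 0.05), "\n")
```

Halving the number of tests lowers the BH penalty for the genes that remain. Whether that actually yields more discoveries depends on the data, which is why the choice of filter statistic matters.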


[1] R. Bourgon, R. Gentleman and W. Huber. Independent filtering increases detection power for high-throughput experiments. PNAS 2010, 107(21):9546-9551.

On 1 Jun 2011, at 13:58, Stephanie PIERSON wrote:

> Hello everybody,
> I am a French student in bioinformatics. I have to analyze microarray
> data and I have some questions about prefiltering genes.
> The dataset that I have to analyze consists of 8 microarrays: I have 4
> time points and 2 replicates for each time point. Agilent two-color
> microarrays (Whole Mouse Genome (4x44K) Oligo Microarrays) were used
> for the analysis. We are searching for genes that are differentially
> expressed between two conditions (for example C1 and C2) at the
> different time points, and for genes that are differentially
> expressed in one condition (C1 or C2) over time.
> I have chosen limma to perform the statistical analysis because I
> read in papers (Jeanmougin et al., PLoS ONE; Jeffery et al., BMC
> Bioinformatics 2006, 7:359) that it works better in experiments with
> few replicates per condition.
> I performed the statistical analysis on the whole data set (more than
> 37,000 genes), but the adjusted p-values are high after multiple
> testing correction (Benjamini-Hochberg). I would like to prefilter
> genes before the statistical analysis, but I don't know how to do this.
> I read in Bourgon's paper that we can filter on the overall variance
> or on the overall mean, but in my case, with few replicates, how
> should I do this? Moreover, this paper does not recommend combining
> limma with a filtering procedure...
> Can someone help me, please?
> Thank you,
> Best wishes
> Stéphanie
> -- 
> Stéphanie PIERSON
> Universite de la Mediterranee (Aix-Marseille II)
> Master 2 Pro Bioinformatique et Génomique
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at r-project.org
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor
