[BioC] Filtering genes on highest expression before or after LIMMA?

Gordon K Smyth smyth at wehi.EDU.AU
Thu Mar 1 23:57:26 CET 2012


Dear Ekta,

Clearly filtering should be done before topTable(), because there is no 
point in conducting multiple testing adjustments over probes that are to 
be filtered.
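
As an illustration only, here is a minimal base R sketch of why the order matters for the adjustment step. The p-values are simulated and the `keep` filter is hypothetical. Neither comes from the dataset under discussion. It shows that Benjamini-Hochberg adjusted p-values depend on which set of probes is adjusted.

```r
set.seed(1)

# Simulated raw p-values (hypothetical): 9000 null probes, 1000 with signal
p <- c(runif(9000), rbeta(1000, 0.1, 10))

# Hypothetical filter: keep 2000 of the null probes and all signal probes
keep <- c(rep(c(TRUE, FALSE), c(2000, 7000)), rep(TRUE, 1000))

# Adjust over all probes, then discard the filtered ones
adj_all_then_filter <- p.adjust(p, method = "BH")[keep]

# Discard the filtered probes first, then adjust over the remainder
adj_filter_first <- p.adjust(p[keep], method = "BH")

# Compare how many probes pass a 5% FDR cutoff under each ordering
c(adjust_then_filter = sum(adj_all_then_filter < 0.05),
  filter_then_adjust = sum(adj_filter_first < 0.05))
```

The two vectors cover the same probes but are adjusted over 10000 and 3000 tests respectively, so they generally differ.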

Whether to filter before or after eBayes() is not so clear.  I would 
suggest filtering before, assuming that the probes after filtering are 
still representative of the whole genome.

It makes no difference whether filtering is done before or after lmFit().
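
A minimal sketch of the two orderings, assuming the limma package is installed. The expression matrix is simulated, and the `keep` rule (mean log-expression above the median) is only a hypothetical stand-in for whatever filter is actually used. Because lmFit() fits each probe separately, subsetting the data before the fit or subsetting the fitted object afterwards should lead to the same moderated statistics once eBayes() is run on the filtered set.

```r
library(limma)  # Bioconductor package; assumed to be installed

set.seed(1)

# Simulated log-expression matrix (hypothetical): 1000 probes x 6 arrays
y <- matrix(rnorm(1000 * 6, mean = 8), nrow = 1000,
            dimnames = list(paste0("probe", 1:1000), paste0("array", 1:6)))
group  <- factor(rep(c("ctrl", "trt"), each = 3))
design <- model.matrix(~ group)

# Hypothetical filter: keep probes with above-median mean log-expression
keep <- rowMeans(y) > median(rowMeans(y))

# Option A: filter the data, then lmFit() and eBayes()
fit_a <- eBayes(lmFit(y[keep, ], design))

# Option B: lmFit() on all probes, subset the fit, then eBayes()
fit_b <- eBayes(lmFit(y, design)[keep, ])

# Compare the moderated t-statistics from the two orderings
all.equal(fit_a$t, fit_b$t)

# topTable() comes last, so the adjustment covers only the retained probes
tab <- topTable(fit_a, coef = "grouptrt", number = Inf)
head(tab)
```

In both options eBayes() and topTable() see only the filtered probes, which matches the ordering suggested above.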

Best wishes
Gordon

> Date: Thu, 1 Mar 2012 09:56:49 +0530
> From: Ekta Jain <Ekta_Jain at jubilantbiosys.com>
> To: "bioconductor at r-project.org" <bioconductor at r-project.org>
> Cc: "sdavis2 at mail.nih.gov" <sdavis2 at mail.nih.gov>
> Subject: [BioC] Filtering genes on highest expression before or after
> 	LIMMA?
>
> Dear All,
>
> I have analyzed a dataset for differential gene expression using LIMMA. 
> The requirement was to select the probesets with the highest expression 
> values. I notice that the results change depending on whether I filter 
> the probesets before or after running LIMMA. The logFC values and the 
> gene list remain the same; only the p-values and B values change. This 
> is plausible because the probesets are not averaged to the gene level 
> but retained on maximum expression, so the filtering has no effect on 
> the fold change.
>
> I can see that a similar question had been asked before 
> https://stat.ethz.ch/pipermail/bioconductor/2011-June/039936.html
>
> I would be grateful if someone could tell me whether it is best to 
> filter before any statistical analysis or after. I am leaning towards 
> 'always filter before' but thought I would gather more views on this.
>
> Many Thanks.
>
> Regards,
> Ekta
