[BioC] edgeR dataset filtering using pnas_expression.txt

Dave Tang davetingpongtang at gmail.com
Wed Jan 4 15:04:01 CET 2012


Hi list,

Just a question regarding edgeR and dataset processing/filtering prior to  
calling differential expression.

Case Study 12 (RNA-seq of Hormone-Treated LNCaP Cells) from the edgeR  
manual mentions that:

"We filter out lowly expressed tags and those which are only expressed in  
a small number of samples. We keep only those tags that have at least one  
count per million in at least three samples."
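
To make the quoted rule concrete, here is a minimal sketch in plain Python (not edgeR code). The function names, the toy count table and the use of ">=" for "at least one count per million" are my own assumptions for illustration; every library is assumed to have a non-zero total.

```python
# Toy illustration of "at least 1 count per million in at least 3 samples".
# Hypothetical helper names; this is not how edgeR itself is implemented.

def cpm(counts):
    """Convert raw counts (rows = tags, columns = samples) to counts per million."""
    lib_sizes = [sum(col) for col in zip(*counts)]  # total reads per sample
    return [[1e6 * c / n for c, n in zip(row, lib_sizes)] for row in counts]

def keep_tags(counts, min_cpm=1.0, min_samples=3):
    """Return one boolean per tag: True if the tag passes the filter."""
    return [sum(v >= min_cpm for v in row) >= min_samples for row in cpm(counts)]

# Made-up data: 4 tags x 4 samples, each library sums to exactly 1,000,000.
counts = [
    [600000, 600000, 600000, 600000],
    [399999, 399998, 399999, 399999],
    [1, 2, 1, 0],   # 1 CPM or more in three samples, so it is kept
    [0, 0, 0, 1],   # 1 CPM in only one sample, so it is dropped
]

keep = keep_tags(counts)
print(keep)
```

Note that the threshold is applied to counts per million, which are used only to decide which rows to keep; the rows that survive still hold the original raw counts.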

Section 6 of the manual then states:

"The edgeR methodology needs to work with the original digital expression  
counts, so these should not be transformed in any way by users prior to  
analysis. edgeR automatically takes into account the total size (total  
read number) of each library in all calculations of fold-changes,  
concentration and statistical significance."

My question is whether filtering counts as "transforming" the data. Since
removing tags changes the total size of each library, and thus affects all
downstream calculations, is it OK to use such filters? And what should one
be cautious about when applying such filters (e.g. at least n tags in n
samples) prior to performing the edgeR analysis?
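
To show the effect I am asking about, here is a small sketch in plain Python (again not edgeR code) that compares column totals before and after dropping a tag. The count table, the keep mask and the function names are made up for illustration, and the sketch says nothing about which library size edgeR actually uses internally after filtering.

```python
# Toy illustration: removing rows changes the per-sample column totals.
# Hypothetical names and data; only the arithmetic is being shown.

def library_sizes(counts):
    """Sum each column (sample) of a tags-by-samples count table."""
    return [sum(col) for col in zip(*counts)]

def filter_rows(counts, keep):
    """Keep only the rows (tags) whose mask entry is True."""
    return [row for row, k in zip(counts, keep) if k]

counts = [
    [600000, 600000, 600000, 600000],
    [399999, 399998, 399999, 399999],
    [1, 2, 1, 0],
    [0, 0, 0, 1],
]
keep = [True, True, True, False]  # assumed result of a low-expression filter

before = library_sizes(counts)
after = library_sizes(filter_rows(counts, keep))
lost = [b - a for b, a in zip(before, after)]  # reads removed per sample

print(before)
print(after)
print(lost)
```

In this toy case the filter removes a single read from one library, but on real data the totals could shift by more, which is why I am unsure whether the library sizes should be recomputed after filtering.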

Many thanks,


-- 
Dave


