[BioC] total count filter cutoff (edgeR)

Gordon K Smyth smyth at wehi.EDU.AU
Sat May 3 03:04:32 CEST 2014


Mahnaz,

Just one more comment: with the large sequence depth that you have, you 
can afford to go down to a low cpm cutoff in order to include very lowly 
expressed genes and transcripts in your analysis.  You could try cpm>0.2 
or cpm>0.1.

Best
Gordon
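To put the suggested cutoffs in terms of reads: a CPM value c in a library of N counted reads corresponds to c × N / 10^6 reads. The sketch below assumes a library size of 80 million, the per-sample figure from the original question. In practice the library size edgeR uses is the number of reads assigned to genes, which is smaller than the raw read total, so the real equivalents will be somewhat lower. Normalization factors are ignored, and cpm_to_count() is a hypothetical helper written only for this illustration.

```r
# Hypothetical helper: number of reads that a given CPM cutoff corresponds to
# in a library of lib_size counted reads (normalization factors ignored).
cpm_to_count <- function(cpm_cutoff, lib_size) {
  cpm_cutoff * lib_size / 1e6
}

# Assumed library size: ~80 million reads, as quoted in the question.
lib_size <- 80e6
cutoffs <- c(1, 0.2, 0.1)
print(data.frame(cpm = cutoffs, reads = cpm_to_count(cutoffs, lib_size)))
```

Under these assumptions, cpm>1 requires about 80 reads in a sample, whereas cpm>0.2 and cpm>0.1 require about 16 and 8 reads. This is why very deep libraries can tolerate a lower CPM cutoff.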


On Fri, 2 May 2014, Gordon K Smyth wrote:

> Hi Mahnaz,
>
> Why don't you follow the advice of the edgeR User's Guide (as Mark has 
> suggested)?  All the case studies in the User's Guide describe how the 
> filtering was done in a principled way.
>
> Total count filtering is not so bad, but it is susceptible to being driven by 
> one library, especially by one library with a large sequence depth. The 
> procedure described by Mark and used in the guide is a compromise of several 
> considerations.
>
> BTW, there are newer versions of R and edgeR available than what you are 
> using.
>
> Best wishes
> Gordon
>
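The weakness of total count filtering described above can be seen with a small made-up example. The sketch below, in base R so that it runs without edgeR installed, compares a total count cutoff of 50 with a "CPM above 1 in at least 3 samples" rule on four hypothetical libraries, one of which (D) is ten times deeper than the others. The counts, the cutoff of 50 and the group size of 3 are all assumptions chosen for illustration.

```r
# Hypothetical toy counts: three genes of interest plus one row standing in
# for the rest of the transcriptome, so that library sizes are realistic.
# Libraries A-C have 1 million reads each; library D is 10 times deeper.
counts <- matrix(
  c(     0,      0,      0,      60,
        20,     25,     18,       5,
        15,     12,     14,       2,
    999965, 999963, 999968, 9999933),
  nrow = 4, byrow = TRUE,
  dimnames = list(c("gene1", "gene2", "gene3", "other_genes"),
                  c("A", "B", "C", "D")))

lib_size <- colSums(counts)  # 1e6, 1e6, 1e6, 1e7

# Counts per million, computed by dividing each column by its library size.
cpm_values <- sweep(counts, 2, lib_size, "/") * 1e6

# Rule 1: total count filter, keep genes whose summed count reaches 50.
keep_total <- rowSums(counts) >= 50

# Rule 2: CPM above 1 in at least 3 libraries (3 = assumed smallest group size).
keep_cpm <- rowSums(cpm_values > 1) >= 3

print(data.frame(total = rowSums(counts), keep_total = keep_total,
                 keep_cpm = keep_cpm))
```

Here gene1 passes the total count filter only because of its 60 reads in the deep library D, while gene3, which is consistently detected in libraries A to C, falls short of the total of 50. The CPM rule drops gene1 and keeps gene3. For a plain matrix, edgeR's cpm() takes the column sums as library sizes, so it should give the same values as the hand-computed cpm_values.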
>
>> Date: Wed, 30 Apr 2014 21:34:50 +0200
>> From: Mark Robinson <mark.robinson at imls.uzh.ch>
>> To: "Ryan C. Thompson" <rct at thompsonclan.org>
>> Cc: bioconductor at r-project.org, Mahnaz Kiani <mahnazkiani at gmail.com>
>> Subject: Re: [BioC] total count filter cutoff
>> 
>> 
>> In my lab, we typically follow a "CPM of at least X in at least Y samples" 
>> rule, where X=1 (arbitrary but reasonable, can be changed) and Y=size of 
>> smallest replicate group, according to one of the case studies in the 
>> user's guide, for example:
>> 
>> ------
>> 4.3.6 Filtering
>>
>> We filter out very lowly expressed tags, keeping genes that are expressed 
>> at a reasonable level in at least one treatment condition. Since the 
>> smallest group size is three, we keep genes that achieve at least one count 
>> per million (cpm) in at least three samples:
>> 
>>> keep <- rowSums(cpm(y)>1) >= 3
>>> y <- y[keep,]
>> ------
>> 
>> (http://www.bioconductor.org/packages/release/bioc/vignettes/edgeR/inst/doc/edgeRUsersGuide.pdf)
>> 
>> Cheers, Mark
>> 
>> 
>> ----------
>> Prof. Dr. Mark Robinson
>> Statistical Bioinformatics, Institute of Molecular Life Sciences
>> University of Zurich
>> http://ow.ly/riRea
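The "CPM of at least X in at least Y samples" rule can be wrapped in a small function. filter_by_cpm() below is a hypothetical helper written in base R, not part of edgeR. It computes CPM from the column sums of the count matrix (ignoring normalization factors) and sets Y to the size of the smallest group in the design factor. The counts and the two-group design (two control and three treated samples, each with 2 million reads) are invented for illustration.

```r
# Hypothetical helper: keep genes with CPM > min_cpm in at least as many
# samples as the smallest replicate group. Returns a named logical vector.
filter_by_cpm <- function(counts, group, min_cpm = 1) {
  lib_size <- colSums(counts)
  cpm_values <- sweep(counts, 2, lib_size, "/") * 1e6
  min_samples <- min(table(group))  # Y = size of the smallest group
  rowSums(cpm_values > min_cpm) >= min_samples
}

# Invented example: every library has exactly 2 million reads.
counts <- matrix(
  c(     10,      16,       0,       0,       0,
          1,       1,       1,       1,       1,
          0,       0,       6,       0,       0,
    1999989, 1999983, 1999993, 1999999, 1999999),
  nrow = 4, byrow = TRUE,
  dimnames = list(c("gene1", "gene2", "gene3", "other_genes"),
                  c("c1", "c2", "t1", "t2", "t3")))
group <- factor(c("ctrl", "ctrl", "trt", "trt", "trt"))

keep <- filter_by_cpm(counts, group)     # smallest group has 2 samples
filtered <- counts[keep, , drop = FALSE]
print(keep)
```

gene1 is kept because it exceeds 1 CPM in both control samples even though it is absent from the treated group, matching the aim of keeping genes expressed at a reasonable level in at least one condition. gene2 (0.5 CPM everywhere) and gene3 (above 1 CPM in a single sample) are removed.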
>
>
>> Date: Wed, 30 Apr 2014 11:29:28 -0700 (PDT)
>> From: "mahnaz Kiani [guest]" <guest at bioconductor.org>
>> To: bioconductor at r-project.org, mahnazkiani at gmail.com
>> Subject: [BioC] total count filter cutoff
>> 
>> 
>> I'm using edgeR to analyse my data and I'm not sure what total count 
>> filter cutoff value I should use. My reads are 50 bp paired-end reads, and 
>> there are about 80,000,000 reads per sample. I tried cutoff values of 
>> 5, 10, 15, 30, 50 and 100 and only saw differences between 50 and 100, but 
>> I am still looking for a logical reason to choose the cutoff value.
>> 
>> Appreciate your help,
>> Mahnaz
>> 
>> -- output of sessionInfo():
>> 
>> R 3.0.2
>
