[BioC] edgeR - filtering criteria

Steve Lianoglou lianoglou.steve at gene.com
Mon Jul 22 18:33:14 CEST 2013


Hi Catarina,

Comments in line:

On Mon, Jul 22, 2013 at 8:07 AM, Catarina Almeida <catarina.fa at gmail.com> wrote:
> Hello everyone!
>
> I'm using edgeR to detect DE genes on my data. I have 2 control samples and
> 4 mutated samples.
> I understand why I should filter them and I know the command to use (the
> tutorials Drs. Mark Robinson, Davis McCarthy, Yunshun Chen and Gordon
> K. Smyth made are pretty self-explanatory in everything). What I don't
> understand, however, is the filtering criteria.
>
> I named my DGE object as "d", so the command I'm using is:
> d <- d[rowSums(1e+06 * d$counts/expandAsMatrix(d$samples$lib.size, dim(d)) > 1) >= ?, ]

Perhaps you'd like to simplify that to the more intuitive:

d <- d[rowSums(cpm(d) >= 1) >= ?, ]

> Meaning that I'm filtering out genes that don't have at least one count per
> million in at least "?" samples. What value should I use for "?", given that
> I have 2 control and 4 mutated samples?

I believe the rule of thumb (if there is one) with this strategy is to
use the size of your smallest replicate group. That way a gene that is
expressed only in that group can still pass the filter. Since you have
one condition with 2 replicates and another with 4, you'd pick 2.

HTH,
-steve

--
Steve Lianoglou
Computational Biologist
Bioinformatics and Computational Biology
Genentech


