[BioC] A question on DESeq

Wed Jul 28 22:30:11 CEST 2010

Hi Sunghee

On Wed, 28 Jul 2010 14:11:19 -0400, sunghee OH <sshshoh1105 at gmail.com>
wrote:
> It looks like there is no way to handle genes with zero read counts in DESeq
> as the returned values are all NA for below two cases
>                1. totally no information for two groups as like 0 0
>                2. uniquely expressed genes as like 0 c(constant) or c 0
> 
> In DESeq, when M-D plot is generated or de analysis is performed, it looks
> like those genes in M-D plot are all discarded and DESeq returns just NA
> values for such cases. is that correct?

No. In your case 2 (some but not all samples have zero counts), DESeq can
and does calculate a p value for differential expression. Only the log fold
change estimate is necessarily infinite (positive or negative), because one
of the two condition means is zero.

Only in case 1 (zero counts in _all_ samples that are involved in the
comparison) is the p value NA. This makes sense because if you do not
observe anything from a gene, you cannot say anything about it.
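
To illustrate the two cases, here is a small sketch in plain Python. It is
not DESeq code: the function name log2_fold_change and the use of
per-condition mean counts as inputs are assumptions made only for
illustration. It shows just the arithmetic behind the fold change column,
not how the p value is computed.

```python
import math

def log2_fold_change(mean_a, mean_b):
    """Return log2(mean_b / mean_a) for two non-negative condition means.

    Hypothetical helper for illustration only; not part of DESeq.
    """
    if mean_a == 0 and mean_b == 0:
        # Case 1: nothing observed in either condition, so the ratio is 0/0.
        return math.nan
    if mean_a == 0:
        # Case 2: dividing by zero gives an infinite fold change.
        return math.inf
    if mean_b == 0:
        # Case 2, the other way round: log2(0) is minus infinity.
        return -math.inf
    return math.log2(mean_b / mean_a)

print(log2_fold_change(0, 0))    # undefined: no counts at all
print(log2_fold_change(0, 12))   # infinite, yet there is still data to test
print(log2_fold_change(12, 0))   # infinite in the other direction
```

A row with means 0 and 12 has an infinite log fold change but still carries
counts that a test can use, whereas a row with 0 and 0 carries none.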

> if yes, for genes with uniquely expressed genes, it could be informative.
> isn't it?  to my knowledge, DEGseq and edgeR they are doing a simple way for
> such cases. so, there is no NA value in the output even there are genes with
> zero read counts as the input.

To my knowledge, edgeR treats zero counts in the same way as DESeq. (It
used to skip rows with all zero counts but now leaves them in and puts NA.)

> Could you please explain how to handle genes with zero read counts in DESeq
> package?

If you really see an NA p value even though only some counts are zero, you
have found a bug. Please send details in this case. (However, you are not
the first one to ask, and, so far, people had just not looked properly and
had confused the p value column with the log fold change column in the
results data frame.)

Cheers
  Simon