[BioC] zero rna-seq values AFTER normalisation in edgeR

Gordon K Smyth smyth at wehi.EDU.AU
Sun Aug 17 02:24:54 CEST 2014


Dear Nick N,

Thanks for using edgeR.  However, you do have some misunderstandings about 
how normalization works and about what the cpm() function outputs.

> Date: Fri, 15 Aug 2014 14:23:09 +0100
> From: Nick N <feralmedic at gmail.com>
> To: bioconductor at r-project.org
> Subject: [BioC] zero rna-seq values AFTER normalisation in edgeR
>
> I am using edgeR to analyze RNA-Seq data. This is my script:
>
>
> library("edgeR")

[snip]

> d <- calcNormFactors(d)
> all_cpm=cpm(d, normalized.lib.size=TRUE)

[snip]

> I believe that the variable "all_counts" shall contain the normalized
> counts for each sample in each condition.

The cpm() function simply computes counts-per-million, which is a 
ratio rather than a count.

> My understanding is also that edgeR adds pseudocounts BEFORE performing 
> the library normalisation.

No, it doesn't.  Why would you think that?  edgeR works with your data as 
it actually is rather than trying to fudge it.

> Thus it is possible that some values revert to being zero after 
> normalisation. But I thought that this would happen rarely. Yet in a 
> recent dataset I find an improbably large number of zero values in 
> "all_counts" which made me think that my understanding of how 
> pseudocounts and normalisation work in edgeR might be incorrect. Can, 
> please, somebody comment on this?

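cpm() simply computes counts per million by dividing the counts by the 
normalized library sizes (the raw library sizes multiplied by the 
normalization factors from calcNormFactors), expressed in millions.  
Obviously a zero count corresponds to a zero count-per-million.  That 
seems pretty natural!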
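
To make the arithmetic concrete, here is a minimal sketch in base R of the calculation described above. The count matrix and the normalization factors are made-up values for illustration only; in a real analysis the factors would come from calcNormFactors(), and the variable names (eff.lib.size, cpm_manual) are my own.

```r
# Toy count matrix: 3 genes x 2 samples (made-up numbers for illustration).
counts <- matrix(c(0, 10, 990,
                   5,  0, 1995),
                 nrow = 3,
                 dimnames = list(c("g1", "g2", "g3"), c("s1", "s2")))

lib.size <- colSums(counts)        # raw library sizes: 1000 and 2000

# Hypothetical normalization factors; calcNormFactors() would estimate these.
norm.factors <- c(1.25, 0.8)

# Normalized (effective) library size = raw library size x normalization factor.
eff.lib.size <- lib.size * norm.factors

# Counts per million: divide each column by its effective library size,
# then scale to "per million".
cpm_manual <- t(t(counts) / eff.lib.size) * 1e6

print(cpm_manual)

# A zero count stays exactly zero: the pattern of zeros is unchanged.
print(all((counts == 0) == (cpm_manual == 0)))
```

Because every count is only divided by a positive number, the zeros in the cpm matrix are exactly the zeros in the original counts, however many there are.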

Are you perhaps thinking of the use of a prior count (the prior.count 
argument) when computing cpm or logFC on the log-scale?  The help page for 
the cpm() function tells you that prior counts are not used when computing 
plain cpm values on the raw scale.
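
The difference between the two scales can be illustrated with a simplified base R sketch. It is not edgeR's exact formula: the prior count is added unscaled here, which is an assumption made for simplicity, and the counts and the value of `prior` are made up. The help page for cpm() describes what edgeR actually does.

```r
counts   <- c(0, 10, 990)   # made-up counts for a single sample
lib.size <- sum(counts)     # 1000
prior    <- 2               # hypothetical prior count, for illustration only

# Raw scale: no prior count is involved, so a zero count gives a zero cpm.
raw_cpm <- counts / lib.size * 1e6

# Log scale: a small prior count is added before taking logs, which keeps
# log2(0) = -Inf from appearing. (Simplified; not edgeR's exact scaling.)
log_cpm <- log2((counts + prior) / (lib.size + 2 * prior) * 1e6)

print(raw_cpm)   # first entry is 0
print(log_cpm)   # all entries are finite
```

So a prior count only matters when values are reported on the log scale; on the raw scale zeros are expected wherever the counts are zero.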

I wonder what source you are relying on for information about edgeR?  The 
most reliable source is the documentation that comes with edgeR.

Best wishes
Gordon
