[BioC] cpm on normalised counts (was "DESeq normalisation strategy")

Wed Jun 5 09:20:21 CEST 2013

Hi Simon, 

On May 29, 2013, at 11:46 AM, Simon Anders <anders at embl.de> wrote:

> The notion of "calculating cpm on normalized counts" is hence a 
> contradiction in terms.

Could you expand on this sentence? As far as I can see, it is not uncommon to express counts as cpm after normalization. I'm thinking of edgeR and limma (which normalize by TMM)...
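
To make concrete what I mean by "cpm after normalization": as far as I understand, edgeR does not rescale the counts themselves but multiplies the library size by the normalization factor and computes cpm against that effective library size. A minimal sketch in plain Python, where the counts and the factor 1.25 are made up for illustration and the function name is my own:

```python
def cpm(counts, norm_factor=1.0):
    """Counts per million for one sample.

    The library size (sum of counts) is multiplied by a normalization
    factor to give an effective library size, which is the denominator.
    """
    effective_lib_size = sum(counts) * norm_factor
    return [c * 1e6 / effective_lib_size for c in counts]


sample = [100, 300, 600]                     # made-up raw counts
plain_cpm = cpm(sample)                      # no normalization factor
scaled_cpm = cpm(sample, norm_factor=1.25)   # hypothetical TMM-style factor
```

With a factor of 1.0 the values sum to one million; with any other factor they no longer do, which may be part of what the quoted sentence is getting at.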
Moreover, I would like to use this thread to raise another point that is still not clear to me: normalizing counts (either by TMM or by geometric mean) makes it possible to compare the same feature across samples; that's why we all trust DESeq (edgeR and limma::voom) and agree that RPKM is evil for that purpose :-) But once you have normalized counts, how would you rank features according to their abundance "within" a sample? How can you tell that feature A is more represented than feature B in the same sample? Can you just use normalized counts for that?
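
A toy illustration of why I doubt that normalized counts alone answer the within-sample question: dividing every feature by the same per-sample factor cannot change their order, whereas accounting for feature length can. All numbers and names below are invented for the sketch.

```python
# Made-up counts and lengths for two features in one sample.
counts = {"A": 900, "B": 300}
lengths_kb = {"A": 9.0, "B": 1.0}
size_factor = 1.5  # hypothetical per-sample normalization factor

# Per-sample normalization: the same divisor for every feature.
normalized = {f: c / size_factor for f, c in counts.items()}

# Read density: normalized counts per kilobase of feature.
per_kb = {f: normalized[f] / lengths_kb[f] for f in counts}

rank_by_count = sorted(counts, key=counts.get, reverse=True)
rank_by_norm = sorted(normalized, key=normalized.get, reverse=True)
rank_by_density = sorted(per_kb, key=per_kb.get, reverse=True)
```

Here the raw and normalized rankings both put A first, while the per-kilobase ranking puts B first.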
I'm asking this because I'm working with some experimental data (not RNA-seq) where the features are huge genomic domains (megabases in size, detected by ChIP-seq) that change between conditions (in terms of abundance, position and enrichment). I can describe the differences in terms of domain length (and genomic association with genes, for example), but what about their "height"? I cannot use classical peak height as one would for ordinary ChIP-seq data, because that makes no sense at all here, so I'm forced to use RPKM.
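
For clarity, by RPKM I mean the usual definition: reads in the feature, per kilobase of feature, per million mapped reads. A minimal sketch with a hypothetical 2 Mb domain (the numbers and the function name are illustrative only):

```python
def rpkm(reads_in_feature, feature_length_bp, total_mapped_reads):
    """Reads per kilobase of feature per million mapped reads."""
    return reads_in_feature * 1e9 / (feature_length_bp * total_mapped_reads)


# Hypothetical domain: 5000 reads, 2 Mb long, 20 million mapped reads.
domain_height = rpkm(5000, 2_000_000, 20_000_000)
```

This gives an average read density over the whole domain, which is the closest thing to a "height" I can compute for such broad features.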
/me confused

thanks

d