[BioC] extracting CPM from a DGElist after normalization in edgeR

Gordon K Smyth smyth at wehi.EDU.AU
Sat Apr 5 08:59:15 CEST 2014


Hi Alessandro,

I think you might not be understanding what scale normalization is.  Have 
a read of the section on normalization in the edgeR User's Guide.  That 
will also answer your question on pseudo-counts.
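
As a rough illustration of the scale-normalization point, here is a minimal 
sketch in base R (not edgeR itself). The counts and the normalization factors 
are made up for the example, and the assumption is that the "normalized 
library sizes" mentioned in help("cpm") are the raw library sizes multiplied 
by the normalization factors. Under that assumption, CPM computed from a 
normalized DGEList need not sum to one million per sample, while CPM computed 
from a plain count matrix always does.

```r
# Toy counts: 4 genes x 3 samples (made-up numbers, not from this thread).
counts <- matrix(c(10, 20, 30, 40,
                   15, 25, 35, 25,
                   50, 10, 20, 20),
                 nrow = 4,
                 dimnames = list(paste0("gene", 1:4), c("S1", "S2", "S3")))

lib.size     <- colSums(counts)       # raw library sizes
norm.factors <- c(1.25, 0.80, 1.00)   # hypothetical scaling factors

# CPM on raw library sizes: the behaviour described for plain matrices.
cpm.raw <- t(t(counts) / lib.size) * 1e6

# CPM on normalized library sizes (assumed: lib.size * norm.factors),
# the behaviour described for DGEList objects.
cpm.norm <- t(t(counts) / (lib.size * norm.factors)) * 1e6

colSums(cpm.raw)    # every column sums to 1e6
colSums(cpm.norm)   # every column sums to 1e6 / norm.factors
```

The column sums in the second case differ from one million by exactly the 
factor 1/norm.factors, which matches the pattern of sums that vary around one 
million in the output quoted below.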

Best wishes
Gordon

On Fri, 4 Apr 2014, alessandro.guffanti at genomnia.com wrote:

> Hello - thanks also for this second clarification. I actually read this help 
> line, but it was a bit obscure to me.
>
> Let me try to summarize:
>
>> colSums(cpm(currentDiff$counts)), where currentDiff is a DGEList 
> object after normalization
>
>  LT1C  LT2C  LT3C  ST2C  ST4C ST10C ST12C  ST5C  ST7C  ST8C
> 1e+06 1e+06 1e+06 1e+06 1e+06 1e+06 1e+06 1e+06 1e+06 1e+06
> ST11C  LT1P  LT2P  LT3P  ST2P ST10P ST12P  ST7P  ST8P  ST4P
> 1e+06 1e+06 1e+06 1e+06 1e+06 1e+06 1e+06 1e+06 1e+06 1e+06
>
> ==> the count matrix (matrix of CPM) in this case does not use the values 
> normalized by library size, so the values add up to 1 million, correct? 
> In this case, though, I can compare values directly between samples.
>
>> colSums(cpm(currentDiff))
>
>    LT1C    LT2C    LT3C    ST2C    ST4C
> 1421292 1064057  981465  889765  960819
>   ST10C   ST12C    ST5C    ST7C    ST8C
>  921314  985099  991736 1160034 1144623
>   ST11C    LT1P    LT2P    LT3P    ST2P
> 1517511  864691 1220229  961164  937648
>   ST10P   ST12P    ST7P    ST8P    ST4P
>  837525  999688  881438  922050  818447
>
> ==> these are the same count values, but normalized by library sizes, so the 
> CPM will not add up to (roughly) 1,000,000 per sample, correct? This is also 
> the way in which CPM are extracted in the manual.
>
> But I don't understand one thing: we scale up (or don't scale up) the 
> libraries by size, then we calculate the CPM. Still, the CPM should add up 
> to 1 million for each sample in the two categories, so that every gene can 
> be compared directly between samples.
>
> Am I missing something fundamental here, or is the scaling done *after* the 
> CPM calculation?
>
> Let me know, cheers,
>
> Alessandro
>
> PS
>
> A naive question: what is the role (roughly) of pseudo-counts?
>
> Many thanks for your feedback,
>
> Alessandro & Co.
>
> --
>
> Dear Alessandro,
>
> I see that Devon Ryan has answered your question, but the answer is also 
> available directly from the help system.  If you type help("cpm") the first 
> line of Details says:
>
> "CPM or RPKM values are useful descriptive measures for the expression level 
> of a gene or transcript. By default, the normalized library sizes are used in 
> the computation for DGEList objects but simple column sums for matrices."
>
> Best wishes
> Gordon
>