[BioC] logFC and CPM of edgeR

Fri Apr 18 20:07:45 CEST 2014

Hi,

I should let Gordon answer this since I'll likely say something that's
not precise enough to be exactly right, but here goes ... the long and
short of it is that you are kind of on the right track ...

> 1. "logFC" using glm functions are calculated from normalized factor.

I don't know exactly what you mean here ... the normalization factors
are used (directly or indirectly) as an offset in the GLM.

The expression values used in the GLM are the integer counts
themselves, not CPM values.
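
To make the "offset" idea concrete, here is a minimal base-R sketch for a
single gene. It is only an illustration under stated assumptions: the counts,
library sizes and normalization factors are made up, and a Poisson GLM stands
in for the negative binomial GLM that edgeR actually fits. The point is just
that the response stays as raw integer counts, while the library size and
normalization factor enter on the log scale as an offset.

```r
# Toy data for one gene: raw integer counts in 4 samples, 2 per group.
# All numbers below are invented for illustration.
counts       <- c(10L, 12L, 40L, 44L)
group        <- factor(c("A", "A", "B", "B"))
lib.size     <- c(1.0e6, 1.2e6, 1.0e6, 1.1e6)  # total reads per sample
norm.factors <- c(1.0, 0.9, 1.1, 1.0)          # e.g. TMM-style factors

# Effective library size; its log is what goes into the model as the offset.
eff.lib     <- lib.size * norm.factors
log_eff_lib <- log(eff.lib)

# Poisson GLM used here as a simplified stand-in for edgeR's
# negative binomial GLM. The response is the raw counts, not CPM.
fit <- glm(counts ~ group, family = poisson(), offset = log_eff_lib)

# The coefficient is on the natural-log scale; convert it to log2.
logFC <- unname(coef(fit)["groupB"]) / log(2)
print(logFC)
```

For this two-group Poisson case the estimate works out to the log2 ratio of
(total counts / total effective library size) between the groups, which is
not the same thing as averaging per-sample CPM values.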

> This process does not include transformation to cpm.

Right.

> 2. CPM should be calculated from counts with the normalization factors. The main purpose of CPM is to show a heatmap (as described in the User's Guide).

Sure. You might prefer to use the output from predFC() for such uses,
though (see ?predFC).
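
For reference, the CPM calculation with normalization factors can be sketched
in base R as below. This is a hand-rolled illustration, not edgeR's code: the
count matrix, library sizes and normalization factors are made up, and the
`+ 1` pseudo-count before taking logs is just a simple choice for display.
In practice you would call edgeR's own cpm() on your DGEList rather than
computing this by hand.

```r
# Toy count matrix: 2 genes x 4 samples (invented numbers).
counts <- matrix(c( 10L,  12L,  40L,  44L,
                   500L, 600L, 480L, 530L),
                 nrow = 2, byrow = TRUE,
                 dimnames = list(c("geneA", "geneB"), paste0("s", 1:4)))

# Made-up library sizes and normalization factors, one per sample.
lib.size     <- c(1.0e6, 1.2e6, 1.0e6, 1.1e6)
norm.factors <- c(1.0, 0.9, 1.1, 1.0)

# Effective library size combines the two.
eff.lib <- lib.size * norm.factors

# Counts per million: divide each column by its effective library size.
cpm_manual <- t(t(counts) / eff.lib) * 1e6

# Log scale for a heatmap; the pseudo-count of 1 avoids log2(0).
log_cpm <- log2(cpm_manual + 1)
print(round(cpm_manual, 2))
```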

> 'logFC' from cpm is different from 'logFC' using glm functions. I'm sure that this result is general according to my understanding. Is it correct?

This is easy to test yourself ... create a scatter plot of the logFC
from the GLM vs. the logFC you calculate manually from the CPMs, and
see which genes deviate from the 45-degree line.
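
A rough sketch of that comparison in base R follows. Everything here is an
assumption made for illustration: the data are simulated, the effective
library sizes are invented, and a per-gene Poisson GLM with an offset stands
in for the edgeR fit. With real data you would put the logFC column from your
edgeR results on one axis and your own CPM-based logFC on the other.

```r
set.seed(1)
n_genes <- 50
group   <- factor(rep(c("A", "B"), each = 3))

# Invented effective library sizes (lib.size * norm.factors), one per sample.
eff.lib <- c(0.8e6, 1.0e6, 1.5e6, 0.9e6, 1.2e6, 2.0e6)

# Simulate expression per million reads, with a random fold change in group B.
base_per_million <- runif(n_genes, 100, 500)
fold             <- 2^rnorm(n_genes)
rate             <- matrix(base_per_million, nrow = n_genes, ncol = 6)
rate[, group == "B"] <- rate[, group == "B"] * fold
expected <- t(t(rate) * eff.lib / 1e6)
counts   <- matrix(rpois(length(expected), expected), nrow = n_genes)

# logFC from a GLM on the raw counts (Poisson stand-in, offset = log eff.lib).
glm_logFC <- apply(counts, 1, function(y) {
  fit <- glm(y ~ group, family = poisson(), offset = log(eff.lib))
  unname(coef(fit)[2]) / log(2)
})

# "Manual" logFC: log2 ratio of the group means of CPM.
cpm_mat     <- t(t(counts) / eff.lib) * 1e6
naive_logFC <- log2(rowMeans(cpm_mat[, group == "B"]) /
                    rowMeans(cpm_mat[, group == "A"]))

# Scatter plot with the 45-degree line (only drawn in an interactive session).
if (interactive()) {
  plot(naive_logFC, glm_logFC,
       xlab = "logFC from CPM", ylab = "logFC from GLM")
  abline(0, 1, col = "red")
}
print(summary(glm_logFC - naive_logFC))
```

Even in this simplified setting the two estimates are close but not
identical, because the GLM effectively weights samples by library size while
the CPM-based version averages per-sample ratios.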

One important difference, however, is that edgeR produces shrunken log
fold changes. See this very recent post from Gordon (and the others in
the same thread):

https://stat.ethz.ch/pipermail/bioconductor/2014-April/059048.html
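
To give a feel for what "shrunken" means, here is a deliberately simplified
base-R illustration. It is not edgeR's implementation: the helper
`logfc_with_prior` is a hypothetical name, the prior count of 2 is picked
only to make the effect visible, and both groups are assumed to have the same
library size. See ?glmFit and ?predFC for how the prior count is actually
handled.

```r
# Hypothetical helper: log2 fold change of b over a after adding the
# same prior count to both (equal library sizes assumed).
logfc_with_prior <- function(a, b, prior.count = 0) {
  log2((b + prior.count) / (a + prior.count))
}

# Low-count gene: 1 vs 4 reads.
low_raw    <- logfc_with_prior(1, 4)        # log2(4 / 1) = 2
low_shrunk <- logfc_with_prior(1, 4, 2)     # log2(6 / 3) = 1

# High-count gene with the same raw fold change: 1000 vs 4000 reads.
high_raw    <- logfc_with_prior(1000, 4000)
high_shrunk <- logfc_with_prior(1000, 4000, 2)

print(c(low_raw = low_raw, low_shrunk = low_shrunk,
        high_raw = high_raw, high_shrunk = high_shrunk))
```

The low-count gene is pulled strongly toward zero while the high-count gene
barely moves, which is why the largest differences between the two kinds of
logFC tend to show up for genes with small counts.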

> And which count data (e.g. cpm, pseudo count and fpkm) is suitable for a pharmacologist?

The answer to this question depends entirely on what the data will be
used for, and has little to do with the fact that you are talking
with a pharmacologist.

Lastly, from your sessionInfo, you are using a very old version of R
(and, therefore, edgeR). Please update. R-3.1.0 was recently released,
so now would be a great time to go through the minor nuisance of doing
the upgrade dance (it's really not that difficult at all).

HTH,
-steve

-- 
Steve Lianoglou
Computational Biologist
Genentech