[BioC] edgeR prior.count

Mon Dec 2 19:55:38 CET 2013

I recently used the edgeR package to analyze an RNA-Seq dataset with 2 genotypes and 3 biological replicates each.

After running exactTest(), the logFC and logCPM are provided for each gene. I am a bit confused about how exactly these values are calculated.

1) For logCPM, I assume that this is the average expression over all samples. It is clearly not simply the averaged [counts/effective library size for each sample].

I understand that, generally speaking, the original counts (or perhaps the CPMs instead?) are moderated to avoid infinite values when taking logs for samples/genes with zero counts/CPM, but I can't figure out exactly how the moderated values are produced.

a) Is the same small value added to each gene for each sample or is the added value different for different genes? How is prior.count determined?
b) Are only genes that have a "0" in one sample moderated, or are all genes moderated by prior.count?
c) Is there a way to see the moderated CPM for each gene and sample and not just the log (moderated CPM)?
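
To make question 1 concrete, here is a minimal sketch of what I mean by "moderation". It is only an illustration of the general idea, not edgeR's actual procedure: the toy counts, the library sizes, the constant PRIOR_COUNT and the helper log2_cpm() are all made up for this example, and I am assuming the same constant is added to every count, which is exactly what I am unsure about.

```python
import math

# Hypothetical toy data: counts for ONE gene across 6 samples
# (2 genotypes x 3 replicates) and made-up effective library sizes.
counts = [0, 3, 5, 12, 9, 15]
lib_sizes = [1.0e6, 1.2e6, 0.9e6, 1.1e6, 1.0e6, 1.3e6]

# Assumption for illustration only: one small constant added to every count.
# Whether edgeR adds the same value everywhere is part of the question.
PRIOR_COUNT = 0.25


def log2_cpm(count, lib_size, prior_count=0.0):
    """log2 counts-per-million, optionally after adding a prior count."""
    cpm = (count + prior_count) / lib_size * 1e6
    # A zero CPM has no finite log; report it as -inf instead of raising.
    return math.log2(cpm) if cpm > 0 else float("-inf")


# Unmoderated values: the sample with a zero count gives -inf.
raw = [log2_cpm(c, n) for c, n in zip(counts, lib_sizes)]

# Moderated values: every sample now has a finite log2 CPM.
moderated = [log2_cpm(c, n, PRIOR_COUNT) for c, n in zip(counts, lib_sizes)]

print("raw log2 CPM:      ", raw)
print("moderated log2 CPM:", moderated)
```

With this naive scheme every gene in every sample is shifted, not only the zeros, and the average of the moderated values is finite. What I would like to know is how closely this matches what edgeR actually does.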

2) How is the logFC calculated? Is it based on moderated CPMs for each lane? Does it take the ratio of the average moderated CPM for each group?

Thank you!

 -- output of sessionInfo(): 

R version 3.0.1 (2013-05-16)
Platform: x86_64-apple-darwin10.8.0 (64-bit)

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] edgeR_3.2.4  limma_3.16.7

--
Sent via the guest posting facility at bioconductor.org.