I would like to ask for your opinion on whether using replicated pools makes sense in the context of RNASeq experiments.

Let's say that we are interested in detecting genes that are differentially expressed between two genetic backgrounds (a certain KO mutant strain and the corresponding WT) in mouse liver.

We could perform an RNASeq experiment using liver tissue from four KO and four WT animals of the same sex and age, kept on the same diet.

We would have eight samples: four biological replicates for each of the two conditions to be compared.

Suppose, however, that we decide to pool liver tissue from three animals to prepare each of the eight samples. We would therefore use 24 animals: 12 KO animals pooled to produce four KO samples, and 12 WT animals pooled to produce four WT samples.

The argument for doing so is that pooling samples to build biological replicates reduces the variation between replicates and increases the statistical power of the analysis, resulting in more sensitive detection of genes that are differentially expressed between conditions.
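To make that variance-reduction argument concrete, here is a minimal sketch. It is not an edgeR analysis, just a toy simulation in Python. It assumes that every animal contributes equally to its pool, that a gene's log-scale expression varies between animals following a normal distribution with standard deviation `bio_sd`, and that library preparation adds independent technical noise with standard deviation `tech_sd`. The function name, parameter names, and values are all hypothetical and chosen only for illustration. Under these assumptions, the variance between samples built from pools of k animals should be roughly bio_sd^2 / k + tech_sd^2.

```python
import random
import statistics


def simulate_sample_variance(pool_size, n_samples=20000,
                             bio_sd=1.0, tech_sd=0.2, seed=0):
    """Estimate the between-sample variance of one gene's log-expression
    when each sample is a pool of `pool_size` animals.

    Assumptions (illustrative only): equal contribution of every animal
    to the pool, normally distributed biological and technical noise.
    """
    rng = random.Random(seed)
    values = []
    for _ in range(n_samples):
        # Biological deviation of each animal in the pool.
        animals = [rng.gauss(0.0, bio_sd) for _ in range(pool_size)]
        # Pooling with equal contributions averages the animals.
        pooled = statistics.fmean(animals)
        # Technical noise is added once per sequenced sample.
        values.append(pooled + rng.gauss(0.0, tech_sd))
    return statistics.variance(values)


var_single = simulate_sample_variance(pool_size=1)
var_pooled = simulate_sample_variance(pool_size=3)

print(f"individual animals: variance = {var_single:.3f}")
print(f"pools of three:     variance = {var_pooled:.3f}")
```

In this toy model only the biological part of the variance shrinks with the pool size; the technical part does not. The variability that remains between pools is still real biological variability, but it describes pools rather than individual animals.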

However, edgeR relies precisely on measuring biological variability to establish the statistical significance of differences in gene expression between conditions. Therefore, pooling samples to build biological replicates is not correct and we are, in fact, losing statistical power: we are unable to determine whether the observed differences in gene expression are significant or not.

There are some publications dealing with this issue in the context of microarrays (for example, Kendziorski et al., 2005, "On the utility of pooling biological samples in microarray experiments", PNAS, 102:4252), but I have not found anything similar in the context of RNASeq and, more specifically, of the analysis of RNASeq data with edgeR.

Any comments, as well as any relevant references, would be more than welcome.

Thanks a lot in advance.

