[BioC] Multifactor model design for DE analysis (DESeq2 & edgeR)

Tue Aug 19 10:26:32 CEST 2014

Hi Mathieu

On 19/08/14 10:01, Mathieu Bahin wrote:

> I have been designing models with 2 factors: condition (control /
> tumor) and patient ID (to match the paired samples). I used the model
> '~ sample_id + condition' until now but I would like to add a third
> factor, the breed.

> Is it then correct to use '~ sample_id + breed + condition' if my
> goal is to analyse the DE between control and tumor samples, taking
> into account the individual variability (with the sample ID factor)
> and the breed variability (with the breed factor)?

No. This would make breed another blocking factor, besides patient_id.
But it does not offer any new information: all samples from the
same patient are from the same breed, so breed is fully determined by
patient_id, and the patient_id factor already captures all variation
associated with it.
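
To make the redundancy concrete, here is a minimal sketch in plain Python
(not DESeq2 or edgeR code). It assumes a hypothetical layout of four dogs,
two per breed, each with one control and one tumor sample; the names P1 to
P4 and the breeds A and B are made up. It builds the design matrix by hand,
with and without a breed column, and compares the ranks.

```python
from fractions import Fraction


def matrix_rank(rows):
    """Rank of a matrix via Gaussian elimination with exact fractions."""
    m = [[Fraction(x) for x in row] for row in rows]
    rank = 0
    n_cols = len(m[0]) if m else 0
    for col in range(n_cols):
        # Find a row at or below the current pivot row with a nonzero entry.
        pivot = next((r for r in range(rank, len(m)) if m[r][col] != 0), None)
        if pivot is None:
            continue  # column depends on earlier columns
        m[rank], m[pivot] = m[pivot], m[rank]
        for r in range(len(m)):
            if r != rank and m[r][col] != 0:
                factor = m[r][col] / m[rank][col]
                m[r] = [a - factor * b for a, b in zip(m[r], m[rank])]
        rank += 1
    return rank


# Hypothetical layout (an assumption, not the poster's data).
patients = [("P1", "A"), ("P2", "A"), ("P3", "B"), ("P4", "B")]
samples = [(p, b, c) for p, b in patients for c in ("control", "tumor")]


def design(with_breed):
    rows = []
    for patient, breed, cond in samples:
        row = [1]                                               # intercept
        row += [int(patient == p) for p in ("P2", "P3", "P4")]  # P1 is baseline
        if with_breed:
            row.append(int(breed == "B"))                       # breed B indicator
        row.append(int(cond == "tumor"))                        # condition
        rows.append(row)
    return rows


without_breed = design(with_breed=False)
with_breed = design(with_breed=True)
rank_without = matrix_rank(without_breed)
rank_with = matrix_rank(with_breed)
print("columns:", len(without_breed[0]), "rank:", rank_without)
print("columns:", len(with_breed[0]), "rank:", rank_with)
```

In this toy layout the breed column is the sum of the P3 and P4 columns,
so adding it gives one more column but no higher rank.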

Therefore, there is no need to account for breed if you just want to see 
the overall effect of cancer.

If, however, you want to know for which genes the expression change due 
to cancer _depends_ on breed, you are looking for an _interaction_ 
between breed and condition and should hence use:

   ~ patient_id + condition + breed:condition

(BTW, I renamed your factor from "sample_id" to "patient_id": After all, 
you have two samples from each patient.)
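
One way to read the breed:condition term, sketched in plain Python under
simplifying assumptions: an ordinary linear model on the log scale (not the
negative binomial GLM that DESeq2 and edgeR fit), a balanced paired design,
and made-up log2 values for a single gene. In that simplified setting, the
patient baseline cancels in each tumor-minus-control difference, and the
interaction is the difference between the breeds' average changes.

```python
from statistics import mean

# Hypothetical log2 expression of one gene: patient -> (breed, control, tumor).
data = {
    "P1": ("A", 5.0, 6.0),
    "P2": ("A", 7.0, 8.2),
    "P3": ("B", 4.0, 6.9),
    "P4": ("B", 6.5, 9.6),
}

# Tumor minus control within each patient; the patient baseline cancels out.
diffs = {"A": [], "B": []}
for breed, control, tumor in data.values():
    diffs[breed].append(tumor - control)

effect_a = mean(diffs["A"])          # condition effect in breed A
effect_b = mean(diffs["B"])          # condition effect in breed B
interaction = effect_b - effect_a    # how much the effect depends on breed

print("effect in A:", round(effect_a, 2))
print("effect in B:", round(effect_b, 2))
print("interaction:", round(interaction, 2))
```

Note that the baseline levels of the dogs differ a lot in this toy data,
yet they play no role in the result; that is what the patient_id term does.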

> Another question: If I use the pairwise information, I don’t have
> replicates because I only have two samples (one control, one tumor)
> for each patient. Is it better to use it (and then have no replicates)
> or not (and then have replicates for 'control' and 'tumor' samples)?

Of course you still have replicates: you have several dogs. This is the
whole point of the paired design. If you omitted the "patient_id"
factor, the differences between dogs would be counted as noise and you
would lose a great deal of inferential power.
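
A rough illustration of the power argument, in plain Python with made-up
numbers for one gene in four dogs. It uses simple t statistics as a
stand-in, which is an assumption for illustration only and not what DESeq2
or edgeR compute. The dogs differ strongly in baseline, but the tumor
sample is consistently a little higher than the control of the same dog.

```python
from math import sqrt
from statistics import mean, stdev, variance

# Hypothetical expression values; position i is the same dog in both lists.
control = [10.0, 20.0, 30.0, 40.0]
tumor = [12.0, 23.0, 31.0, 44.0]

# Paired analysis: work on within-dog differences.
diffs = [t - c for c, t in zip(control, tumor)]
paired_t = mean(diffs) / (stdev(diffs) / sqrt(len(diffs)))

# Unpaired analysis: ignore which samples belong to the same dog.
se_unpaired = sqrt(variance(control) / len(control)
                   + variance(tumor) / len(tumor))
unpaired_t = (mean(tumor) - mean(control)) / se_unpaired

print("paired t:  ", round(paired_t, 2))
print("unpaired t:", round(unpaired_t, 2))
```

With these numbers the paired statistic is far larger than the unpaired
one, because the large dog-to-dog spread no longer inflates the error.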

   Simon