[BioC] DESeq2 - precisions about interaction and fold change calculation

Mon May 26 16:24:15 CEST 2014

Hi Samuel

To add to Mike's response:

>> - in the first example (p21, comparing "patient4.treatmentOHT" to
>> "patient4.treatmentControl"), what will be exactly the numerator and
>> denominators? How is the log2FoldChange calculated in this case?
>
> The numerator is the OHT interaction term in patient 4 and the
> denominator is the Control interaction term in patient 4. Subtracting
> the later from the former gives the OHT vs Control interaction effect
> for patient 4. If this is significant, it means the OHT vs Control
> effect for patient 4 is not explained by the main OHT vs Control
> effect.
[...]
> I also do not really understand the sentence " Note that the log2 fold
> change for treatment of OHT over control for patient 4 is the
> interaction effect above in addition to the main effect of treatment
> OHT over control", p21).

I add a complementary explanation to Mike's answer, which, however, is a 
bit lengthy -- but maybe helpful to understand how interactions really work.

In linear models, things often become clearer when you look at the model 
matrix (aka design matrix). This is the matrix that spells out how the 
fitted response for each sample is made up from the coefficients that 
you find in the result object, and it is at the actual heat of the 
fitting process. In DESeq2, our model matrices are a bit different from 
those usually used in linear modelling (and as returned by R's standard 
function 'model.matrix'), so it pays even for users with experience in 
linear modelling to have a look (or see the section on "Extended design 
matrices" in our preprint.)

In DESeq2, you can access the model matrix as follows (where ddsX is the 
object as found at the end of the vignette section on interactions):

 > mm <- attr( ddsX, "modelMatrix" )
 > mm
           Intercept patient1 patient2 patient3 patient4
SRR479052         1        1        0        0        0
SRR479053         1        1        0        0        0
SRR479054         1        1        0        0        0
SRR479055         1        1        0        0        0
[...]
SRR479077                     0                     0
SRR479078                     0                     0
           patient3.treatmentOHT patient4.treatmentOHT
SRR479052                     0                     0
SRR479053                     0                     0
[...]
SRR479075                     0                     0
SRR479076                     0                     1
SRR479077                     0                     1
SRR479078                     0                     1
attr(,"assign")
  [1] 0 1 1 1 1 2 2 2 3 3 3 3 3 3 3 3 3 3 3 3

Each row in the model matrix corresponds to one sample. Here again is 
the meta data for ddsX:

 > as.data.frame( colData(ddsX)[,c("patient","treatment")] )
           patient treatment
SRR479052       1   Control
SRR479053       1   Control
SRR479054       1       DPN
SRR479055       1       DPN
SRR479056       1       OHT
SRR479057       1       OHT
SRR479058       2   Control
SRR479059       2   Control
SRR479060       2       DPN
SRR479061       2       DPN
SRR479062       2       DPN
SRR479063       2       OHT
SRR479064       2       OHT
SRR479065       2       OHT
SRR479066       3   Control
SRR479067       3   Control
SRR479068       3       DPN
SRR479069       3       DPN
SRR479070       3       OHT
SRR479071       3       OHT
SRR479072       4   Control
SRR479073       4       DPN
SRR479074       4       DPN
SRR479075       4       DPN
SRR479076       4       OHT
SRR479077       4       OHT
SRR479078       4       OHT

Remember, by the way, that we had thrown away the time point information 
in this example to pretend that, e.g., the last three samples are 
replicates.

Each column in the model matrix corresponds to one of the coefficients 
that is fitted in the model. Note how hence the output of 
'resultsNames(ddsX)' is just a listing of the column names of the model 
matrix.

Now look at the last line of the model matrix:

 > mm["SRR479078",]
                 Intercept                  patient1
                         1                         0
                  patient2                  patient3
                         0                         0
                  patient4          treatmentControl
                         1                         0
              treatmentDPN              treatmentOHT
                         0                         1
patient1.treatmentControl patient2.treatmentControl
                         0                         0
patient3.treatmentControl patient4.treatmentControl
                         0                         0
     patient1.treatmentDPN     patient2.treatmentDPN
                         0                         0
     patient3.treatmentDPN     patient4.treatmentDPN
                         0                         0
     patient1.treatmentOHT     patient2.treatmentOHT
                         0                         0
     patient3.treatmentOHT     patient4.treatmentOHT
                         0                         1

This shows which of the coefficients you would need to add up to see the 
model's prediction for a gene in this sample (which is from patient4, 
treated with OHT), namely the following:

- Intercept: the overall log expression of the gene under consideration, 
essentially averaged over all level combinations,

- patient4: the patient main effect for patient 4, i.e., the amount by 
which the expression in the samples from patient 4 differs from the average

- treatmentOHT: the treatment main effect for OHT, the amount by which 
expression under OHT treatment differs from the average

- patient4.treatmentOHT: the interaction term, i.e., the amount by which 
the OHT treatment effect differs in patient 4 from the average OHT 
treatment effect or -- and this is the same thing, phrased differently 
-- the amount by which the patient-4 effect differs in case of OHT 
treatment from the average over treatments

Note that the other three samples for patient 4 under OHT have exactly 
the same values in their model matrix row.

Hence, if we want to know the effect of OHT treatment on patient 4 as 
compared to control, we can subtract the model matrix row for the 
patient-4 control sample from that for one of the patient-4 OHT samples:

 > contr = mm["SRR479078",] - mm["SRR479072",]
 > contr
                 Intercept                  patient1
                         0                         0
                  patient2                  patient3
                         0                         0
                  patient4          treatmentControl
                         0                        -1
              treatmentDPN              treatmentOHT
                         0                         1
patient1.treatmentControl patient2.treatmentControl
                         0                         0
patient3.treatmentControl patient4.treatmentControl
                         0                        -1
     patient1.treatmentDPN     patient2.treatmentDPN
                         0                         0
     patient3.treatmentDPN     patient4.treatmentDPN
                         0                         0
     patient1.treatmentOHT     patient2.treatmentOHT
                         0                         0
     patient3.treatmentOHT     patient4.treatmentOHT
                         0                         1
Such a vector is called a "contrast vector". It shows how the desired 
contrast is formed by  adding up / subtracting the "log2FoldChange" 
columns of the respective result objects. If we skip over all the 
zeroes, we see that this one does the following:

treatmentOHT - treatmentControl +
    patient4.treatmentOHT - patient4.treatmentControl

In other words, it is the sum of the overall effect of OHT treatment as 
compared to control plus the specific extra effect that it has on 
patient 4.

To get this in DESeq2, you would write:

   results(ddsX, contrast = contr )

In the vignette, we discussed the contrast

   results(ddsX, contrast = list( "patient4.treatmentOHT",
      "patient4.treatmentControl" ) )

and as you can now see, this is only the extra effect specific to 
patient 4, i.e., the interaction effect. To get the full effect of OHT 
treatment on patient 4, we need to add the main effect of OHT treatment 
as well. That would be the one returned by

    results(ddsX, contrast = list( "treatmentOHT", "treatmentControl" ) )

or, with identical results

    results(ddsX, contrast = list( "treatment", "OHT", "Control" ) )

Now, if you sum up this main effect and the interaction effect from two 
paragraphs up, you get the same as what the call with the big contrast 
vector gave you.

Might be a bit of effort to work through this text, but if you manage, 
you will hopefully have a much better understanding what an interaction is.

   Simon