[BioC] limma design and contrast matrices for two-color (two-channel) data with common biological reference, not common mRNA reference

Wed Jun 29 21:27:57 CEST 2011

I have posted this question to BioStar.

http://biostar.stackexchange.com/questions/9662/

Perhaps few limma users monitor BioStar yet, so I'm cross-posting here
to the Bioconductor mailing list in hopes that a few lurk here. I will
re-post the contents of the question below:

The study [GSE13465][1] contains experiments of toxic effects of
acetaminophen on two organisms:

 - human
 - rat

I only care about data from the rat cultures. (The human data is
present due to submitter error.)

The rat cultures took place in two media types:

 - rat hepatocytes in normal media
 - rat hepatocytes in modified media

Each media condition was paired with an acetaminophen (APAP) treatment
condition for each biological replicate:

 - no exposure (0 mM APAP), the control
 - exposure to 5 mM APAP
 - exposure to 10 mM APAP

Expression levels were measured for each biological replicate-medium
combination using the two-channel Agilent 011868 Rat Oligo Microarray
G4130A ([GPL890][4]). All gene expression measurements were performed
with the labels as such:

 - Channel 1 (Cy5): 0 mM APAP (control) or 5 or 10 mM APAP (treatment)
 - Channel 2 (Cy3): 0 mM APAP (control)

This experiment design was repeated for three biological replicates
(rats 2, 3, and 4).

I would like to perform the following contrasts:

 - 5 mM APAP vs. control, standard media
 - 10 mM APAP vs. control, standard media
 - 5 mM APAP vs. control, modified media
 - 10 mM APAP vs. control, modified media

**My challenge is the following: there was no single mRNA sample
common across all the microarray experiments within a medium.**

>From what I understand, ideally the researchers should have pooled the
mRNA from each 0 mM APAP standard medium culture, then labeled this
pool with Cy3 and compared the pooled mRNA to each individual
culture's mRNA labeled with Cy5. (They would have repeated this design
for the modified medium, as well.) Unfortunately, this was not what
the researchers did.

Instead, within each biological replicate, the researchers used the
mRNA from the untreated sample from the same rat as a reference on the
Cy3 channel for that rat's cultures in that medium, and *only* that
rat's cultures. For example, in experiment GSM339438, Rat 2's cells
exposed to 5 mM APAP in standard medium were compared to Rat 2's cells
exposed to no APAP in standard medium.

A look at a subset of my `targets` may be helpful here:

    > targets[targets$Medium == "standard",]
                                          FileNameCy3             FileNameCy5
    Std_APAP0.vs.Std_APAP0.2  GSM339437_Cy3_36569.txt GSM339437_Cy5_36569.txt
    Std_APAP5.vs.Std_APAP0.2  GSM339438_Cy3_36570.txt GSM339438_Cy5_36570.txt
    Std_APAP10.vs.Std_APAP0.2 GSM339439_Cy3_36571.txt GSM339439_Cy5_36571.txt
    Std_APAP0.vs.Std_APAP0.3  GSM339443_Cy3_36630.txt GSM339443_Cy5_36630.txt
    Std_APAP5.vs.Std_APAP0.3  GSM339444_Cy3_36631.txt GSM339444_Cy5_36631.txt
    Std_APAP10.vs.Std_APAP0.3 GSM339445_Cy3_36632.txt GSM339445_Cy5_36632.txt
    Std_APAP0.vs.Std_APAP0.4  GSM339449_Cy3_37372.txt GSM339449_Cy5_37372.txt
    Std_APAP5.vs.Std_APAP0.4  GSM339450_Cy3_37373.txt GSM339450_Cy5_37373.txt
    Std_APAP10.vs.Std_APAP0.4 GSM339451_Cy3_37374.txt GSM339451_Cy5_37374.txt
                              GEOSample          Organism Donor   Medium Cy3APAP
    Std_APAP0.vs.Std_APAP0.2  GSM339437 Rattus norvegicus     2 standard       0
    Std_APAP5.vs.Std_APAP0.2  GSM339438 Rattus norvegicus     2 standard       0
    Std_APAP10.vs.Std_APAP0.2 GSM339439 Rattus norvegicus     2 standard       0
    Std_APAP0.vs.Std_APAP0.3  GSM339443 Rattus norvegicus     3 standard       0
    Std_APAP5.vs.Std_APAP0.3  GSM339444 Rattus norvegicus     3 standard       0
    Std_APAP10.vs.Std_APAP0.3 GSM339445 Rattus norvegicus     3 standard       0
    Std_APAP0.vs.Std_APAP0.4  GSM339449 Rattus norvegicus     4 standard       0
    Std_APAP5.vs.Std_APAP0.4  GSM339450 Rattus norvegicus     4 standard       0
    Std_APAP10.vs.Std_APAP0.4 GSM339451 Rattus norvegicus     4 standard       0
                              Cy5APAP       Cy3        Cy5
    Std_APAP0.vs.Std_APAP0.2        0 Std_APAP0  Std_APAP0
    Std_APAP5.vs.Std_APAP0.2        5 Std_APAP0  Std_APAP5
    Std_APAP10.vs.Std_APAP0.2      10 Std_APAP0 Std_APAP10
    Std_APAP0.vs.Std_APAP0.3        0 Std_APAP0  Std_APAP0
    Std_APAP5.vs.Std_APAP0.3        5 Std_APAP0  Std_APAP5
    Std_APAP10.vs.Std_APAP0.3      10 Std_APAP0 Std_APAP10
    Std_APAP0.vs.Std_APAP0.4        0 Std_APAP0  Std_APAP0
    Std_APAP5.vs.Std_APAP0.4        5 Std_APAP0  Std_APAP5
    Std_APAP10.vs.Std_APAP0.4      10 Std_APAP0 Std_APAP10

Can I still consider the 0 mM APAP conditions as one common reference,
even though each 0 mM APAP sample is actually only a common mRNA
reference within the same rat ("`Donor`")? In other words, does the
code below build a limma design model that is valid, or is the model
invalid?

    > std.targets <- targets[targets$Medium == "standard",]
    > std.design <- modelMatrix(std.targets, ref="Std_APAP0")
    > std.design
                              Std_APAP10 Std_APAP5
    Std_APAP0.vs.Std_APAP0.2           0         0
    Std_APAP5.vs.Std_APAP0.2           0         1
    Std_APAP10.vs.Std_APAP0.2          1         0
    Std_APAP0.vs.Std_APAP0.3           0         0
    Std_APAP5.vs.Std_APAP0.3           0         1
    Std_APAP10.vs.Std_APAP0.3          1         0
    Std_APAP0.vs.Std_APAP0.4           0         0
    Std_APAP5.vs.Std_APAP0.4           0         1
    Std_APAP10.vs.Std_APAP0.4          1         0
    > # Line below includes a dye effect in the model.
    > # NOTE: Apparently we can't check for a dye effect, even though the
    > # self-self hybridization of APAP0s should tell this AFAIK. Uncomment
    > # to see this code break.
    > #std.design <- cbind(DyeEffect=1, std.design)
    > std.fit <- lmFit(MA.std.genes.only, std.design)
    > std.fit <- eBayes(std.fit)
    > # I'm going to assume the Std_APAP5 and Std_APAP10 are my contrasts
    > # against Std_APAP0.

The control mRNA samples are from *biological* replicates, but not
identical across arrays. Is this sufficient? If not, what sort of
design and contrast matrices should I develop?

For example, should I consider this to be like a paired-samples type
study (Section 8.3 in the limma User's Guide, page 43). In this case,
I would create a design matrix using the factors of rat ("`Donor`")
and treatment ("`Cy5APAP`"), such as follows:

    > std.targets <- targets[targets$Medium == "standard",]
    > std.rats <- factor(std.targets$Donor)
    > std.rats
    [1] 2 2 2 3 3 3 4 4 4
    Levels: 2 3 4
    > std.treat <- factor(std.targets$Cy5APAP)
    > std.treat
    [1] 0  5  10 0  5  10 0  5  10
    Levels: 0 5 10
    > std.design <- model.matrix(~std.rats+std.treat)
    > rownames(std.design) <- rownames(std.targets)
    > std.design
                              (Intercept) std.rats3 std.rats4 std.treat5
    Std_APAP0.vs.Std_APAP0.2            1         0         0          0
    Std_APAP5.vs.Std_APAP0.2            1         0         0          1
    Std_APAP10.vs.Std_APAP0.2           1         0         0          0
    Std_APAP0.vs.Std_APAP0.3            1         1         0          0
    Std_APAP5.vs.Std_APAP0.3            1         1         0          1
    Std_APAP10.vs.Std_APAP0.3           1         1         0          0
    Std_APAP0.vs.Std_APAP0.4            1         0         1          0
    Std_APAP5.vs.Std_APAP0.4            1         0         1          1
    Std_APAP10.vs.Std_APAP0.4           1         0         1          0
                              std.treat10
    Std_APAP0.vs.Std_APAP0.2            0
    Std_APAP5.vs.Std_APAP0.2            0
    Std_APAP10.vs.Std_APAP0.2           1
    Std_APAP0.vs.Std_APAP0.3            0
    Std_APAP5.vs.Std_APAP0.3            0
    Std_APAP10.vs.Std_APAP0.3           1
    Std_APAP0.vs.Std_APAP0.4            0
    Std_APAP5.vs.Std_APAP0.4            0
    Std_APAP10.vs.Std_APAP0.4           1
    attr(,"assign")
    [1] 0 1 1 2 2
    attr(,"contrasts")
    attr(,"contrasts")$std.rats
    [1] "contr.treatment"

    attr(,"contrasts")$std.treat
    [1] "contr.treatment"

    > std.fit <- lmFit(MA.std.genes.only, std.design)
    > std.fit <- eBayes(std.fit)
    > # Is this correct that I don't need a contrasts matrix; that
    > # std.treat5 is representing the contrast
    > # "Std_APAP5 vs. Std_APAP0?"

I take it in this design, the particular rat is being considered part
of the model, with Rat 2 as the baseline, and then the treatment with
APAP is considered an additional factor on top of the particular rat.
It's not obvious to me that this is a correct approach either, though.

Any thoughts and suggestions on the appropriate design and contrast
matrices would be extremely helpful.

  [1]: http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE13465