[BioC] Design/Contrast Matrix for Two Channel Microarray

Thu Feb 6 03:13:45 CET 2014

On Thu, 6 Feb 2014, Joseph Shaw wrote:

> Hi Gordon,
>
> Thanks for your response - I believe it has cleared everything up.
>
> So, for example, for experimental design (d), the simple saturated
> direct design, we have the design matrix (1 0; 0 1; -1 -1).
>
> The first coefficient represents B-A, hence the first row (1 0); the
> second coefficient represents C-B, hence the second row (0 1) and
> because the third row represents the third array (A-C), we have:
>
> (-1 -1) = -(B-A)-(C-B) = -B+A-C+B = A-C
>
> which is what we wanted. Is this correct?

Yes.
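
One way to check this algebra numerically is the base R sketch below. The
expression values are made up purely for illustration, and the object names
(expr, coefs, design, fitted, direct) are labels chosen here, not anything
defined by limma.

```r
# Hypothetical true log-expression values for samples A, B and C
# (made-up numbers, used only to check the algebra)
expr <- c(A = 1, B = 3, C = 6)

# Coefficients as defined above: coef1 = B - A, coef2 = C - B
coefs <- c(BvsA = expr[["B"]] - expr[["A"]],
           CvsB = expr[["C"]] - expr[["B"]])

# Design matrix for design (d): rows are arrays, columns are coefficients
design <- rbind(c( 1,  0),
                c( 0,  1),
                c(-1, -1))
colnames(design) <- names(coefs)

# Expected red-minus-green log-ratio for each array under the model
fitted <- drop(design %*% coefs)

# The same quantities computed directly from the layout: B-A, C-B, A-C
direct <- c(expr[["B"]] - expr[["A"]],
            expr[["C"]] - expr[["B"]],
            expr[["A"]] - expr[["C"]])

print(all.equal(fitted, direct))
```

The third element of fitted is -coef1 - coef2, which agrees with A - C
computed directly.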

> I have one last question. In practice, is this approach identical to a
> 3x3 diagonal matrix (of ones) where each column represents an array
> contrast?
>
> More specifically:
>
> 1 0 0 ---> B-A
> 0 1 0 ---> C-B
> 0 0 1 ---> A-C

No, it is not equivalent.  The three pairwise comparisons are 
inter-related, and this must be represented by the design matrix.
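
As a rough illustration, the sketch below fits both designs to one gene with
base R's lm.fit (used here as a stand-in, not limma's own fitting code). The
log-ratios in y are invented, and the object names are chosen only for this
example. Because (B-A) + (C-B) + (A-C) = 0, the two-column design forces the
fitted values to sum to zero and leaves one residual degree of freedom; the
diagonal matrix simply reproduces the data and leaves none.

```r
# Design (d): two free coefficients, the third array is constrained
design <- rbind(c(1, 0), c(0, 1), c(-1, -1))

# Alternative from the question: one free coefficient per array
identity3 <- diag(3)

# Hypothetical observed log-ratios for one gene on the three arrays
# (made-up numbers; noise means they do not sum exactly to zero)
y <- c(2.1, 2.8, -5.2)

fit_design <- lm.fit(design, y)
fit_identity <- lm.fit(identity3, y)

# Residual degrees of freedom under each design
print(c(design = fit_design$df.residual,
        identity = fit_identity$df.residual))

# Sum of fitted values across the three arrays under each design
print(sum(fit_design$fitted.values))
print(sum(fit_identity$fitted.values))
```

With no residual degrees of freedom, the diagonal design leaves nothing from
which to estimate a variance for this gene using these three arrays alone.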

Gordon

> Joseph
>
> On Wed, Feb 5, 2014 at 11:28 PM, Gordon K Smyth <smyth at wehi.edu.au> wrote:
>> Dear Joseph,
>>
>>> Date: Tue,  4 Feb 2014 18:11:50 -0800 (PST)
>>> From: "Joseph Shaw [guest]" <guest at bioconductor.org>
>>> To: bioconductor at r-project.org, josph.sh at gmail.com
>>> Subject: [BioC] Design/Contrast Matrix for Two Channel Microarray
>>>
>>>
>>> Hi all,
>>>
>>> Could somebody explain the process used in developing the design matrix
>>> for two channel microarray experiments in Limma; in particular, those given
>>> for each experiment in Figure 1 in the empirical Bayes paper
>>> (http://www.statsci.org/smyth/pubs/ebayes.pdf).
>>>
>>> For single channel arrays, the design matrix seems to assume the form of
>>> standard linear model design matrices; that is, 1 where an array treatment
>>> is present and 0 otherwise. From here, the resulting model parameters can be
>>> tested with the implementation of an appropriate contrast matrix (where,
>>> typically, each contrast effect sums to zero). This does not appear to be
>>> the case for two-channel experiments.
>>>
>>> In the above paper, the aforementioned experiments are given in Kerr and
>>> Churchill arrow notation (where the arrow head points toward the RNA sample
>>> labelled with red dye and the sample at the arrow base is labelled green).
>>>
>>> The experiments can be summarised as follows:
>>>
>>> (a)
>>> Red     Green
>>> RNA1    RNA2
>>>
>>> For this experiment, it seems to me that the only parameter of interest
>>> (let's call it mu1) is the response value (or the mean of the response
>>> values if we have more than one identical replicate). Because the response
>>> is estimated by the (mean of the) log2 fold change between the red and
>>> green channels, the design "matrix" is simply (1); this becomes a column
>>> of 1 values if there is more than one identical replicate.
>>>
>>> (b)
>>> Red     Green
>>> RNA1    RNA2
>>> RNA2    RNA1
>>>
>>> In this experiment, although there are two arrays, as in experiment (a)
>>> there seems to be only one comparison of interest (namely, the difference
>>> between RNA1 and RNA2). Because the dyes in the second array are swapped
>>> relative to the first array, the ratio is inverted too. Inverting the term
>>> inside the logarithm yields the negative of the response from the first
>>> array (i.e. log2(RNA2/RNA1) = -log2(RNA1/RNA2)). For consistency, we must
>>> multiply the second response value by -1. As a result, we have the design
>>> matrix: (1, -1).
>>>
>>> I'm confused about how the design matrices are formed for experiments in
>>> (c) and (d).
>>>
>>> In (c), RNA1 and RNA2 are compared through a common reference.
>>>
>>> (c)
>>> Red     Green
>>> Ref     RNA1
>>> RNA1    Ref
>>> RNA2    Ref
>>>
>>> The design matrix is given by (-1 0; 1 0; 1 1), where ";" denotes the
>>> end of a matrix row; the first coefficient estimates the difference
>>> between RNA1 and the reference sample, whilst the second coefficient
>>> estimates the difference between RNA2 and RNA1.
>>
>>
>> It isn't easy to explain how this design matrix was derived, but it is easy
>> to confirm that it works.  Consider the third array for example, which
>> estimates RNA2-Ref (Red minus Green).  As you say, the first coef is
>>
>>   coef1 = RNA1-Ref
>>
>> and the second is
>>
>>   coef2 = RNA2-RNA1
>>
>> The third array estimates
>>
>>   RNA2-Ref = coef1 + coef2
>>
>> Hence the third row of the design matrix has to be c(1,1).
>>
>> You can easily compute these design matrices in limma.  Here is the code for
>> Figure 1(c) in the paper:
>>
>> > targets
>>    Cy3 Cy5
>>  1   A Ref
>>  2 Ref   A
>>  3 Ref   B
>> > parameters
>>      AvsRef BvsA
>>  Ref     -1    0
>>  A        1   -1
>>  B        0    1
>> > modelMatrix(targets,parameters=parameters)
>>  Found unique target names:
>>   A B Ref
>>    AvsRef BvsA
>>  1     -1    0
>>  2      1    0
>>  3      1    1
>>
>> Best wishes
>> Gordon
>>
>>> Experiment (d) is a saturated direct design comparing three samples.
>>>
>>> (d)
>>> Red     Green
>>> B       A
>>> C       B
>>> A       C
>>>
>>> The design matrix is given by (1 0; 0 1; -1 -1), where the first
>>> coefficient represents the difference B - A and the second coefficient
>>> represents the difference C - B.
>>>
>>> Also, on page 39 of the Limma user guide
>>> (http://www.bioconductor.org/packages/release/bioc/vignettes/limma/inst/doc/usersguide.pdf),
>>> you can find a design and contrast matrix for a direct two-colour design.
>>> The experiment compares CD4, CD8 and DN. I'm not really sure how this
>>> design/contrast works.
>>>
>>> Explanation of the above structures would be greatly appreciated.
>>>
>>> Joseph
>>>
>>
>>
>> ______________________________________________________________________
>> The information in this email is confidential and intended solely for the
>> addressee.
>> You must not disclose, forward, print or use it without the permission of
>> the sender.
>> ______________________________________________________________________
>
